SQLBits 2020

Simplify your ETL with Databricks Delta Lake

Databricks Delta Lake brings new levels of reliability and transformation capabilities to big data solutions. Come and learn how you can simplify your ETL and maximise data consistency and resilience, no matter the size of your data.

Using cloud data lakes in big data solutions comes with some baggage. They are cheap, scalable and convenient, but there is a cost as well: they can become messy and lack the transactional and metadata support that is so important when you work with data at scale. Common issues include:

  • Consistency problems with concurrent reading and writing into the data lake.
  • Coping with increased processing times of big data due to lack of indexing optimisations.
  • Spending precious time cleansing the solution if bad quality data disrupts the pipeline.

For these reasons and more, Databricks (founded by the original creators of Apache Spark) created Delta Lake. As an open-source innovation, Delta Lake brings new capabilities for transactions, version control and indexing to your data lakes. Running on top of existing data lake data, it provides snapshot isolation that tackles the issues of concurrent read and write operations, and it enables rollback of transactions through history tracking of data lake commits. Thanks to its built-in optimisation mechanisms, enabling Delta Lake in a data engineering solution can significantly enhance query performance.
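
As a rough illustration of how this looks in practice, here is a minimal PySpark sketch that writes a Delta table as atomic commits, appends to it, and reads back an earlier version through time travel. The storage path, column names and sample data are purely illustrative assumptions; on a Databricks cluster Delta Lake support is already configured, while elsewhere the open-source delta-spark package and the settings shown would be needed.

    # Minimal sketch: Delta support is built in on Databricks; outside
    # Databricks the delta-spark package and the two configs below are needed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("delta-lake-sketch")
        .config("spark.sql.extensions",
                "io.delta.sql.DeltaSparkSessionExtension")
        .config("spark.sql.catalog.spark_catalog",
                "org.apache.spark.sql.delta.catalog.DeltaCatalog")
        .getOrCreate()
    )

    # Hypothetical data lake location for the example table.
    events_path = "/mnt/datalake/events_delta"

    # Each write to a Delta table is an atomic commit recorded in the table's
    # transaction log, which is what gives readers snapshot isolation.
    df = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event_type"])
    df.write.format("delta").mode("overwrite").save(events_path)

    # Appending creates a new commit; concurrent readers keep seeing the last
    # consistent snapshot until the commit completes.
    spark.createDataFrame([(3, "click")], ["id", "event_type"]) \
        .write.format("delta").mode("append").save(events_path)

    # Time travel: the commit history lets you read (or roll back to) an
    # earlier version of the table.
    v0 = spark.read.format("delta").option("versionAsOf", 0).load(events_path)
    v0.show()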

In this session we will showcase how Delta Lake works and how easily your modern data engineering pipelines can benefit from adopting it. This workshop will be of interest to anyone who deals with big data or builds modern data warehouse solutions and would like to learn how to solve common data lake challenges.