SQLBits 2023
Delta Live Tables: Building Reliable ETL Pipelines with Azure Databricks
In this session, we will see how to use Delta Live Tables to build fast, reliable, scalable, and declarative ETL pipelines on the Azure Databricks platform.
In real-life scenarios, ETL pipelines are not straightforward. With a growing number of data sources, increasing data volumes, and changing requirements, building and maintaining ETL pipelines becomes a key challenge for data engineering teams.
But it's not just the data movement that brings complexity. There are several other factors too: managing infrastructure, loading data incrementally, handling batch and streaming data, dealing with multiple entities, catching data quality issues, tracking lineage and dependencies, recovering from failures, and so on. All of this makes building reliable ETL pipelines a daunting task.
This is where Delta Live Tables steps in. Delta Live Tables (DLT) is a framework from Databricks, built on top of Delta Lake, for building reliable, automated, testable, and declarative ETL pipelines.
And all this *without* having to manage dependencies, infrastructure, failures, or retries yourself! Sounds interesting?
So in this demo-oriented session, you will see the following:
1. What the Delta architecture is, and its Bronze, Silver, and Gold layers
2. What Delta Live Tables are, and their core components
3. How to build an end-to-end ETL pipeline using DLT components (datasets, data quality checks, queries, pipelines, etc.) - a flavour of which is sketched after this list
4. How to build complex, multi-entity incremental loading pipelines
5. How to handle batch & streaming data together
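To give a flavour of item 3 before the session, here is a minimal sketch of a declarative DLT pipeline in Python - not the session's actual demo code. The landing path, table names, and the `event_id` column are made up for illustration; `dlt.table`, `dlt.expect_or_drop`, and `dlt.read_stream` are the real DLT Python APIs.

```python
import dlt

# Bronze layer: incrementally ingest raw JSON files with Auto Loader.
# The landing path below is a made-up example.
@dlt.table(comment="Raw events landed as-is (Bronze)")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")  # `spark` is provided by the Databricks runtime
        .option("cloudFiles.format", "json")
        .load("/mnt/landing/events")
    )

# Silver layer: a data quality expectation drops any row with a NULL key.
@dlt.table(comment="Validated events (Silver)")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")
def silver_events():
    return dlt.read_stream("bronze_events")
```

Note that nothing here schedules anything or wires tables together by hand: DLT infers the dependency from `dlt.read_stream("bronze_events")` and builds the lineage graph and run order for you - that is the declarative part.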
Come over... We will be using Azure Databricks throughout the session!