22-25 April 2026
SQLBits 2024

Better ETL with Managed Airflow in ADF

Building complex data workflows using Azure Data Factory can get a little clunky, especially as your orchestration needs get more complex. Recently another option became available: Managed Airflow in ADF. Managed Airflow brings Apache Airflow to Azure as a PaaS service. In this session we discover what Airflow is, why we might want to use it for ETL orchestration, and see how it works with lots of demos.
Building complex data workflows using Azure Data Factory can get a little clunky. As your orchestration needs get more complex you hit limitations: you cannot nest loops or conditionals, running simple Python, bash or PowerShell scripts is difficult, and costs can grow quickly because you are charged per task execution. Recently another option became available: Managed Airflow in ADF.

Apache Airflow is a code-centric open-source platform for developing, scheduling and monitoring batch-based data workflows, built using the Python language Data Engineers know and love. But until Managed Airflow, getting it working in Azure was a complex task for customers more used to PaaS services such as ADF, Databricks and Fabric. Airflow is also an important ETL orchestrator on AWS and GCP, so cross-cloud compatibility becomes simpler to achieve.

In this session we’ll look at what Airflow is, how it’s different from ADF, and what advantages Managed Airflow in ADF gives us. We talk about the idea of a DAG (directed acyclic graph) for building the workflow, and then work through some demos to show just how easy it is to use Python to write Airflow DAGs and import them into the Managed Airflow environment as pipelines. We then dive into the excellent monitoring UI and find out just how easy it is to trigger a pipeline, view it to see the dependencies between tasks, and monitor runs.
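
To make the DAG idea concrete before the demos, here is a minimal sketch in plain Python. It deliberately does not use Airflow's own API: it relies only on the standard library's graphlib module, and the task names (extract_orders, transform_sales and so on) are hypothetical. It shows the one property that matters for orchestration: a task only runs once everything it depends on has finished.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical ETL tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_sales": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_sales"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


def is_valid_dag(deps):
    """Return False if the dependencies contain a cycle, True otherwise."""
    try:
        run_order(deps)
    except CycleError:
        return False
    return True


def run_pipeline(deps, actions):
    """Call each task's function in dependency order; return the names run."""
    completed = []
    for name in run_order(deps):
        actions[name]()
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Stand-in task bodies that only print; real tasks would move data.
    actions = {name: (lambda n=name: print("running", n)) for name in dependencies}
    run_pipeline(dependencies, actions)
```

The two extract tasks have no dependency on each other, so a scheduler is free to run them in either order or in parallel, while load_warehouse always comes last. An Airflow DAG expresses the same kind of graph in Python, with the scheduling, retries and monitoring handled by the platform.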

By the end of the session attendees will have a good understanding of what Airflow is, when to use it, and how it fits into the Azure Data Platform.

Speakers

Niall Langley

niall-langley.me

Niall Langley's previous sessions

Better ETL with Managed Airflow in ADF
Building complex data workflows using Azure Data Factory can get a little clunky, especially as your orchestration needs get more complex. Recently another option became available: Managed Airflow in ADF. Managed Airflow brings Apache Airflow to Azure as a PaaS service. In this session we discover what Airflow is, why we might want to use it for ETL orchestration, and see how it works with lots of demos.
 
Introduction to Databricks Delta Live Tables
Delta Live Tables is a new framework available in Databricks that aims to accelerate building data pipelines by providing out-of-the-box scheduling, dependency resolution, data validation and logging. We'll cover the basics, and then get into the demos to show how we can:
- Set up a notebook to hold our code and queries
- Ingest quickly and easily into bronze tables using Auto Loader
- Create views and tables on top of the ingested data using SQL and/or Python to build our silver and gold layers
- Create a pipeline to run the notebook
- Run the pipeline as either a batch job, or as a continuous job for low latency updates
- Use APPLY CHANGES INTO to upsert changed data into a live table
- Apply data validation rules to our live table definition queries, and get detailed logging info on how many records caused problems on each execution
By the end of the session you should have a good view of whether this can help you build out your next data project faster, and make it more reliable.
 
Slowly Changing Dimensions made Easy with Durable Keys
In this session we look at a simple way to implement Kimball durable keys on an SCD2 dimension. This provides an easy, performant way to support reporting on data using historical and current hierarchies.
 
SQL Server Encryption for the Layman
With GDPR and the number of data breaches we see in the news, encrypting sensitive data is incredibly important. In this talk we start with the basics of encryption, moving on to look at the ways we can encrypt data in SQL Server and Azure SQL DB.