22-25 April 2026

Underrated Superpowers of Databricks: A Modern Data Transformation Process

Proposed session for SQLBits 2026

TL;DR

Discover the underrated superpowers of Databricks: learn how Auto Loader, Spark Declarative Pipelines, Structured Streaming, and Lakeflow enable scalable and efficient data ingestion, from batch to streaming, including CDC and enterprise orchestration.

Session Details

Databricks is continuously evolving with new features that significantly simplify data processing and data loading.
In this session, I will demonstrate how to build a modern data ingestion process using built-in capabilities such as Auto Loader, Spark Declarative Pipelines (SDP), Structured Streaming, and Lakeflow.

I will walk through different usage scenarios for Auto Loader, SDP, and related features: from classic batch processing to micro-batch and fully streaming data ingestion with Structured Streaming, including change data capture (CDC) use cases. I will also show that streaming does not have to mean real-time only; it can be used effectively to optimize traditional batch data loading processes.
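To make the Auto Loader scenarios above concrete, here is a minimal sketch of the pattern. All paths, table names, and option values are illustrative assumptions, not the session's actual demo code; the stream itself only runs on a Databricks cluster, so the Spark calls are shown in comments.

```python
# Hypothetical helper assembling the core Auto Loader (cloudFiles) options.
def autoloader_options(source_format: str, schema_location: str) -> dict:
    return {
        "cloudFiles.format": source_format,            # e.g. json, csv, parquet
        "cloudFiles.schemaLocation": schema_location,  # where the inferred schema is tracked
    }

# On a Databricks cluster (assumed example paths and table names):
# df = (spark.readStream
#         .format("cloudFiles")
#         .options(**autoloader_options("json", "/tmp/_schemas/orders"))
#         .load("/mnt/landing/orders/"))
# (df.writeStream
#    .option("checkpointLocation", "/tmp/_checkpoints/orders")
#    .trigger(availableNow=True)  # streaming source, batch-style run
#    .toTable("bronze.orders"))
```

The `availableNow` trigger in the comment is one example of streaming used for batch-style loading: the stream processes everything currently available and stops, while checkpoints keep the load incremental.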

In the second part of the session, we will explore data pipeline orchestration using Jobs & Pipelines in Databricks.

Finally, I will demonstrate how to combine all these components into a scalable, easy-to-maintain, metadata-driven ingestion framework, ready for use in an enterprise environment.
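The metadata-driven idea can be sketched as follows: each source is described by one metadata row, and the framework derives identical ingestion settings from it. Every name here is a hypothetical assumption for illustration, not the framework shown in the session.

```python
# Hypothetical metadata describing each source to ingest.
SOURCES = [
    {"name": "orders",    "format": "json", "path": "/mnt/landing/orders/"},
    {"name": "customers", "format": "csv",  "path": "/mnt/landing/customers/"},
]

def ingestion_config(source: dict) -> dict:
    """Derive one full ingestion configuration from a metadata row."""
    return {
        "reader_options": {
            "cloudFiles.format": source["format"],
            "cloudFiles.schemaLocation": f"/tmp/_schemas/{source['name']}",
        },
        "input_path": source["path"],
        "target_table": f"bronze.{source['name']}",
        "checkpoint": f"/tmp/_checkpoints/{source['name']}",
    }

# Expanding the metadata yields one config per source; on Databricks each
# config would drive one readStream/writeStream pair.
configs = [ingestion_config(s) for s in SOURCES]
```

Adding a new source then means adding one metadata row, not writing a new pipeline, which is what makes the approach easy to maintain at enterprise scale.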

3 things you'll get out of this session

- A clear understanding of how to build a data ingestion process with Databricks
- Practical guidance on choosing and using Apache Spark, Auto Loader, and Spark Declarative Pipelines to build data processing based on batch loads, Change Data Capture, and streaming
- Real-world tips for combining Databricks features into a scalable framework