22-25 April 2026
SQLBits 2022

Metadata Driven Pipelines with Python and Databricks

In this demo-filled session we'll take a journey from data pipelines built with notebooks to scalable, metadata-driven pipelines built with Python functions.
We'll start with some simple data transforms in a notebook.
- We'll look at what a Spark transformation function actually does, and how to build our own transformation functions.
- We'll see how combining generic functions with some metadata lets us perform common data engineering tasks, such as data cleansing and validation, with less code and in a more testable way.
- Finally, we'll see just how simple it is to deploy these functions into Databricks and get them to production.

Attendees should come away with some ideas of how to build more scalable, metadata-driven data pipelines using Databricks and Spark.
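To give a flavour of the idea of combining generic functions with metadata, here is a minimal sketch. It is not taken from the session: it uses plain Python lists of dictionaries instead of Spark DataFrames so that it runs without a cluster, and every name in it (`trim_column`, `drop_nulls`, `cast_int`, `REGISTRY`, `METADATA`, `build_pipeline`) is a hypothetical one chosen for illustration. Each generic function takes a dataset and returns a new dataset, and a small metadata list decides which functions run, on which columns, and in what order.

```python
from functools import reduce
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[object]]
Rows = List[Row]
Transform = Callable[[Rows], Rows]


# Generic, reusable transformation factories (hypothetical names).
# Each returns a function of the form rows -> rows and never mutates its input.
def trim_column(column: str) -> Transform:
    """Strip surrounding whitespace from string values in one column."""
    def _apply(rows: Rows) -> Rows:
        return [
            {**r, column: r[column].strip()} if isinstance(r.get(column), str) else dict(r)
            for r in rows
        ]
    return _apply


def drop_nulls(column: str) -> Transform:
    """Remove rows where the given column is missing or None."""
    def _apply(rows: Rows) -> Rows:
        return [dict(r) for r in rows if r.get(column) is not None]
    return _apply


def cast_int(column: str) -> Transform:
    """Convert one column to int (assumes nulls were dropped earlier)."""
    def _apply(rows: Rows) -> Rows:
        return [{**r, column: int(r[column])} for r in rows]
    return _apply


# Maps rule names used in the metadata to the generic functions above.
REGISTRY: Dict[str, Callable[[str], Transform]] = {
    "trim": trim_column,
    "drop_nulls": drop_nulls,
    "cast_int": cast_int,
}

# The metadata: an assumed format, one entry per cleansing step.
METADATA = [
    {"rule": "drop_nulls", "column": "id"},
    {"rule": "trim", "column": "name"},
    {"rule": "cast_int", "column": "id"},
]


def build_pipeline(metadata: List[Dict[str, str]]) -> Transform:
    """Turn metadata entries into a single composed transformation."""
    steps = [REGISTRY[m["rule"]](m["column"]) for m in metadata]
    return lambda rows: reduce(lambda acc, step: step(acc), steps, rows)


raw = [
    {"id": "1", "name": "  Ada "},
    {"id": None, "name": "Bob"},
    {"id": "3", "name": "Cy"},
]
cleaned = build_pipeline(METADATA)(raw)
print(cleaned)
```

Because each step is an ordinary function from dataset to dataset, it can be unit tested in isolation, and adding a cleansing rule to a pipeline means adding a metadata entry instead of writing more notebook code. In PySpark the same shape applies to functions that take and return a `DataFrame`, which can be chained with `DataFrame.transform`.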