22-25 April 2026

A scalable data platform starts with reusable code: Metadata to the Rescue

Proposed session for SQLBits 2026

TL;DR

In this session, I show how to build a metadata-driven data platform that adapts automatically to new sources and tables. You will learn how to use metadata to create reusable pipelines, reduce duplication, and scale ingestion across tools like Databricks, Azure Data Factory, and Microsoft Fabric.

Session Details

Two new tables? No problem. A new data source? Sure. But why does it always feel like you have to rebuild your entire pipeline from scratch?

If you find yourself copy-pasting code just to ingest slightly different datasets, it’s time for a smarter approach.

In this session, you’ll learn how to design a metadata-driven data platform that adapts dynamically: no hardcoded paths, file formats, or load strategies. Just clean, reusable code guided by metadata that tells your pipeline or notebook what to load, from where, and how.

We’ll cover:
- How to structure and store metadata (like source type, load method, and schedule)
- How to build pipelines or notebooks that use this metadata at runtime
- How this metadata-driven approach works seamlessly across different tools, whether you’re using Databricks, Azure Data Factory, Microsoft Fabric, or any other platform
- How this approach saves time, reduces duplication, and makes onboarding new sources effortless
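To make the first two bullets concrete, here is a minimal sketch of what such metadata and a generic driver could look like. The field names and the `ingest` function are illustrative assumptions, not a prescribed schema; in practice the records would live in a control table or config file, and the driver would call your Databricks, ADF, or Fabric workload instead of returning a string.

```python
# Illustrative metadata records: one row per dataset to ingest.
# In a real platform these live in a control table, not in code.
ingestion_metadata = [
    {"source": "crm", "table": "customers", "format": "parquet",
     "path": "/landing/crm/customers", "load_method": "full", "schedule": "daily"},
    {"source": "erp", "table": "orders", "format": "csv",
     "path": "/landing/erp/orders", "load_method": "incremental", "schedule": "hourly"},
]

def ingest(entry):
    """Run one ingestion, driven entirely by the metadata record.

    Nothing here is hardcoded to a specific source; the same code
    handles every row in the control table.
    """
    # A real implementation would dispatch to a notebook or pipeline here.
    return (f"Loading {entry['table']} from {entry['path']} "
            f"as {entry['format']} ({entry['load_method']})")

# The driver loop is the entire "pipeline": new tables mean new
# metadata rows, not new code.
for entry in ingestion_metadata:
    print(ingest(entry))
```

The key design point is that the loop body never changes: variation between sources lives in data, so the code stays identical no matter how many datasets you onboard.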

Because your ingestion is metadata-driven, maintenance becomes much simpler and scaling the platform requires little extra work.

Want to save time and keep your code clean and clear when building pipelines? This session’s for anyone working on data platforms.

3 things you'll get out of this session

- A clear understanding of how metadata-driven ingestion eliminates hardcoded pipelines
- Practical patterns to build reusable, scalable pipelines or notebooks across data platforms
- Concrete techniques to reduce maintenance effort and onboard new data sources faster