22-25 April 2026

Centralise or Federate - How to Scale your Lakehouse

Proposed session for SQLBits 2026

TL;DR

In this session, I’ll explore the trade-offs between centralised and federated Data Lakehouse architectures, how to decide what fits your organisation, and why the best solutions often blend both approaches. You’ll learn how governance, data ownership, training, and tooling choices can make or break your strategy. Attendees will walk away with an understanding of the benefits of federation, and a practical roadmap for balancing autonomy with standardisation in a real-world Data Lakehouse.

Session Details

As organisations modernise their data platforms, one key question always surfaces: How do we scale this to serve the business?

Should you maintain a central Data Lakehouse for consistency and control, or federate it to empower teams and scale with agility?

In this session, I’ll explore the trade-offs between centralised and federated Lakehouse architectures, how to decide what fits your organisation, and why the best solutions often blend both approaches. You’ll learn how governance, data ownership, training, and tooling choices can make or break your strategy. Attendees will walk away with an understanding of the benefits of federation, and a practical roadmap for balancing autonomy with standardisation in a real-world Data Lakehouse.

3 things you'll get out of this session

- Understand what a federated data mesh is
- Understand the challenges of federation over centralisation
- Understand the benefits of federation

Speakers

Craig Porteous

craigporteous.com

Craig Porteous's previous sessions

Zero to Lakehouse in Microsoft Fabric
Fabric is Microsoft's unified software-as-a-service data platform, built around a Data Lakehouse architecture. In this session I'll share an array of Data Lakehouse architecture patterns and demonstrate how you can build a full Data Lakehouse platform without ever touching an Azure resource.
 
Building a Lakehouse on the Microsoft Intelligent Data Platform
This session aims to give you that context. We'll look at how Spark-based engines work and how we can use them within Synapse Analytics. We'll dig into Delta, the underlying file format that enables the Lakehouse, and take a tour of how the Synapse compute engines interact with it. Finally, we'll draw out our whole Lakehouse architecture.
 
Designing Data Architectures that InfoSec will actually approve
In this session I'll guide you through a secure reference architecture with Data Factory, Databricks, Data Lake, and Azure Synapse, working together as a secure, fully productionised platform. Each has its own idiosyncrasies, but this session will teach you the options available and the pitfalls to avoid.
 
Why the Lakehouse?
In this session I'll cover what the Data Lakehouse architecture is, where it fits against existing architectures like a data warehouse, and why you should build one. We'll also cover the underlying technology options to arm you with all of the information you need to plan your next data platform.
 
Keynote by The Community
Ben and Rob have found some wonderful folk to do the important parts of the community keynote, on the theme of "How to be a non-passive member of the data community".