Practical lessons in optimizing Data Engineering with Spark - Part 2
Proposed session for SQLBits 2026
TL;DR
Part 2: Optimizing Spark in large-scale Fabric deployments. Learn to maximize execution efficiency through Spark Job Definitions, Livy API, and Native Execution Engine. Demo-centric session showcasing real-world trade-offs and optimization strategies.
Session Details
Based on Luke's experience working with some of the largest Fabric deployments, this session walks through practical tips for optimizing data-engineering-focused Fabric deployments. The session is very demo-centric, providing real-world examples of the trade-offs that occur and showing how simple changes can have a large impact, especially as a deployment scales.
In part 2 we focus on optimizing Spark and in particular focus on:
1 - orchestrating for scale and minimizing startup time
2 - walking through and demoing Spark Job Definitions and the Livy API as ways to optimize for certain developer-centric experiences
3 - explaining how notebookutils can be used to unlock scenarios
4 - evaluating how the native execution engine works and when it will aid your workloads
3 things you'll get out of this session
1 - a detailed understanding of the different ways to run Spark and their trade-offs
2 - knowledge of best practices for orchestrating Spark at scale for performance
3 - an understanding of how the native execution engine works, when you'd want to use it, and when you might need to be careful
Speakers
Luke Moloney's other proposed sessions for 2026
Airflow in Fabric - an introduction informed by experience - 2026
Fabric for ISVs - Develop multi-tenancy apps on Fabric - 2026
Practical lessons in optimizing Data Engineering with Spark - Part 1 - 2026
Luke Moloney's previous sessions
Flying High with Data Engineering in Microsoft Fabric
In this demo-centric session we will walk through Data Engineering in Microsoft Fabric, including: getting started with Notebooks; learning how the Fabric Lakehouse provides the flexibility to use the tools you prefer; the flexibility of data engineering in Spark, including Cluster Configuration, Environments and Library Management; and finally, how Copilot enables all developers to be more efficient with Spark.
Harnessing Data Science and AI in Fabric
In this session you will learn about all the latest data science developments in Microsoft Fabric. We will show you how Fabric supports the end-to-end data science workflow and how we plan to evolve these capabilities. We will also look at exciting new AI capabilities including Copilot experiences and more!
Synapse Q&A with PG
SQLBits has brought members of the Azure Synapse Program Group to Wales to answer YOUR questions. Bring your questions on: Dedicated SQL Pools, Serverless SQL Pools, Spark Pools, Pipelines, Kusto and everything else Synapse related.
The fundamentals of building a lakehouse with Synapse