Beyond Monitoring: Spark Optimization in Microsoft Fabric
Proposed session for SQLBits 2026
TL;DR
This session presents an open-source Spark monitoring solution from the Fabric Toolbox, covering its motivation, architecture, and design. Learn how it enables practical Spark monitoring, performance optimization, and right-sizing of clusters in Microsoft Fabric.
Session Details
In this session, I will walk through the open-source Spark monitoring solution available in the Fabric Toolbox (https://github.com/microsoft/fabric-toolbox/tree/main/monitoring/fabric-spark-monitoring).
The session will cover the motivation behind building this solution and the gaps it aims to address in existing Spark observability approaches within Microsoft Fabric. I will explain the overall architecture and design choices, showing how the solution collects, processes, and visualizes Spark runtime and performance metrics.
Beyond the technical implementation, the focus will be on how this OSS solution enables practical, real-world Spark monitoring. Attendees will learn how it can be used to:
Gain deeper visibility into Spark job and cluster behavior
Identify performance bottlenecks and inefficiencies
Support workload optimization and tuning
Enable informed right-sizing decisions for Spark clusters, improving both performance and cost efficiency
By the end of the session, participants will understand not only how the solution was built, but also how they can adopt, extend, and use it effectively to improve Spark performance and operational excellence in their own environments.
3 things you'll get out of this session
Beyond the technical implementation, the focus will be on how this OSS solution enables practical, real-world Spark monitoring. Attendees will learn how it can be used to:
Gain deeper visibility into Spark job and cluster behavior
Identify performance bottlenecks and inefficiencies
Support workload optimization and tuning
Enable informed right-sizing decisions for Spark clusters, improving both performance and cost efficiency
Speakers
Edgar Cotte's other proposed sessions for 2026
End-to-End Fabric Monitoring: Real-Time Signals, Logs, and Unified Observability Patterns - 2026
Fabric IQ for Power BI Users: From Semantic Models to Business Meaning (and AI Agents) - 2026
Fabric Real-Time Intelligence for Power BI Pros - 2026
OneLake Security - Centralized Data Security for Microsoft Fabric (Part 1) - 2026
OneLake Security - Centralized Data Security for Microsoft Fabric (Part 2) - 2026
Real-Time Capacity Intelligence: Monitoring, Alerting, and Analytics with Fabric Capacity Events - 2026
Unifying Spark, Real-Time Intelligence & Ontology: Practical Patterns for AI-Driven Agents in Fabric - 2026
What we learned from 100+ deployments of Real-Time Intelligence - 2026