Data Quality Validations in Fabric Spark
Proposed session for SQLBits 2026
TL;DR
When building a Lakehouse in Fabric Spark (or any other analytics data store), ensuring data quality is crucial. There are many aspects to check and various methods to do so.
In this session we'll cover how to perform data quality checks with built-in PySpark functions, how to wrap those checks in reusable functions, and how to use Python modules for validation. Among the Python modules, we will focus on the Great Expectations library.
Additionally, we will discuss the new Fabric feature, materialized views in Spark, which includes built-in data quality checks.
Session Details
When building a Lakehouse in Fabric Spark (or any other analytics data store), ensuring data quality is crucial. There are many aspects to check and various methods to do so. In this session, we will start with the six dimensions of data quality as defined by DAMA (Data Management Association International): accuracy, completeness, consistency, timeliness, validity, and uniqueness. We will explore examples for each of these categories and examine different ways to ensure data quality in Spark within Fabric.
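To give a flavour of what such checks can look like, here is a minimal sketch covering three of those dimensions with built-in PySpark functions. It assumes a Fabric notebook (where the `spark` session is predefined) and a hypothetical `orders` table with `order_id`, `customer_id`, and `amount` columns:

```python
from pyspark.sql import functions as F

# Hypothetical Lakehouse table; table and column names are illustrative only.
orders = spark.read.table("orders")
total = orders.count()

# Completeness: how many rows are missing a customer_id?
missing_customer = orders.filter(F.col("customer_id").isNull()).count()

# Uniqueness: how many duplicate order_id values are there?
duplicate_ids = total - orders.select("order_id").distinct().count()

# Validity: how many amounts fall outside the expected range?
invalid_amounts = orders.filter(~F.col("amount").between(0, 100000)).count()

print(f"Completeness: {missing_customer} of {total} rows missing customer_id")
print(f"Uniqueness:   {duplicate_ids} duplicate order_id values")
print(f"Validity:     {invalid_amounts} rows with out-of-range amount")
```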
We will cover how to perform data quality checks with built-in PySpark functions, how to wrap those checks in reusable functions, and how to use Python modules for validation. Among the Python modules, we will focus on the Great Expectations library.
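As a sketch of the latter two approaches, the snippet below wraps a completeness check in a reusable function and then runs the same checks through Great Expectations. Note that the Great Expectations API has changed considerably between major versions; this assumes the GX Core 1.x fluent API and reuses the hypothetical `orders` DataFrame from the previous example:

```python
import great_expectations as gx
from pyspark.sql import functions as F


def null_fraction(df, column):
    """A reusable completeness check: fraction of rows where `column` is null."""
    return df.filter(F.col(column).isNull()).count() / df.count()


# Reusable-function approach: fail fast if too many customer_ids are missing.
assert null_fraction(orders, "customer_id") < 0.01, "too many null customer_ids"

# Great Expectations approach (GX Core 1.x fluent API).
context = gx.get_context()  # ephemeral, in-memory context
data_source = context.data_sources.add_spark(name="fabric_spark")
asset = data_source.add_dataframe_asset(name="orders")
batch_def = asset.add_batch_definition_whole_dataframe("orders_batch")
batch = batch_def.get_batch(batch_parameters={"dataframe": orders})

for expectation in [
    gx.expectations.ExpectColumnValuesToNotBeNull(column="customer_id"),
    gx.expectations.ExpectColumnValuesToBeUnique(column="order_id"),
]:
    result = batch.validate(expectation)
    print(type(expectation).__name__, "passed" if result.success else "failed")
```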
Additionally, we will discuss the new Fabric feature, materialized views in Spark, which includes built-in data quality checks.
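Materialized lake views are in preview at the time of writing, so the syntax may change; the sketch below assumes the documented CREATE MATERIALIZED LAKE VIEW syntax with a CHECK constraint, and all object names are hypothetical:

```python
# Hypothetical silver-layer view over a bronze orders table. Rows failing the
# constraint are dropped during refresh rather than failing the whole refresh.
spark.sql("""
    CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_clean (
        CONSTRAINT valid_amount CHECK (amount > 0) ON MISMATCH DROP
    )
    AS
    SELECT order_id, customer_id, amount
    FROM bronze.orders
""")
```

With ON MISMATCH FAIL instead of DROP, a violated constraint would abort the refresh, which is the trade-off between quarantining bad rows and blocking bad loads.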
By the end of this session, the audience will have a better understanding of the key considerations for data quality and the various methods to implement validations.
3 things you'll get out of this session
An understanding of the benefits of automated data quality checks
An understanding of the different methods for performing quality checks in Fabric Spark
An understanding of the pros and cons of each of those methods
Speakers
Ásgeir Gunnarsson's other proposed sessions for 2026
Best Practices for Sharing Power BI Content with External Users - 2026
Find the Spark as a SQL data warehouse developer - 2026
From Chaos to Control: Orchestrating Lakehouse Workloads in Microsoft Fabric - 2026
From Chaos to Control: Orchestrating Lakehouse Workloads in Microsoft Fabric Part 1 - 2026
From Chaos to Control: Orchestrating Lakehouse Workloads in Microsoft Fabric Part 2 - 2026
Workspace strategy for Lakehouse/Warehouse in Microsoft Fabric - 2026
Panel Debate: Real-World Microsoft Fabric Administration - Lessons from the Trenches - 2026