Data Quality Validations in Fabric Spark
Proposed session for SQLBits 2026
TL;DR
When building a Lakehouse in Fabric Spark (or any other analytics data store), ensuring data quality is crucial. There are many aspects to check and various methods to do so.
In this session we'll cover how to perform data quality checks with built-in PySpark functions, how to wrap those checks in reusable functions, and how to use Python modules for validation. Among the Python modules, we will focus on the Great Expectations library.
Additionally, we will discuss the new Fabric feature, materialized views in Spark, which includes built-in data quality checks.
Session Details
When building a Lakehouse in Fabric Spark (or any other analytics data store), ensuring data quality is crucial. There are many aspects to check and various methods to do so. In this session, we will start with the six dimensions of data quality as defined by DAMA (Data Management Association International): accuracy, completeness, consistency, timeliness, validity, and uniqueness. We will explore examples for each of these categories and examine different ways to ensure data quality in Spark within Fabric.
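To give a flavour of what such checks can look like, here is a minimal sketch covering three of those dimensions with built-in PySpark functions. It assumes a Fabric notebook (where the `spark` session is predefined) and a hypothetical `orders` table with `order_id`, `customer_id`, and `amount` columns:

```python
from pyspark.sql import functions as F

# Hypothetical Lakehouse table; table and column names are illustrative only.
orders = spark.read.table("orders")
total = orders.count()

# Completeness: how many rows are missing a customer_id?
missing_customer = orders.filter(F.col("customer_id").isNull()).count()

# Uniqueness: how many duplicate order_id values are there?
duplicate_ids = total - orders.select("order_id").distinct().count()

# Validity: how many amounts fall outside the expected range?
invalid_amounts = orders.filter(~F.col("amount").between(0, 100000)).count()

print(f"Completeness: {missing_customer} of {total} rows missing customer_id")
print(f"Uniqueness:   {duplicate_ids} duplicate order_id values")
print(f"Validity:     {invalid_amounts} rows with out-of-range amount")
```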
We will cover how to perform data quality checks with built-in PySpark functions, how to wrap those checks in reusable functions, and how to use Python modules for validation. Among the Python modules, we will focus on the Great Expectations library.
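As a sketch of the latter two approaches, the snippet below wraps a completeness check in a reusable function and then runs the same checks through Great Expectations. Note that the Great Expectations API has changed considerably between major versions; this assumes the GX Core 1.x fluent API and reuses the hypothetical `orders` DataFrame from the previous example:

```python
import great_expectations as gx
from pyspark.sql import functions as F


def null_fraction(df, column):
    """A reusable completeness check: fraction of rows where `column` is null."""
    return df.filter(F.col(column).isNull()).count() / df.count()


# Reusable-function approach: fail fast if too many customer_ids are missing.
assert null_fraction(orders, "customer_id") < 0.01, "too many null customer_ids"

# Great Expectations approach (GX Core 1.x fluent API).
context = gx.get_context()  # ephemeral, in-memory context
data_source = context.data_sources.add_spark(name="fabric_spark")
asset = data_source.add_dataframe_asset(name="orders")
batch_def = asset.add_batch_definition_whole_dataframe("orders_batch")
batch = batch_def.get_batch(batch_parameters={"dataframe": orders})

for expectation in [
    gx.expectations.ExpectColumnValuesToNotBeNull(column="customer_id"),
    gx.expectations.ExpectColumnValuesToBeUnique(column="order_id"),
]:
    result = batch.validate(expectation)
    print(type(expectation).__name__, "passed" if result.success else "failed")
```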
Additionally, we will discuss the new Fabric feature, materialized views in Spark, which includes built-in data quality checks.
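Materialized lake views are in preview at the time of writing, so the syntax may change; the sketch below assumes the documented CREATE MATERIALIZED LAKE VIEW syntax with a CHECK constraint, and all object names are hypothetical:

```python
# Hypothetical silver-layer view over a bronze orders table. Rows failing the
# constraint are dropped during refresh rather than failing the whole refresh.
spark.sql("""
    CREATE MATERIALIZED LAKE VIEW IF NOT EXISTS silver.orders_clean (
        CONSTRAINT valid_amount CHECK (amount > 0) ON MISMATCH DROP
    )
    AS
    SELECT order_id, customer_id, amount
    FROM bronze.orders
""")
```

With ON MISMATCH FAIL instead of DROP, a violated constraint would abort the refresh, which is the trade-off between quarantining bad rows and blocking bad loads.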
By the end of this session, the audience will have a better understanding of the key considerations for data quality and the various methods to implement validations.
3 things you'll get out of this session
An understanding of the benefits of automated data quality checks
An understanding of the different methods for performing quality checks in Fabric Spark
An understanding of the pros and cons of each of those methods
Speakers
Ásgeir Gunnarsson's other proposed sessions for 2026
Best Practices for Sharing Power BI Content with External Users - 2026
Find the Spark as a SQL data warehouse developer - 2026
From Chaos to Control: Orchestrating Lakehouse Workloads in Microsoft Fabric - 2026
From Chaos to Control: Orchestrating Lakehouse Workloads in Microsoft Fabric Part 1 - 2026
From Chaos to Control: Orchestrating Lakehouse Workloads in Microsoft Fabric Part 2 - 2026
Workspace strategy for Lakehouse/Warehouse in Microsoft Fabric - 2026
Panel Debate: Real-World Microsoft Fabric Administration - Lessons from the Trenches - 2026