22-25 April 2026
SQLBits 2022

Real World Production Data Pipelines with Apache Spark

This session will help you understand the use cases and best practices for working with Apache Spark in either Azure Databricks or Azure Synapse Spark pools. Along the way, we'll cover operational areas like DevOps, alerting and monitoring, performance tuning, and managing costs.
You’ve decided to build your first Apache Spark solution, or perhaps you have already built a proof of concept. Maybe you already have a significant background using Spark on-premises and are looking to migrate to the cloud. How can you ensure that the solution you deliver will be extensible, easy to maintain, efficient, and secure? In this session, we will look at many of the best practices for designing robust production data pipelines in the cloud using Spark. Whether you are using Azure Synapse Analytics or Azure Databricks, this session will help you understand the use cases and best practices for working with Apache Spark. Along the way, we'll cover operational areas like DevOps, alerting and monitoring, performance tuning, and managing costs.