Real World Production Data Pipelines with Apache Spark

This session will help you understand the use cases and best practices for working with Apache Spark in either Azure Databricks or Synapse Pools. Along the way, we'll cover operational areas like DevOps, alerting and monitoring, performance tuning, and managing costs.

You’ve decided to build your first Apache Spark solution, or perhaps you have already built out a proof of concept. Maybe you already have a significant background using Spark on-premises and are looking to migrate to the cloud. How can you ensure that the solution you deliver will be easy to extend, easy to maintain, efficient, and secure? In this session, we will look at many of the best practices for designing robust production data pipelines in the cloud using Spark. Regardless of whether you are using Azure Synapse Analytics or Azure Databricks, this session will help you understand the use cases and best practices for working with Apache Spark. Along the way, we'll cover operational areas like DevOps, alerting and monitoring, performance tuning, and managing costs.

Speaker

Jason Horner's Sessions

A Head First dive into Azure Synapse Analytics Data Explorer Pools

Azure SQL Monitoring Fundamentals

Azure Synapse Analytics Picking the Right Pool

Event Driven ETL With Synapse Pipelines

Join The Spark Side: Spark SQL

Real World Production Data Pipelines with Apache Spark

The Waffle Cutter's Guide to KQL

Azure Data Lake Design Patterns

Data Lake Design Patterns

Dimensional Modeling: Beyond the Basics