SQLBits 2020

Performance Optimization with Azure Databricks

Azure Databricks has become one of the staples of big data processing. See how to make the most of it by understanding how Spark works under the covers.

Azure Databricks has become one of the staples of big data processing. Based on Apache Spark, often described as the "Swiss Army knife of big data", Azure Databricks is now mainstream in data processing, from complex transformations and machine learning to integrated Azure activities within multi-stage processing pipelines.
In this talk Richard Conway breaks down how Apache Spark and Azure Databricks work, through a series of demos and examples illustrating how workloads can be optimised through partitioning, how predicate push-downs can be built seamlessly on Parquet statistics, how shuffling and sorting can be minimized, how to work with data sampling, how to cache using Databricks Delta, and much more.
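
As a rough illustration of the kind of optimisation the session covers, the following PySpark sketch (column names and paths are hypothetical) writes a partitioned Parquet table and then reads it back with a filter that Spark can push down to the file scan, pruning partitions and skipping row groups via Parquet min/max statistics.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sales dataset; paths and columns are illustrative only.
sales = spark.read.json("/mnt/raw/sales")

# Partition on a low-cardinality column so later reads can prune whole
# directories instead of scanning the full dataset.
(sales
 .repartition("country")          # keep the number of files per partition sensible
 .write
 .mode("overwrite")
 .partitionBy("country")
 .parquet("/mnt/curated/sales"))

# A filter on the partition column prunes directories; the filter on the
# value column is pushed down to the Parquet reader, which skips row groups
# using their min/max statistics.
gb_large_orders = (spark.read.parquet("/mnt/curated/sales")
                   .filter((F.col("country") == "GB") & (F.col("amount") > 1000)))

# The physical plan shows PartitionFilters and PushedFilters when pushdown applies.
gb_large_orders.explain()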