SQLBits 2022

Hitting Top Speeds with Spark SQL in Azure Synapse

Tips and tricks for getting the best performance out of Spark SQL on Synapse Analytics while keeping costs under control.
An Azure Synapse Analytics workspace is a one-stop shop for big data processing. Azure Synapse packs a punch by offering users multiple processing engines: the highly scalable Apache Spark distributed computing framework; Synapse SQL, the industry's leading cloud data warehouse engine, built on SQL Server technology; and Azure Data Explorer, a massively parallel tool for exploring telemetry data.

Although Azure Synapse provides these leading big data processing engines, developers and users still need to follow best practices and guidelines to get good performance and minimize the cost incurred per query. These guidelines not only help take full advantage of the underlying processing capabilities, but also keep the platform's cost of ownership within limits.
In this presentation, we will talk about some common misconceptions about the Spark engine's performance and scalability. We will also share a few tips and tricks for tuning Spark SQL queries to hit top speeds in performance and scalability.

Feedback Link - https://sqlb.it/?7329