Spark Unplugged: How In-Process Analytics Is Making Distributed Computing An Expensive Investment
Proposed session for SQLBits 2026
TL;DR
What if the most powerful analytics engine is already on your laptop? We've been told serious data work demands distributed computing and complex pipelines – but tools like DuckDB and Polars are proving otherwise. Learn how in-process engines can outperform Spark, slash complexity and run happily inside Microsoft Fabric, leaving you wondering: have we been overengineering all along?
Session Details
The data landscape is experiencing a fundamental shift. For years, we've been told that serious data work demands distributed computing, cloud infrastructure, and complex pipelines. But what if the most powerful analytics engine is already sitting on your desk?
In this provocative, insight-packed session, we'll challenge the conventional wisdom around big data processing by exploring the emerging "data singularity" - the point where single-node computing power is outpacing the growth of most analytical datasets. We'll demonstrate how tools like DuckDB and Polars are revolutionizing analytics by bringing the query engine directly into your application process, eliminating overhead, delivering mind-blowing performance and turbocharging your "inner development loop".
You'll learn how these in-process engines can process millions of rows on your laptop, often outperforming distributed systems like Spark while dramatically reducing complexity, cost, and carbon footprint. We'll share practical code examples showing how to implement these tools in your workflows, with a special focus on integrating them into your Databricks or Microsoft Fabric environment.
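To give a flavour of what the demos cover, here is a minimal sketch (not the session's actual demo code) of the kind of in-process query DuckDB and Polars make possible on a laptop; the file and column names are illustrative placeholders.

```python
# Hypothetical example: aggregate a local Parquet file entirely in-process,
# first with DuckDB's SQL engine, then with Polars' lazy DataFrame API.
# "trips.parquet" and its columns are placeholders for illustration.
import duckdb
import polars as pl

# DuckDB: an in-memory database living inside this Python process -
# no cluster, no server, just SQL straight over the Parquet file.
con = duckdb.connect()
daily = con.sql(
    """
    SELECT CAST(pickup_ts AS DATE) AS day,
           COUNT(*)                AS trips,
           SUM(fare_amount)        AS revenue
    FROM 'trips.parquet'
    GROUP BY day
    ORDER BY day
    """
).pl()  # hand the result straight over to Polars as a DataFrame

# Polars: the same aggregation as a lazy query plan, optimised and
# executed across all local cores when .collect() is called.
daily_pl = (
    pl.scan_parquet("trips.parquet")
    .group_by(pl.col("pickup_ts").dt.date().alias("day"))
    .agg(
        pl.len().alias("trips"),
        pl.col("fare_amount").sum().alias("revenue"),
    )
    .sort("day")
    .collect()
)
print(daily_pl.head())
```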
This session is perfect for:
* Data engineers tired of the overhead that comes with distributed systems
* Data scientists seeking faster iteration cycles
* Data leaders looking to minimise total cost of ownership and accelerate time to value
* Anyone interested in the future direction of data processing
Walk away with a completely fresh perspective on data architecture, practical techniques to implement tomorrow, and perhaps a nagging question: Have we been overengineering our data solutions all along?
3 things you'll get out of this session
* Awareness of emerging technologies that can increase velocity, reduce costs and minimise carbon impact
* A live-coding, demo-heavy walkthrough of DuckDB and Polars in practice
* A clear path from local development to deployment on Microsoft Fabric (sketched below)
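To make the "local to Fabric" journey concrete, here is a hedged sketch of the idea: one query definition that runs unchanged on a laptop and in a Fabric notebook with a default Lakehouse attached. The paths and column names are assumptions for illustration, not the session's demo assets.

```python
# Hedged sketch of the local-to-Fabric path: one query definition, two
# environments. The paths below are assumptions - "data/trips.parquet" is a
# local dev file, and "/lakehouse/default/Files/..." is the mount point a
# Fabric notebook exposes when a default Lakehouse is attached.
import polars as pl

LOCAL_PATH = "data/trips.parquet"                       # inner dev loop on the laptop
FABRIC_PATH = "/lakehouse/default/Files/trips.parquet"  # same file landed in the Lakehouse


def daily_revenue(path: str) -> pl.DataFrame:
    """Aggregate revenue per day; runs identically locally or on Fabric."""
    return (
        pl.scan_parquet(path)
        .group_by(pl.col("pickup_ts").dt.date().alias("day"))
        .agg(pl.col("fare_amount").sum().alias("revenue"))
        .sort("day")
        .collect()
    )


# Locally:   daily_revenue(LOCAL_PATH)
# On Fabric: daily_revenue(FABRIC_PATH)  # no Spark session required
```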
Speakers
Barry Smart
Barry Smart's other proposed sessions for 2026
No-Compromise Data Apps: Why Streamlit is the Missing Piece in Your Analytics Stack - 2026
Barry Smart's previous sessions
Microsoft Fabric and Data Mesh - a perfect fit?
Unlock the full potential of Microsoft Fabric by establishing domain-orientated ownership and federated computational governance to deliver high impact data products in your organisation.
Turbo charge your Data Science workflow with Microsoft Fabric
Microsoft Fabric brings a range of powerful capabilities into one SaaS platform, empowering "full stack" data scientists to achieve more. Learn how new tools such as MLflow and Semantic Link can be leveraged to transform the way that we develop machine learning models.
How to create a high performance data team: lessons learned from the field
Barry will describe the key factors in establishing a high-performance data and analytics team within an organisation.