Spark Unplugged: How In-Process Analytics Is Making Distributed Computing An Expensive Investment
Proposed session for SQLBits 2026
TL;DR
What if the most powerful analytics engine is already on your laptop? We've been told serious data work demands distributed computing and complex pipelines – but tools like DuckDB and Polars are proving otherwise. Learn how in-process engines can outperform Spark, slash complexity and run happily inside Microsoft Fabric, leaving you wondering: have we been overengineering all along?
Session Details
The data landscape is experiencing a fundamental shift. For years, we've been told that serious data work demands distributed computing, cloud infrastructure, and complex pipelines. But what if the most powerful analytics engine is already sitting on your desk?
In this provocative, insight-packed session, we'll challenge the conventional wisdom around big data processing by exploring the emerging "data singularity" - the point where single-node computing power is outpacing the growth of most analytical datasets. We'll demonstrate how tools like DuckDB and Polars are revolutionizing analytics by bringing the query engine directly into your application process, eliminating overhead, delivering mind-blowing performance and turbocharging your "inner development loop".
You'll learn how these in-process engines can process millions of rows on your laptop, often outperforming distributed systems like Spark while dramatically reducing complexity, cost, and carbon footprint. We'll share practical code examples showing how to implement these tools in your workflows, with a special focus on integrating them into your Databricks or Microsoft Fabric environment.
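To give a flavour of what the demos cover, here is a minimal sketch (not the session's actual demo code) of the kind of in-process query DuckDB and Polars make possible on a laptop; the file and column names are illustrative placeholders.

```python
# Hypothetical example: aggregate a local Parquet file entirely in-process,
# first with DuckDB's SQL engine, then with Polars' lazy DataFrame API.
# "trips.parquet" and its columns are placeholders for illustration.
import duckdb
import polars as pl

# DuckDB: an in-memory database living inside this Python process -
# no cluster, no server, just SQL straight over the Parquet file.
con = duckdb.connect()
daily = con.sql(
    """
    SELECT CAST(pickup_ts AS DATE) AS day,
           COUNT(*)                AS trips,
           SUM(fare_amount)        AS revenue
    FROM 'trips.parquet'
    GROUP BY day
    ORDER BY day
    """
).pl()  # hand the result straight over to Polars as a DataFrame

# Polars: the same aggregation as a lazy query plan, optimised and
# executed across all local cores when .collect() is called.
daily_pl = (
    pl.scan_parquet("trips.parquet")
    .group_by(pl.col("pickup_ts").dt.date().alias("day"))
    .agg(
        pl.len().alias("trips"),
        pl.col("fare_amount").sum().alias("revenue"),
    )
    .sort("day")
    .collect()
)
print(daily_pl.head())
```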
This session is perfect for:
* Data engineers tired of the overhead that comes with distributed systems
* Data scientists seeking faster iteration cycles
* Data leaders looking to minimise total cost of ownership and accelerate time to value
* Anyone interested in the future direction of data processing
Walk away with a completely fresh perspective on data architecture, practical techniques to implement tomorrow, and perhaps a nagging question: Have we been overengineering our data solutions all along?
3 things you'll get out of this session
* Awareness of emerging technologies that can increase velocity, reduce costs and minimise carbon impact
* A live-coding, demo-heavy walkthrough of DuckDB and Polars in practice
* A clear path from local development to deployment on Microsoft Fabric (sketched below)
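To make the "local to Fabric" journey concrete, here is a hedged sketch of the idea: one query definition that runs unchanged on a laptop and in a Fabric notebook with a default Lakehouse attached. The paths and column names are assumptions for illustration, not the session's demo assets.

```python
# Hedged sketch of the local-to-Fabric path: one query definition, two
# environments. The paths below are assumptions - "data/trips.parquet" is a
# local dev file, and "/lakehouse/default/Files/..." is the mount point a
# Fabric notebook exposes when a default Lakehouse is attached.
import polars as pl

LOCAL_PATH = "data/trips.parquet"                       # inner dev loop on the laptop
FABRIC_PATH = "/lakehouse/default/Files/trips.parquet"  # same file landed in the Lakehouse


def daily_revenue(path: str) -> pl.DataFrame:
    """Aggregate revenue per day; runs identically locally or on Fabric."""
    return (
        pl.scan_parquet(path)
        .group_by(pl.col("pickup_ts").dt.date().alias("day"))
        .agg(pl.col("fare_amount").sum().alias("revenue"))
        .sort("day")
        .collect()
    )


# Locally:   daily_revenue(LOCAL_PATH)
# On Fabric: daily_revenue(FABRIC_PATH)  # no Spark session required
```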
Speakers
Barry Smart
Barry Smart's other proposed sessions for 2026
No-Compromise Data Apps: Why Streamlit is the Missing Piece in Your Analytics Stack - 2026
Barry Smart's previous sessions
Microsoft Fabric and Data Mesh - a perfect fit?
Unlock the full potential of Microsoft Fabric by establishing domain-orientated ownership and federated computational governance to deliver high impact data products in your organisation.
Turbo charge your Data Science workflow with Microsoft Fabric
Microsoft Fabric brings a range of powerful capabilities into one SaaS platform, empowering "full stack" data scientists to achieve more. Learn how new tools such as MLflow and Semantic Link can be leveraged to transform the way that we develop machine learning models.
How to create a high performance data team: lessons learned from the field
Barry will describe the key factors in establishing a high-performance data and analytics team within an organisation.