22-25 April 2026
SQLBits 2020

SQL Server Big Data Clusters: The Full Story

A complete overview of SQL Server Big Data Clusters, including its major components and use cases!

In the realm of data storage and processing, there are two major technologies that we deal with every day. On one side, we have relational data stored inside SQL Server; on the other, non-relational or very large datasets that do not fit the relational model and are stored on big data clusters like Hadoop or Spark. This introduces challenges when you have to combine datasets across both technologies. SQL Server was never built to process huge datasets in a distributed fashion or to handle non-relational data very well, meaning that in many cases you would have to resort to bringing your relational data into Hadoop or Spark clusters. SQL Server 2019 has the answer with Big Data Clusters: it combines SQL Server with HDFS and Spark! In this session, we are going to explore the capabilities of this exciting new feature. How does it work, and how can we work with datasets that are non-relational?


What you need to know first: (01:00)
Big Data Cluster Architecture: (11:01)
Demo: Data Virtualization: (21:22)
Demo: Storage Pool: (27:05)
Demo: End to End Real World Scenario: (32:32)
Managing a Big Data Cluster: (41:22)

Speakers

Ben Weissman (he/him)

Ben Weissman (he/him)'s previous sessions

Flight of Innovation: Enhancing On-Prem and Multi-Cloud SQL Instances with a Cloud Cockpit
Learn how Azure Arc extends Azure's comprehensive feature set to your entire SQL estate, irrespective of location or platform.
 
Fueling Your Future: Embracing the Rewards of Data Platform Community Involvement
In this session, we delve into the power of active participation in the data platform community. Through speaking, volunteering, blogging, mentoring, organizing, or other ways, individuals not only give back but are also rewarded immensely, both personally and professionally.
 
Be more responsible around AI - Less Bias, More Ethics
This session is focused on the importance of ethical considerations in the development and deployment of artificial intelligence (AI) systems.
 
Azure Arc in 50 Minutes
Join me on a journey to understand what Azure Arc is and how it allows you to manage your hybrid and multi-cloud estates.
 
Never ETL again, thanks to Synapse Link. Really?
Let's take a look at Synapse Link, its capabilities and restrictions! Synapse Link is Microsoft's new solution to push your OLTP data in near real time to Azure Synapse for analytical purposes.
 
Keynote by The Community
Ben and Rob have found some wonderful folk to actually do the important parts of the community keynote, on the theme of "How to be a nonpassive member of the data community".
 
Why is understanding Kubernetes important for your career as a Data Professional?
A panel discussion on why both admins and developers should understand the impacts and benefits of containerized applications.
 
Azure Arc-enabled SQL MI – More than just another kind of SQL Server
Join us, as we explore the capabilities of Azure Arc-enabled SQL Managed Instances.
 
(Almost) all about Azure Arc - in 20 Minutes
Join me on a journey through the different offerings in Azure Arc and how they can benefit you as a data professional!
 
SQL Server Big Data Clusters: The Full Story
A complete overview of SQL Server Big Data Clusters, including its major components and use cases!
 
The Self-Tuning SSIS Package
This session is about using the Business Intelligence Markup Language (Biml) to monitor and control your orchestration patterns. By automatically analyzing the results in ETL logs, we’ll be able to automate our staging orchestration!