In data storage and processing, there are two major technologies that we deal with every day. On one side, we have relational data stored in SQL Server. On the other side, we have non-relational data, or datasets too large to fit the relational model, stored on big data clusters such as Hadoop or Spark. This makes it challenging to combine datasets across the two technologies. SQL Server was never built to process huge datasets in a distributed fashion or to handle non-relational data well, so in many cases you had to move your relational data into a Hadoop or Spark cluster instead. SQL Server 2019 has the answer with Big Data Clusters, which combine SQL Server with HDFS and Spark! In this session we are going to explore the capabilities of this exciting new feature. How does it work, and how can we work with non-relational datasets?
Presented by Ben Weissman at SQLBits XX