Kubernetes: From Bare Metal to SQL Server Big Data Clusters

Join Chris Adkin (a Kubernetes expert) and Buck Woody (Principal Data Scientist at Microsoft) for an intense, hands-on, lab-led session on setting up a production-grade SQL Server 2019 Big Data Cluster environment. Topics include hardware, virtualization, and Kubernetes, with a full deployment of a SQL Server Big Data Cluster on the environment that you will use in the class. You'll then walk through a set of Jupyter Notebooks in Azure Data Studio to run T-SQL, Spark, and Machine Learning workloads on the cluster.

You'll also receive valuable resources to learn more and go deeper on Linux, Containers, Kubernetes and SQL Server Big Data Clusters.

1. An introduction to Linux, Containers and Kubernetes

What are containers and why are they different from virtual machines? We’ll explain why standalone container engines will only get you so far, why container orchestration is needed, and where Kubernetes comes into the picture.

2. Hardware and Virtualization environment for Kubernetes
A production-grade environment needs a solid foundation. Should this be bare metal or a virtualized platform, and what about storage? This module gives attendees the knowledge they need to build that foundation.

3. Kubernetes Deep-Dive and Hands-on
This module covers deploying Kubernetes, Kubernetes contexts, cluster troubleshooting and management, services (load balancing versus node ports), understanding storage from a Kubernetes perspective, and making your cluster secure.

4. SQL Server Big Data Clusters Architecture
This module will dig deep into the anatomy of a big data cluster, covering the data pool, storage pool, compute pool, and cluster control plane; Active Directory integration; development versus production configurations; and the tools required for deploying and managing a big data cluster.

5. Using the BDC for Data Science
Now that your big data cluster is up, it's ready for data science workloads. This Jupyter Notebook and Azure Data Studio based module will cover the use of Python and PySpark, T-SQL, and the execution of Spark and Machine Learning workloads.
