Azure Databricks: Engineering Vs Data Science

Have you looked at Azure Databricks yet? No? Then you need to. Why, you ask? There are many reasons. Number one: knowing how to use Apache Spark will earn you more money. It is that simple. Data Engineers and Data Scientists who know Apache Spark are in demand! This workshop is designed to introduce you to the skills required to do both.

In the morning we will introduce Azure Databricks, then discuss how to develop in-memory, elastic-scale data engineering pipelines. We will talk about shaping and cleaning data, the languages, notebooks, ways of working, design patterns and how to get the best performance. You will build an engineering pipeline with Python (or possibly some other stuff we are not allowed to tell you about yet). The engineering element will be delivered by UK MVP Simon Whiteley. Simon has been deploying engineering projects with Azure Databricks since it was announced. He has real-world experience in multiple environments.

Then we will shift gears: we will take the data we moved and cleansed and apply distributed machine learning at scale. We will train a model and productionise it. We will then enrich our data with our newly predicted values. The data science element will be led by UK MVP Terry McCann. Terry holds an MSc in Data Science and has been working with Apache Spark for the last 5 years. He is dedicated to applying engineering practices to data science to make model development, training and scoring as easy and as automated as possible.

By the end of the day, you will understand how Azure Databricks supports both data engineering and data science, leveraging Apache Spark to deliver blisteringly fast data pipelines and distributed machine learning models. Bring your laptop, as this will be hands-on.

Pre-requisites
An understanding of data processing, either ETL or ELT, whether on-premises or in a big data environment. A basic understanding of machine learning would also be beneficial, but not critical.
Laptop Required: Yes

  • Software: In the session we will be using Azure Databricks. We will have labs and demos that you can follow if you want to. If you do want to then you will need the following:
      - An Azure Subscription
      - Money on the Azure Subscription
      - Enough access on the subscription to make service principals
      - Azure Storage Explorer
      - PowerShell
  • Subscriptions: Azure
Thursday 28 February 2019