Building a Modern Data Warehouse in Azure
Description
You may have heard of the Modern Data Warehouse, and you may have already tried out some of its components. In this session we will look at the options available in Azure for building a reliable, modern, scalable data warehouse.
Storage vs Compute
We will start with architecture and how it differs from traditional ways of thinking and working. This will include an introduction to Data Lakes and the various "Compute" tools that can use them.
Ingestion
How do you get your data from databases on-premises or in the cloud into a Data Lake? We get hands-on with the tools and build a pipeline to copy data into our data lake.
Transformation
How do we transform data from source into a warehouse schema? Do we still need Kimball? What is Delta Lake? And what is the Parquet format all about? We will answer these questions and look at the role each has to play.
We will use Databricks, Data Flow and SQL to look at how we can do this, and will compare a UI-based approach with SQL, Python and Scala.
Once the data is transformed, we will load it into Azure Synapse Analytics using PolyBase, and look at how we can query the data directly from our lake.
Presentation
Finally, we will gain some insights from our data using Azure Synapse Analytics and Power BI.
By the end of the day you should understand:
- How to build a data pipeline
- The Modern Data Warehouse components
- Compute vs Storage
- File formats (flat vs Parquet/Avro/ORC vs Delta)