Advanced Data Engineering with Databricks on the Lakehouse
Description
In this session, you will build upon existing knowledge of Apache Spark™, Structured Streaming and Delta Lake to unlock the full potential of the lakehouse by utilising the suite of tools provided by Databricks. This session places a heavy emphasis on designs favouring incremental data processing, enabling systems optimised to continuously ingest and analyse ever-growing data. The topics in this course help learners work towards the Databricks Certified Data Engineer Professional exam.
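To make the incremental emphasis concrete, the sketch below shows the kind of pattern the course builds on: a PySpark Structured Streaming job that incrementally reads new rows committed to one Delta table and appends enriched results to another. This is a minimal sketch, assuming a Databricks notebook where spark is predefined; the table names and checkpoint path are hypothetical.

    from pyspark.sql import functions as F

    # Incrementally read only new rows committed to the source Delta table
    # ("raw_orders" is a hypothetical table name)
    raw = spark.readStream.table("raw_orders")

    # Example enrichment: stamp each row with its ingestion time
    enriched = raw.withColumn("ingested_at", F.current_timestamp())

    # Append results; the checkpoint lets the job resume exactly where it left off
    (enriched.writeStream
        .option("checkpointLocation", "/tmp/checkpoints/orders")  # hypothetical path
        .trigger(availableNow=True)  # process all available data, then stop
        .toTable("orders_enriched"))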
Learning Objectives
• Design databases and pipelines optimized for the Databricks Lakehouse Platform
• Implement efficient incremental data processing to validate and enrich data-driven business decisions and applications
• Leverage Databricks-native features for managing access to sensitive data and fulfilling right-to-be-forgotten requests (see the sketch after this list)
• Manage error troubleshooting, code promotion, task orchestration and production job monitoring using Databricks tools
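The third objective touches on right-to-be-forgotten handling. A minimal sketch of the usual Delta Lake pattern, assuming a hypothetical customer_events table and a Databricks notebook, is to delete the user's rows and then VACUUM old file versions so the deleted data cannot be recovered through time travel:

    # Delete the user's records from the Delta table (hypothetical table and id)
    spark.sql("DELETE FROM customer_events WHERE user_id = 'u-123'")

    # Remove unreferenced data files older than the retention window (168 hours
    # = the 7-day default) so the deleted rows are physically removed and can
    # no longer be read back via time travel
    spark.sql("VACUUM customer_events RETAIN 168 HOURS")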
Previous Experience
Not all of the experience below is required, but meeting at least three of the five items is recommended.
- Experience using PySpark APIs to perform advanced data transformations
- Experience using SQL in production data warehouse or data lake implementations
- Experience working in Databricks notebooks and configuring clusters
- Familiarity with creating and manipulating data in Delta Lake tables with SQL (illustrated in the sketch after this list)
- Ability to use Spark Structured Streaming to incrementally read from a Delta table
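As a reference point for the Delta Lake SQL item above, here is a minimal sketch, again assuming a Databricks notebook with hypothetical table names, that creates a Delta table and upserts changes into it with MERGE:

    # Create a Delta table (Delta is the default table format on Databricks)
    spark.sql("""
        CREATE TABLE IF NOT EXISTS customers (
            id   BIGINT,
            name STRING
        ) USING DELTA
    """)

    # Upsert incoming changes with MERGE; "customer_updates" is a hypothetical
    # staging table or view holding the new and changed rows
    spark.sql("""
        MERGE INTO customers AS t
        USING customer_updates AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)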
Tech Covered
Databricks, Operations