Data lakes and Parquet are a match made in heaven, and the new features of Delta Lake, available as the open-source Delta Lake or the premium Databricks Delta, crank them into overdrive. This session will take a deeper look at why Parquet is so good for analytics, but also highlight some of the problems you'll face when using immutable columnstore files.

We'll then switch over to Databricks Delta, which takes Parquet to the next level with a whole host of features. We'll be looking at incremental merges, transactional consistency, temporal rollbacks, file optimisation and some down-and-dirty performance tuning with partitioning and Z-ordering.
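As a taste of what those features look like in practice, here is a minimal sketch in Delta-flavoured Spark SQL. The table and column names (`sales`, `sales_updates`, `order_id`, `customer_id`) are illustrative assumptions, not taken from the session itself:

```sql
-- Incremental merge: upsert a batch of changed rows into a Delta table
MERGE INTO sales AS target
USING sales_updates AS source
  ON target.order_id = source.order_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Temporal rollback: query the table as it looked at an earlier version
SELECT * FROM sales VERSION AS OF 42;

-- File optimisation with Z-ordering: compact small files and co-locate
-- rows by a frequently filtered column
OPTIMIZE sales ZORDER BY (customer_id);
```

Each of these would be a copy-rewrite job against raw Parquet files; Delta's transaction log is what makes them single statements.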

If you're currently planning, building, or looking after a data lake with Spark and want to get to the next level of performance and functionality, this session is for you. Never heard of Parquet or Delta? You're about to learn a whole lot more!

Presented by Simon Whiteley at SQLBits XX