22-25 April 2026
SQLBits 2026

Migrating the Mammoth

Based on a real enterprise case, this session details how a world-leading tech company modernized a massive, highly fragmented data platform—migrating RC, ORC, Parquet, and Avro datasets to Delta Lake as part of a multi-petabyte transformation. Attendees will learn how the team designed the architecture, automation framework, and migration tooling needed to orchestrate thousands of pipelines, validate data at scale, and handle complex edge cases such as multi-format tables, massive (>50 TB) datasets, and long-running legacy workloads. Discover the patterns, challenges, and hard-earned lessons required to execute a migration of this magnitude successfully.

This session is grounded in the real-world experience of a global technology company undertaking one of the largest data-platform migrations in the industry—modernizing its fragmented data estate and moving to Delta Lake at extraordinary scale. Managing more than 60 PB of data spread across four different formats, 750+ ingestion pipelines, and over 5,000 ETL jobs, the organization needed a migration strategy engineered for both massive volume and operational complexity.

We will explore how the team executed this transformation using a structured “divide and conquer” model, separating the work into targeted migration workstreams and aligning them to OKRs. As tooling sophistication grew, migration velocity increased, supported by an agile delivery model and a dedicated “Migration Machine” designed to validate data, orchestrate dependency-heavy workloads, and provide complete visibility through custom dashboards.
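The abstract does not spell out how the "divide and conquer" split was implemented, but the idea of routing tables into targeted workstreams can be sketched in plain Python. Everything here (the `Table` record, the 50 TB cutoff for a dedicated "oversized" stream, the format-based routing) is a hypothetical illustration, not the team's actual tooling:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass(frozen=True)
class Table:
    name: str
    fmt: str       # source format: "rc", "orc", "parquet", or "avro"
    size_tb: float

def assign_workstream(table: Table) -> str:
    """Route a table to a migration workstream.

    Oversized tables (>50 TB here, an assumed threshold) get their own
    stream regardless of format, since they need an alternate path;
    everything else is grouped by source format.
    """
    if table.size_tb > 50:
        return "oversized"
    return table.fmt

def plan_workstreams(inventory: list[Table]) -> dict[str, list[Table]]:
    """Group the table inventory into per-workstream backlogs,
    largest tables first so long-running migrations start early."""
    streams: dict[str, list[Table]] = defaultdict(list)
    for t in inventory:
        streams[assign_workstream(t)].append(t)
    for backlog in streams.values():
        backlog.sort(key=lambda t: t.size_tb, reverse=True)
    return dict(streams)
```

In a real estate of this size the inventory would come from a metastore scan rather than a hand-built list, and each backlog would feed the orchestration layer the session describes.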

Attendees will gain insight into the real challenges encountered at scale, including migrating 50+ TB tables, dealing with multiple source formats, orchestrating batch backfills, validating large datasets, and designing alternate paths for tables too slow or complex for standard migration flows.
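The session does not reveal how the "Migration Machine" validates migrated data, but one common pattern for comparing a source table with its migrated copy is a row count plus an order-insensitive content digest. The sketch below is a minimal stdlib illustration of that pattern (function names and the row-as-dict representation are assumptions, and at real scale the digests would be computed distributedly, not in memory):

```python
import hashlib

def row_fingerprint(row: dict) -> str:
    """Fingerprint one row: hash its (column, value) pairs in a
    canonical column order so field ordering does not matter."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.sha256(canonical.encode()).hexdigest()

def dataset_digest(rows: list[dict]) -> tuple[int, str]:
    """Row count plus a digest over the sorted row fingerprints, so
    source and target can be scanned in any physical order."""
    fingerprints = sorted(row_fingerprint(r) for r in rows)
    digest = hashlib.sha256("".join(fingerprints).encode()).hexdigest()
    return len(rows), digest

def validate_migration(source_rows: list[dict], target_rows: list[dict]) -> bool:
    """A migrated table passes only if both count and digest match."""
    return dataset_digest(source_rows) == dataset_digest(target_rows)
```

A mismatch in the count catches dropped or duplicated rows cheaply; the digest catches value-level corruption that a count alone would miss.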

We will conclude with the tangible outcomes: up to a 1600× speedup for highly selective queries against petabyte-scale tables, significant gains in table-read performance, and major acceleration of BI workloads, results internal teams described as truly "game changing".

This session offers a deeply practical look at what it takes to migrate a “mammoth” data estate to a modern lakehouse architecture, providing actionable lessons for organizations planning or scaling their own modernization efforts.