SQLBits 2022

Leveraging Apache Spark for Efficient Data Encryption at Scale

This session presents a custom solution we have built to solve the problem of row-level encryption in a highly complex data lake.
Whilst file-level data encryption is not a new concept, the ability to encrypt data at the row level in a table brings significant benefits for data security and control. In response to the challenges of large-scale, row-level data encryption, we have built Gecko: an efficient, auditable, and simple encryption ecosystem designed for Spark and Delta Lake.

Gecko has allowed us to simultaneously achieve the following benefits within our data platform:
– Automatically handle data deletion.
– Increase the overall security of PII data in our data lake.
– Preserve the structure of non-PII data, so that it continues to provide analytical value and overall data integrity.
– Make PII data accessible when required.

This presentation will share:
– The core concepts behind the ecosystem.
– How Spark and Delta Lake have been leveraged in these applications.
– Why these technologies have been essential in meeting our requirements.

Feedback Link - https://sqlb.it/?6983