SQLBits 2022
Building a Cross Cloud Data Protection Engine
How to build a Data Protection Engine with Databricks
Data Protection is still at the forefront of many companies' minds, with potential GDPR fines of up to 4% of global annual turnover (a current theoretical maximum fine of around $20bn). GDPR affects companies across the world, not just those in Europe, leaving many still playing catch-up. Additional acts and legislation, such as CCPA, are coming into force, meaning Data Protection is a constantly evolving landscape, with fines that can decimate some businesses.
In this session we will go through how we have worked with our customers to create an Azure and AWS implementation of a Data Protection Engine covering Protection, Detection, Re-Identification and Erasure of PII data. The solution is built with Security and Auditability at the centre of the architecture, with special consideration for managing a single application across two public clouds, which led us to Databricks and Delta Lake.
We will deep dive into using Spark to implement multiple Data Protection techniques, with an emphasis on tokenizing the data using a hash-and-salt approach with different kinds of hash functions (MD5, SHA-2…). We will explore how Delta Lake empowers us to share PII tokens between cloud providers with ACID transactions, auditing and versioning of data, and how blind inserts allow us to run parallel executions at the same time. We hope this session shows you that Data Protection doesn't have to be an off-the-shelf black box: you can own the risk and the solution within your own platform, whilst still remaining secure and compliant.
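
As a flavour of the approach, the sketch below shows hash-and-salt tokenization in PySpark with the token mapping appended to a Delta table; it assumes PySpark with Delta Lake available (for example on Databricks), and the column name, salt handling and table path are illustrative assumptions, not the exact implementation covered in the session.

    # Minimal sketch: hash-and-salt tokenization of a PII column,
    # with the token mapping kept in an append-only Delta table.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()

    # Example source data containing a PII column ("email" is illustrative).
    df = spark.createDataFrame(
        [("alice@example.com", 100), ("bob@example.com", 200)],
        ["email", "amount"],
    )

    # In practice the salt would come from a secret store, not from code.
    salt = "my-secret-salt"

    # Tokenize: concatenate the salt with the PII value and hash it.
    # sha2 with 256 bits here; md5() is another built-in option.
    tokenized = df.withColumn(
        "email_token", F.sha2(F.concat(F.lit(salt), F.col("email")), 256)
    )

    # Store the token <-> PII mapping in Delta so it can be shared across
    # clouds; append-only ("blind insert") writes do not read or update
    # existing files, so concurrent runs do not conflict.
    (
        tokenized.select("email", "email_token")
        .write.format("delta")
        .mode("append")
        .save("/mnt/tokens/pii_tokens")  # hypothetical path
    )

    # Downstream datasets carry only the token, never the raw PII.
    protected = tokenized.drop("email")
    protected.show()

The append-only pattern is what makes the parallel executions mentioned above possible: because blind inserts never touch existing data, Delta Lake's optimistic concurrency does not need to reconcile conflicting writers.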