SQLBits 2012
Temporal Snapshot Fact Table
Snapshots without snapshots... is that possible? Take a "classic" snapshot fact table, add some temporal data theory, and you get a new kind of fact table that can store snapshot data without taking snapshots. A lifesaver when you have a lot of data.
You are designing a BI solution and your customer asks you to keep a snapshot of the
status of all their documents (orders, insurance policies, contracts, bills... whatever
the word "document" may mean) for every day of the year. They have millions of documents
and they want their Data Warehouse to hold all the data they have gathered
since their very first day of operation.
If you have 1 million documents (on average), you have to keep a snapshot of each one for all 365 days of the year, and you have 10 years of history, you're going to have a table of more than 3 billion rows just to start with (1 million × 365 × 10 ≈ 3.65 billion). That's a very big and challenging number, and you may not have the option of buying a Parallel Data Warehouse.
In this session, we'll see how to turn the usual snapshot tables into temporal tables that store time intervals and so avoid data duplication, while keeping the Data Warehouse design usable by Analysis Services (which has no notion of an interval) and optimizing it for very good performance even on standard hardware.
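To make the core idea concrete, here is a minimal Python sketch of collapsing daily snapshot rows into time intervals. It is only an illustration, not the implementation presented in the session: the row layout `(doc_id, snapshot_date, status)`, the function names `compress_snapshots` and `status_on`, and the use of half-open intervals `[valid_from, valid_to)` are all assumptions made for this example. The Analysis Services side is not covered here.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def compress_snapshots(rows):
    """Collapse daily (doc_id, snapshot_date, status) rows into
    (doc_id, status, valid_from, valid_to) intervals.

    Intervals are half-open: valid_from is included, valid_to is excluded
    (an assumption of this sketch).
    """
    intervals = []
    # Sorting tuples orders rows by document, then by date.
    for doc_id, day, status in sorted(rows):
        last = intervals[-1] if intervals else None
        same_run = (
            last is not None
            and last[0] == doc_id
            and last[1] == status
            and last[3] == day  # contiguous: previous interval ends today
        )
        if same_run:
            # Extend the current interval by one day.
            intervals[-1] = (doc_id, status, last[2], day + ONE_DAY)
        else:
            # Status changed, gap in dates, or new document: open a new interval.
            intervals.append((doc_id, status, day, day + ONE_DAY))
    return intervals


def status_on(intervals, doc_id, day):
    """Return the status of doc_id on the given day, or None if unknown."""
    for d, status, valid_from, valid_to in intervals:
        if d == doc_id and valid_from <= day < valid_to:
            return status
    return None


# Hypothetical sample data: seven daily snapshot rows for two documents.
start = date(2012, 1, 1)
daily = (
    [(1, start + timedelta(days=i), "open") for i in range(3)]
    + [(1, start + timedelta(days=i), "closed") for i in range(3, 5)]
    + [(2, start + timedelta(days=i), "open") for i in range(1, 3)]
)

intervals = compress_snapshots(daily)
print(len(daily), "snapshot rows ->", len(intervals), "interval rows")
```

In this sketch a document whose status never changes costs one row no matter how many days of history are kept, and the status on any given day can still be recovered with a range lookup such as `status_on(intervals, 1, date(2012, 1, 4))`.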
The technique explained here is the result of several months of research and has been applied to the Data Warehouse of an insurance company, where we had to deal with twice the volume described above.