Sunday, October 3, 2010

Regression testing for Type 1 dimensions in the incremental warehouse

Let's say you have data warehouse that you do an incremental load on every few hours for real-time data. You have your ETL set to run in micro-batches every few hours and it works perfect. Life is nice. Now, you get a request from the business to add new columns to that Type 1 dimension.

You make changes to the ETL and while testing painstakingly check if the correct column(s) get updated. Well, you don't have to squint your eyes anymore. Checkout the almighty CHECKSUM function! SQL Server and other databases have a checksum function which generate a hash over the set of fields your specify. Run a checksum on the rows that you expect not to be updated and store it somewhere before running the ETL.






SELECT pkChecksum(mustnotupdatecolumn1, mustnotupdatecolumnn)
INTO   checksumsbeforeload 



Now run the ETL on your test batch and compare the values


SELECT pkChecksum(mustnotupdatecolumn1, mustnotupdatecolumnn)
INTO   checksumsafterload


You might also want to use this tactic when you are batch updating a table through ETL and expect that some columns should not change.

No comments:

Post a Comment