Data Warehousing: To Primary Key or not to Primary Key?

Constraints are a necessary part of the tug-and-thrust of database design, but once in a while, when designing a data warehouse, someone comes up with the question “Do I need to map those constraints to my DW?” The simple answer is “Yes”, but it’s also a somewhat naive answer that doesn’t take into account the needs of a DW.

For your primary key, consider using a surrogate or alternate key; you’ll need to cater for slowly changing dimensions, e.g. if you’re doing a report over the last 5 years on average sales per married/unmarried salesperson, you’ll want to register the fact that someone was unmarried for 2 years, then married for the last 3. This means your data warehouse will have two dimension-table rows for the same person. Following the OLTP structure for that will be hard, to say the least. Note that it’s not enough to create a composite key on the original primary key and e.g. marital status — the same person may also have other fields that change occasionally, such as address (zipcode), department, job title, or even gender!

As for other constraints such as default value or check constraints, we need to consider how a DW is populated, and what the arrival of data to the DW means. The DW is invariably populated from an OLTP (relational) system, the same system that typically has all the original constraints. Ideally, this means the data is already clean by the time the DW gets it. It also means that the performance hit required by checking these constraints in writing to the database has already been taken. And OLTP system is typically optimised for writes and reads, particularly the large tables that change frequently and are the subject of most of the storage in a DW.

On the other hand, Data Warehouses are massively optimised for reads (assuming you’re populating as a batch, with reasonably high latency), and constraints don’t really factor in read operations. You can typically get around any constraint issues with your DW populating job, if they haven’t already been dealt with in the OLTP system, and deal with nulls etc. at the analysis/reporting tools if necessary. It’s far more important to make sure default values fit with your conceptual data model, and don’t introduce issues at the DW client tools.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s