Dimension Modeling: The Different Types of Unknown SK

Dimension Tables usually have a member with Surrogate Key (SK) value of -1 to handle unknown members. Unknown SK is usually set when SSIS look-up on a dimension table fails to return a match based on a business/natural key.

Unknown SKs helps business in the following ways

  • Enables business to have visibility to the fact that there are imperfect records. Transaction based reporting usually don’t report these records primarily due to inner joins and filter conditions in reporting queries. When a Data warehouse/BI system is first delivered, no wonder business users are surprised to find information under the unknown bucket.
  • A better understanding of this imperfect data usually triggers process/procedures for better data capture, better data validation at source and better data cleansing rules to keep this figure low thereby contributing to data quality.

Given that several factors contribute to unknown SKs, it would be beneficial to customize the unknown SK into different types as shown below rather than clubbing into a generic -1 value

SK Type Definition
-1 UNAVAILABLE The source system is expected to supply this data but not available in this instance. This might be because of an issue with data extracts or could be just a late arriving dimension record.For e.g.

  • Patient details not available on an emergency record. It might arrive later when the patient’s identity has been verified.
  • Doctor performing surgery not available on the surgery record due to bad extract!!!
-2 NOT APPLICABLE The source system is expected to supply this data only under certain conditions. In this case, there is a perfectly valid business reason why this data will not be available from source system.For e.g.

  • cancellation date is not available if the order is not cancelled
  • there is no sales agent if the transaction was an internet sale
-3 INVALID The source system has supplied data but this data does not conform to reference data used in Data warehouse. Usually this is either a case of bad data or a reference data that is not yet mapped to the Master Data Management (MDM) system.For e.g.

  • Transaction date is wrongly keyed in as 11/12/1010 instead of 11/12/2010.
  • Address match return a blank.

Usually a combination of Null/Not Null check along with status checks is all required in ETL to handle unknown SKs. The ETL effort required is pretty minimal compared to the benefits it offers to business users.

Related Posts

Unclogging the Fact Table Surrogate Key Pipeline

3 thoughts on “Dimension Modeling: The Different Types of Unknown SK

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s