Dimension Tables usually have a member with Surrogate Key (SK) value of -1 to handle unknown members. Unknown SK is usually set when SSIS look-up on a dimension table fails to return a match based on a business/natural key.
Unknown SKs helps business in the following ways
- Enables business to have visibility to the fact that there are imperfect records. Transaction based reporting usually don’t report these records primarily due to inner joins and filter conditions in reporting queries. When a Data warehouse/BI system is first delivered, no wonder business users are surprised to find information under the unknown bucket.
- A better understanding of this imperfect data usually triggers process/procedures for better data capture, better data validation at source and better data cleansing rules to keep this figure low thereby contributing to data quality.
Given that several factors contribute to unknown SKs, it would be beneficial to customize the unknown SK into different types as shown below rather than clubbing into a generic -1 value
|-1||UNAVAILABLE||The source system is expected to supply this data but not available in this instance. This might be because of an issue with data extracts or could be just a late arriving dimension record.For e.g.
|-2||NOT APPLICABLE||The source system is expected to supply this data only under certain conditions. In this case, there is a perfectly valid business reason why this data will not be available from source system.For e.g.
|-3||INVALID||The source system has supplied data but this data does not conform to reference data used in Data warehouse. Usually this is either a case of bad data or a reference data that is not yet mapped to the Master Data Management (MDM) system.For e.g.
Usually a combination of Null/Not Null check along with status checks is all required in ETL to handle unknown SKs. The ETL effort required is pretty minimal compared to the benefits it offers to business users.
3 thoughts on “Dimension Modeling: The Different Types of Unknown SK”