SSIS: Optimizing the ETL for Late Arriving Dimensions


Late Arriving Dimension/Early Arriving Fact as the name implies is a dimension record which is not available when the ETL loads the related fact record but becomes available at a later time.

This scenario could happen for a variety of reasons;

  • It might be a perfectly valid business process. For e.g. in an emergency ward, the patient interventions could be recorded first as priority while the patient details that is collected could be recorded into the system at a later time.
  • Sometimes due to operational reasons, the data might arrive at different times from their respective source systems. For e.g. The dimensions could be extracted from Master Data repository while the facts are extracted directly from the transaction system at different periods.
  • It might be due a failed ETL run.

Continue Reading

Using SSIS Pivot Transform When Pivot Column is Not Available

Pivoting is typically used in DW solutions to load multi-valued dimensions. In an earlier post, I demonstrated how SSIS Pivot component works and the advantages of using SSIS Pivot component over its equivalent T-SQL Pivot statement. In this post, I explain a technique to Pivot data when Pivot Column is not explicitly available. Continue Reading

SSIS Pivot Transform vs T-SQL Pivot

Pivot transformation converts multiple rows of data into a single record by using an input column that has a finite set of distinct values. Pivoting is typically used in DW solutions to handle multi-valued dimensions. In SQL Server, Pivoting is done using either      

  1.  T-SQL Pivot statement or
  2.  SSIS Pivot Transformation

There are two significant benefits that stand-out in favour of using SSIS Pivot Transformation for ETL scenarios. The remainder of this post explains just that. Continue Reading

Alternatives to SSIS SCD Wizard Component

SSIS comes with an out-of-box SCD Wizard to handle Type 1 and Type 2 Slowly Changing Dimensions (SCD) which is a fundamental ETL requirement. However the SCD wizard component has some serious drawbacks – both from operational and functional perspectives that make it unusable for practical purposes. A good summary on the shortcomings of SCD Wizard component can be found here

Several workarounds have evolved over time and in this post I would like explore the different alternative options to handle Type 1 and Type 2 SCD without using the out-of-box SCD Wizard Component.   Continue Reading