Alternatives to SSIS SCD Wizard Component

SSIS comes with an out-of-box SCD Wizard to handle Type 1 and Type 2 Slowly Changing Dimensions (SCD) which is a fundamental ETL requirement. However the SCD wizard component has some serious drawbacks – both from operational and functional perspectives that make it unusable for practical purposes. A good summary on the shortcomings of SCD Wizard component can be found here

Several workarounds have evolved over time and in this post I would like explore the different alternative options to handle Type 1 and Type 2 SCD without using the out-of-box SCD Wizard Component.  

Option 1: Open Source Components

One of the best open source components available out there is SSIS Dimension Merge SCD Component ( formerly known as Kimball Method SSIS Slowly Changing Dimension Component) . This component can be downloaded from CodePlex.

Option 2: SQL Server MERGE Statement

T-SQL MERGE statement first introduced in SQL Server 2008 does a pretty good job in handling Type 1 and Type 2 SCD. Oracle also has a MERGE equivalent. A good post on usage of MERGE statement for SCD can be found in Kimball Group.

Option 3: Use Available SSIS Components

If your organization policy prevents you from using open source products and if you are using earlier versions of SQL Server (SQL Server 2005 and below) then Options 1 &2 is not for you. In that case you can use available SSIS components to handle your SCD logic as illustrated by the following examples.

Type 1 SCD

Type 1 SCD can be addressed by using a simple combination of Lookup and Conditional Split components in the Dataflow. Screen Capture 1 shows a simple implementation of Type 1 SCD.

Here you can see

1. The source is a query from Product table of AdventureWorks. The query has a checksum column on the changing attributes which will be used downstream to determine whether the Dim record needs to be updated with this transaction record.

<br>
SELECT ProductNumber,ModifiedDate, Name,StandardCost, Color,ListPrice,Size, Weight,<br>
HASHBYTES('SHA1',Name+ CAST(StandardCost&nbsp;AS VARCHAR(20))+ ISNULL(Color,'') +CAST(ListPrice&nbsp;AS VARCHAR(20))+ISNULL(Size,'')+ CAST(ISNULL(Weight,0) AS VARCHAR(20))) As SRC_ChangingAttributeChecksum<br>
FROM SalesLT.Product AS src<br>

2. The Lookup component queries the DW DimProduct table using a SQL query. Similar to the source, a checksum column is created on the changing attributes of the Dim table that corresponds to the source columns.

<br>
SELECT ProductAlternateKey,<br>
HASHBYTES('SHA1',EnglishProductName+ CAST(ISNULL(StandardCost,0) AS VARCHAR(20)) + ISNULL(Color,'') +CAST(ISNULL(ListPrice,0) AS VARCHAR(20))+ISNULL(Size,'')+ CAST(ISNULL(Weight,0) AS VARCHAR(20))) As LKUP_ChangingAttributeChecksum<br>
FROM dbo.DimProduct<br>

Screen Capture 1 - Type 1 SCD
Screen Capture 1 – Type 1 SCD

3. The source records that do not match the Dim records on the unique key are new records and inserted to the Dim table.

4. For the source records that do match the Dim records, only the source records that are modified must be identified. This is done by the Conditional Split column that checks the checksum of the source record with checksum of the lookup record. If the checksums match, then it means the source and Dim record is the same and hence the source record can be discarded.

5. If the checksums differ, then the dim record gets updated by the OLEDB SQL component with the current value from source.

Type 2 SCD

An approach similar to Type 1 SCD can be used to address Type 2 SCD using a combination of Lookup and Conditional Split components by making simple adjustments to handle historical attributes (in addition to changing attributes) and ageing. Screen Capture 2 shows a simple implementation of Type 2 SCD.

Screen Capture 2 - Type 2 SCD
Screen Capture 2 – Type 2 SCD

Here you can see

1. Both the source and lookup query contain checksums on the both the Changing AND Historical attributes.

2. The source records that do not match the Dim records on the unique key are inserted to the Dim table as before.

3. The Conditional Split component determines whether there are changes to checksum of Historical and Changing attributes of existing Dim records from the source.

Screen Capture 3 – Historical and Changing Attributes Checksum Comparison

4. The source records that differ on the Changing Attributes are updated by the OLEDB SQL Component.

5. The source records that differ on the Historical attributes are first aged and End Dated by the OLEDB SQL Component and then the current record is inserted as a new Dim record.

Of course this option is a middle ground between Options 1 or 2 and SCD Wizard Component, but is definitely more robust than the SCD Wizard Component and does the trick especially with small projects when you are dealing with few specialized packages.

Download SSIS Package – Alternative to SCD Wizard Component

34 thoughts on “Alternatives to SSIS SCD Wizard Component

  1. Hi benny,
    excelent post, i need exactly the same thing for my project.
    please may you sent me the link for download.
    the link above is broken

    thank in advance

    Stephane

  2. Nice Article but i could not download the package. Can you please send it to my email want to see how you did the historical type2

    1. Zed, there is link at the end of the post (not the one in comments section) that should work. Did you try that ? If it doesn’t work let me know and I’ll email the package

  3. Hi Benny
    First, thanks for your posts and for this one, in particular!
    I have some comments regarding Massimo’s post:
    Actually, I believe his thought was somewhat correct. Replacing NULL by a value is quite dangerous in that case as our lookup opponent can have the replacing value but not NULL in the same position. As a result, we are going to get the same hash whereas different members have been provided. If the column which values we are going to compare, has no check constraints on it then it could theoretically lead to a mistake. In case it has any check constraints, we could simply use the value that is out of the domain to replace NULL?
    What do you think of it?

    Thanks,
    BorkaS

      1. Thanks Benny for your valuable responses!
        I really appreciate your effort to help us!

        Good luck,
        BorkaS

  4. Hi Austin, You saved my day I was using the same approach as “Use Available SSIS Components” but got struck. I was using muticast and to update the historical records and Inserting at the same time.
    Thanks a lot
    Tinku

    1. If your dim has few attributes you could compare the columns directly without using hashbytes. Since hashbytes always evaluate to a fixed length, they are extremely useful when your dimension has a number of attributes that must be checked for changes. And if you decide to materialize checksum as columns hashbytes could also serve as hash index and since they are skinny index they performs better.

  5. Nice post!!. I’d recommend Option 1, as Kimball SCD Codeplex component is fast enough as it manages multiple threads automatically.

  6. pardon me I am a starter, my question is why are we even creating a checksum, can we simply add the columns?

    Regards,

    richard.

    1. Richard, you could do that if there are few columns. Checksum is always of fixed length, so you wouldn’t have truncation issues when dealing with large number of columns. And if you materialize checksum as columns they could also serve as hash index (skinny index performs better)

      1. I seem not to get your sample package on sky drive. Is there any other location that you can provide it. Thanks

      2. Just updated post with download URL. Scroll down to the end of the post to get the URL

  7. Benny ,

    Its a wonderful post.
    But how do i get these values
    HistoricalAttributeCheckSum, LKUP.HistoricalAttributeCheckSum, ChangingAttributeCheckSum,
    LKUP.ChangingAttributeCheckSum.

    I am new to ssis and DataWarehouse.
    Please let me know thanks.

    Tinku

  8. Great post, Benny. One question… The order of the 2 expressions in the conditional split looks like it would cause a record with different (lookup vs source) changing and historical attributes to be routed to the changing attribute path. Should this be reversed?
    Thanks,
    Paul

    1. Paul, you are absolutely right. Historical Attributes take precedance over Changing Attributes. Thanks for bringing that to my attention. The post now has the correct screen capture. Cheers

  9. […] How to create type 1 & 2 SCD’s using standard SSIS components (other than the SCD) (at the bottom of the post) – Benny Austin […]
    +1

    1. It does not work. Look at this sample:

      DECLARE @STR AS VARCHAR(1024)
      DECLARE @TEST1 AS VARCHAR(50)
      DECLARE @TEST2 AS VARCHAR(50)
      DECLARE @TEST3 AS VARCHAR(50)
      SET @TEST1 = ”
      SET @TEST2 = ‘MASSIMO’
      SET @TEST3 = ”

      SET @STR = CONVERT(VARCHAR(1024), ISNULL(@TEST1, ”)) +
      CONVERT(VARCHAR(1024), ISNULL(@TEST2, ”)) +
      CONVERT(VARCHAR(1024), ISNULL(@TEST3, ”))

      SELECT HASHBYTES(‘SHA1’, @STR)

      SET @TEST1 = ‘MASSIMO’
      SET @TEST2 = ”
      SET @TEST3 = ”

      SET @STR = CONVERT(VARCHAR(1024), ISNULL(@TEST1, ”)) +
      CONVERT(VARCHAR(1024), ISNULL(@TEST2, ”)) +
      CONVERT(VARCHAR(1024), ISNULL(@TEST3, ”))

      SELECT HASHBYTES(‘SHA1’, @STR)

      Checksum value is the same but @TEST1 and @TEST2 have different value.

      1. Massimo, it does work if the NULL values are handled appropriately. In your example, the string input to checksum function is evaluating to the same value and that’s why the checksums are same. If the NULL values are handled as shown below you will get a different checksum

        DECLARE @STR AS VARCHAR(1024)
        DECLARE @TEST1 AS VARCHAR(50)
        DECLARE @TEST2 AS VARCHAR(50)
        DECLARE @TEST3 AS VARCHAR(50)
        SET @TEST1 = NULL
        SET @TEST2 = ‘MASSIMO’
        SET @TEST3 = NULL

        SET @STR = CONVERT(VARCHAR(1024), ISNULL(@TEST1, ‘N.A’)) +
        CONVERT(VARCHAR(1024), ISNULL(@TEST2, ‘N.A’)) +
        CONVERT(VARCHAR(1024), ISNULL(@TEST3, ‘N.A’))

        SELECT HASHBYTES(‘SHA1’, @STR) –0x87244FEA54DBEFBAFA312E212D16669DD52C429B

        SET @TEST1 = ‘MASSIMO’
        SET @TEST2 = NULL
        SET @TEST3 = NULL

        SET @STR = CONVERT(VARCHAR(1024), ISNULL(@TEST1, ‘N.A’)) +
        CONVERT(VARCHAR(1024), ISNULL(@TEST2, ‘N.A’)) +
        CONVERT(VARCHAR(1024), ISNULL(@TEST3, ‘N.A’))

        SELECT HASHBYTES(‘SHA1’, @STR) –0xD9E3F10DCACAB31186FC80D56AFEAFE86F5B832F

Leave a reply to Benny Austin Cancel reply