Bulk Insert option for SSIS ADO.NET Destination which is available since SQL Server 2008 R2 improves data load speeds significantly. This option is enabled on the ADO.NET Destination component by selecting the “Use Bulk Insert when possible” check-box Continue Reading
SSIS: ETL for tables with Columnstore Index
Since SQL Server 2014, Tables with Columnstore Index are no longer read only and are updatable. This post is only applicable to SQL Server 2012.
Columnstore Index on fact tables provide significant improvements in response time especially for aggregated queries. One of the first things I did after migrating my databases to SQL Server 2012 was to create Columnstore Index on fact tables. Now I can run aggregated queries that fetch results in reasonable response a time which is great for development, unit testing and helps testers to run their test scripts faster.
One caveat to be aware of is that the table on which a Columnstore Index is defined cannot be inserted, updated or deleted (at the time of writing this post). Continue Reading
SSIS: Design Tip to Cascade Logging
An easy way to completely enable/disable SSIS Logging would be to set the LoggingMode property of the package using a Package Variable or Parent Package variable. LoggingMode takes the following enumerated values:
SSIS: Making the Most of sysssislog
The default SSIS log provider for SQL Server captures package trace in sysssislog table. It does capture some useful but limited trace information about the package execution. This trace is of little use without user defined variable values like record count, status flags, control metadata, selection from and to dates etc.
Of course SSIS provides a framework to write custom log providers to handle specific situations like these. At the time of writing this post, the custom log providers are unable to access the package variables. The good news is significant improvements are anticipated in SSIS logging capabilities in next version of SQL Server.
For those of you on SQL Server 2008 R2 and below, you could still customize SSIS log by creating your own execution log table and using it in tandem with sysssislog. Your execution log table could be structured something as shown below. Note that the execution log table captures record count, status flags, control metadata, selection from and to dates etc.
SSIS: Optimizing the ETL for Late Arriving Dimensions
Late Arriving Dimension/Early Arriving Fact as the name implies is a dimension record which is not available when the ETL loads the related fact record but becomes available at a later time.
This scenario could happen for a variety of reasons;
- It might be a perfectly valid business process. For e.g. in an emergency ward, the patient interventions could be recorded first as priority while the patient details that is collected could be recorded into the system at a later time.
- Sometimes due to operational reasons, the data might arrive at different times from their respective source systems. For e.g. The dimensions could be extracted from Master Data repository while the facts are extracted directly from the transaction system at different periods.
- It might be due a failed ETL run.
DFLD: How to capture/refresh Dataflow column metadata from multiple packages using Dataflow Discoverer?
Dataflow Discoverer (DFLD) is a command-line discovery utility that detects and writes the metadata of SSIS dataflow columns to a SQL Server lookup table. DFLD detects Dataflows nested within containers up to any level of nesting. For information on how to install and use DFLD utility,visit http://dfld.codeplex.com
DFLD works on one package at a time. It’s a typical ops requirement to keep the DFLMetadata of all SSIS packages up to date. This can be automated by creating a SSIS package similar to the one shown below Continue Reading
SSIS Dataflow Discoverer (DFLD)
What is Dataflow Discoverer?
Dataflow Discoverer (DFLD) is a command-line discovery utility that detects and writes the metadata of SSIS dataflow columns to a SQL Server lookup table. DFLD detects Dataflows nested within containers up to any level of nesting.
Why use Dataflow Discoverer?
SSIS dataflow column metadata generated by this utility could be used to enhance the error reporting capability of your SSIS package. For example to get the name of ErrorColumn from the Error Outputs which is not available by default in SSIS yet.
What are the minimum system requirements to use Dataflow Discoverer?
- .NET framework 4 or above
- SQL Server Database (any version)
How to install Dataflow Discoverer?
Verify that your system meets minimum requirements, then
Download sql script from here – http://dfld.codeplex.com/releases/60334/download/205239
Execute the script on your SQL Server. By default the sql script will create a SQL Server database – ETLMetadata and a lookup table – DFLMetadata. You could also create the table DFLMetadata in any other SQL Server database, in that case just comment out the database creation segment of the sql script.
Download DFLD executable from here http://dfld.codeplex.com/releases/60334/download/205238
Copy the executable to a local folder. It is recommended to add the executable path to System Path environment variable.
Using SSIS Pivot Transform When Pivot Column is Not Available
Pivoting is typically used in DW solutions to load multi-valued dimensions. In an earlier post, I demonstrated how SSIS Pivot component works and the advantages of using SSIS Pivot component over its equivalent T-SQL Pivot statement. In this post, I explain a technique to Pivot data when Pivot Column is not explicitly available. Continue Reading
SSIS Pivot Transform vs T-SQL Pivot
Pivot transformation converts multiple rows of data into a single record by using an input column that has a finite set of distinct values. Pivoting is typically used in DW solutions to handle multi-valued dimensions. In SQL Server, Pivoting is done using either
- T-SQL Pivot statement or
- SSIS Pivot Transformation
There are two significant benefits that stand-out in favour of using SSIS Pivot Transformation for ETL scenarios. The remainder of this post explains just that. Continue Reading
Alternatives to SSIS SCD Wizard Component
SSIS comes with an out-of-box SCD Wizard to handle Type 1 and Type 2 Slowly Changing Dimensions (SCD) which is a fundamental ETL requirement. However the SCD wizard component has some serious drawbacks – both from operational and functional perspectives that make it unusable for practical purposes. A good summary on the shortcomings of SCD Wizard component can be found here
Several workarounds have evolved over time and in this post I would like explore the different alternative options to handle Type 1 and Type 2 SCD without using the out-of-box SCD Wizard Component.Continue Reading