SQL Server writes transaction logs sequentially. The size of transaction logs when using sql INSERT statement depends on the Recovery Model of the database and the type Transact-SQL statements used. Database recovery model can be – Simple (min), Bulk-Logged (intermediate) or Full (max). While this is true for Transact-SQL INSERT statements, records inserted through an SSIS Data Flow Destination (like SQL Server Destination) exhibit a different logging behaviour. This is because the INSERT statements issues from Integration Services engine is always minimally logged and it bypasses the recovery model of the database engine. This results in significant performance improvement by reducing the sequential disk writes, especially when dealing with huge recordsets which is typical in a data warehouse ETL scenario. That’s another reason why SSIS packages perform better over Transact-SQL Stored Procedures for ETL purposes.
Here is Proof
I performed a simple test by inserting 19,815 records from AdventureWorks Customer table to a copy table using a Transact-SQL Stored Procedure (shown below) and an SSIS Package (Screen Capture 1). The database was set to Simple recovery mode.
CREATE PROCEDURE spMinimal AS BEGIN DECLARE curCustomer CURSOR FOR SELECT [TerritoryID] ,[CustomerType] ,[rowguid] ,[ModifiedDate] FROM [AdventureWorks].[Sales].[Customer] DECLARE @intTerritoryID AS int DECLARE @charCustomerType AS nchar(1) DECLARE @rowguid AS sysname DECLARE @ModifiedDate as datetime truncate table Sales.Customer_TSQL1 OPEN curCustomer FETCH curCustomer INTO @intTerritoryID,@charCustomerType,@rowguid,@ModifiedDate WHILE @@FETCH_STATUS=0 BEGIN INSERT INTO [AdventureWorks].[Sales].[Customer_TSQL1] ([TerritoryID] ,[CustomerType] ,[rowguid] ,[ModifiedDate]) VALUES (@intTerritoryID, @charCustomerType, @rowguid, @ModifiedDate ) FETCH curCustomer INTO @intTerritoryID,@charCustomerType,@rowguid,@ModifiedDate END CLOSE curCustomer DEALLOCATE curCustomer END
The log information was captured after execution of the Transact-SQL Stored Procedure and SSIS Package using fn_dblog() as shown below:
SELECT DISTINCT operation,[log record length],count(1) [log count] FROM fn_dblog(null, null) WHERE allocunitname like '%Sales.Customer_TSQL1%' GROUP BY operation,[log record length] ORDER BY count(1) DESC, [log record length] DESC,operation
The results showed the transaction log size and the number of transaction log entries were significantly less for the inserts from SSIS packages
9 thoughts on “Row Insert from SSIS package Vs Transact-SQL Statements”
Good info, thanks
This provides an insight on the reason behind performance gain using SSIS, thanks. However I have two questions related to it,
– in your comparison you have used a cursor, what if you replace it by joins? In a more practical approach, you’d be playing with at least one temp table (if not more) where you’d perform some cleanings etc.
– If the SSIS engine bypasses those logging mechanism, it implies that records inserted from SSIS won’t be recoverable or say won’t be traceable through transaction log? Please comment.
Kazim, thanks for your comment. The objective of this post was to demonstrate the minimal logging capability of SSIS which is not available to ETL frameworks designed using T-SQL only. In most DW implementations the recovery mode is set to Simple mode to improve the speed of ETL. This is an important step considering the volume of transactions and the need to reduce latency for the availability of latest data for business analysis and reporting. Yes the transaction log would be minimal in this way. Having said that a well designed ETL framework should be designed to have the capability to reload from scratch or from any given date from the underlying source systems. It is for this reason the Datawarehouses have a simple backup and recovery model compared to transaction systems.
You didn’t answer the question properly, your t-sql statement uses a cursor. Why would you use a cursor? You are handicapping the t-sql???
How is this affected in SQL Server 2008 with minimally logged T-SQL statements?
For minimal logging to work in SQL Server 2008, the target table must be TABLOCK’ed and recovery mode has to be simple or bulk-logged, which most DW would satisfy.
INSERT INTO target_table WITH (TABLOCK) SELECT * FROM source_table
This is a very good feature, considering in SQL Server 2005 only SELECT INTO and Bulk Insert statements are minimally logged. I can see one drawback though, TABLOCK in insert statements cannot use multiple threads (Bulk Inserts are an exception). This would be sufficient for simple table extractions. It can however be bottle-neck when the ETL layer is designed for parallel runs where data from different sources are inserted into a partioned table. Also the tranformation layer in most ETL solutions are not simple inserts.