What is Dataflow Discoverer?
Dataflow Discoverer (DFLD) is a command-line discovery utility that detects and writes the metadata of SSIS dataflow columns to a SQL Server lookup table. DFLD detects Dataflows nested within containers up to any level of nesting.
Why use Dataflow Discoverer?
SSIS dataflow column metadata generated by this utility could be used to enhance the error reporting capability of your SSIS package. For example to get the name of ErrorColumn from the Error Outputs which is not available by default in SSIS yet.
What are the minimum system requirements to use Dataflow Discoverer?
- .NET framework 4 or above
- SQL Server Database (any version)
How to install Dataflow Discoverer?
Verify that your system meets minimum requirements, then
-
Download sql script from here – http://dfld.codeplex.com/releases/60334/download/205239
Execute the script on your SQL Server. By default the sql script will create a SQL Server database – ETLMetadata and a lookup table – DFLMetadata. You could also create the table DFLMetadata in any other SQL Server database, in that case just comment out the database creation segment of the sql script.
-
Download DFLD executable from here http://dfld.codeplex.com/releases/60334/download/205238
Copy the executable to a local folder. It is recommended to add the executable path to System Path environment variable.
How to use Dataflow Discoverer?
From the command prompt, type:
dfld /param1:value1 /param2:value2 /param3:value3 … /paramN:valueN
List of Parameters:
Parameter | Shorthand | Mandatory | Description |
/package | /p | Y | Fully qualified path of SSIS Package in File System. If the path contains spaces, enclose in double quotes. |
/server | /s | Y | IP Address/Name of Database Server hosting the DFLMetadata table. If Instance Name is applicable, then specify as Database Server/Instance Name. |
/database | /d | Y | Database Name hosting the DFLMetadata table. This is usually ETLMetadata, unless you have deployed the table DFLMetadata to a different database. |
/user | /u | N | Database User Name for SQL authentication. This parameter is not required for Windows authentication. |
/pwd | /pwd | N | Password of Database User required for SQL authentication. This parameter is not required for Windows authentication. |
If any parameter contains space, then enclose in double quotes. From command prompt, type dfld /? for more help options.
Examples:
- Database uses windows authentication
dfld /p:C:\users\eagle\populateDim.dtsx /s:localhost /d:ETLmetadata
- Database uses SQL server authentication
dfld /p:C:\users\eagle\populateRef.dtsx /s:localhost /d:ETLmetadata /u:userxyz /pwd:XXXXXXX
- Package Name contains space
dfld /p:”C:\users\eagle\populate fact.dtsx” /s:localhost /d:ETLmetadata
On successful completion, your console will display statistics similar to the one shown below
Screen Capture 1 – Dataflow Discoverer Console Output
What metadata is captured by Dataflow Discoverer?
Metadata | Data Type | Description |
PackageID | Varchar(38) | Unique GUID of the package |
PackageName | Varchar(200) | Recommended to keep package name under 200 characters |
DataflowID | Varchar(38) | GUID of the Dataflow |
DataflowName | Varchar(200) | Recommended to keep dataflow name under 200 characters |
IOType | Char(1) | Flag to indicate whether the collection is Input or Output. I= Input Collection, O= Output Collection |
IOName | Varchar(200) | Name of Input/output collection |
IOID | Int | ID of Input/output collection |
ColumnName | Varchar(200) | Recommended to keep Column name under 200 characters |
ColumnDesc | Varchar(500) | |
LineageID | Int | It’s an integer value which identifies the column in the buffer |
ColumnID | Int | Unique value that identifies a column within a Dataflow in the package. |
ColumnDataType | Varchar(30) | |
ColumnPrecision | Int | |
ColumnScale | Int | |
UpstreamComponentName | Varchar(200) | |
SortKeyPosition | Int | |
ColumnCodePage | Int | |
MappedColumnID | Int | |
UsageType | Int | |
ColumnType | Int |
How to get name of ErrorColumn from the Error Outputs in my SSIS Package?
From your SSIS Dataflow, lookup DFLMetadata table to fetch the name of ErrorColumn. Check-out this post which demonstrates how this is done https://bennyaustin.wordpress.com/2011/02/04/ssis-name-of-errorcolumn/
At what stage in my ETL development cycle should I use Dataflow Discoverer?
You can generate the dataflow column metadata using DFLD at any stage in your SSIS package development cycle. It is recommended to refresh your dataflow column metadata just before deployment when your package is ready to go.
How to capture/refresh Dataflow column metadata from multiple packages using Dataflow Discoverer?
It’s a typical ops requirement to keep the DFLMetadata up to date. This can be automated by creating a SSIS package that scans your deployment folder for package (* .dtsx) files. For each package found, execute the DFLD using an Execute Process Task in the control flow. Check-out this post which demonstrates how this is done https://bennyaustin.wordpress.com/2011/02/05/refresh-dataflow-column-metadata/
Where could I find the documentation for the source code of Dataflow Discoverer?
Source code is available for download from http://dfld.codeplex.com/releases/view/60334. The source code is adequately documented to be self-explanatory.
I am unable to download the scripts been attached
While executing the DFLD on SQL Server 2014, we are getting the following error. The Visual Studio version is 2013.
Unhandled Exception: System.IO.FileNotFoundException: Could not load file or assembly ‘Microsoft.SqlServer.ManagedDTS, Version=11.0.0.0, Culture neutral, Public Keytoken=89845dcd8080cc91’ or one of its dependencies. The system cannot find the file specified.
at BA.SqlServer.Ssis.Dfl.CallDiscovere.Main(String[] args)
Any directions will be deeply appreciated.
Thanks
– Praveen
Praveen, you might have to install Client Tools SDK as this error is caused by dll’s that are referenced in the application but missing from the run-time environment
Thank you Benny. I really appreciate your immediate response. We will try this option and will let you know. If you have a URL handy from where I can download the Client Tools SDK, that will be great.
Benny, I have verified the installation. The Client Tools SDK for SQL 2014 is already installed. Any other thoughts?
Praveen, thanks for that confirmation.
Few other things you could check:
1. Ensure that the path environment variable has an entry for \Microsoft SQL Server\11*\SDK\Assemblies. If you are running multiple versions of SQL server pls be aware of the order of entries in path environment variable https://dfld.codeplex.com/discussions/373788
2. If the error still occurs, download the source code from codeplex and open the solution in Visual Studio. Delete the existing reference to Microsoft.SQLServer.ManagedDTS and add a new reference pointing to Microsoft.SQLServer.ManagedDTS.dll. Recompile and execute the new exe. Good luck
I am getting following error when I used Data Flow discoverer, can you please help?
ERROR: Connection string parameters of DFLMetadata table is invalid
Thanks
Raghib
There might be an issue with the connection parameters you are passing to dfld. You might want to refer to “How to use Dataflow Discoverer?” section of the post for instructions and examples.
Hi Benny,
Had a quick look at your Data Flow Discovery and from the initial review it does not capture the driver details involved in the DFT. Is this correct? From my particular usage that would be of interest. The situation is we want to identify any of our packages that have MS Attunity drivers and change them.
Thanks
Alan
Alan, you are right. DFLD does not capture the data source drivers used by the data flow task.