The Merge transformation in SSIS combines two datasets into a single dataset.
It takes two datasets and must be in sorted order, so the output of merge transformation would be a combined dataset in sorted order. The rows from each dataset are inserted into the output based on values in their key columns.
The Merge transformation is similar to the Union All transformations as it combines the datasets into single dataset.
But the difference is that Union ALL can have multiple inputs means more than two datasets in unsorted order, and it does not follow any particular order, so the combined output of Union all transformation does not need to be sorted.
Lets see the step by step implementation of Merge Transformation in SSIS that will take two input source dataset and combines them into single output and insert into destination SQL Server table.
As you can see, here we have two SQL tables named TableA, and TableB. These tables are our source.
TableA contains 5 records while TableB contains 4 records.
And our destination are SQL table named TableC, and you can see it does not have any records yet.
Open SSDT (SQL Server Data Tools), create a new SSIS project or create a new SSIS package in existing SSIS project.
Here we are creating a new package in existing SSIS project, right click on SSIS package folder then select New SSIS Packages.
Now go to ToolBox pane, select Data Flow task and drag into Control Flow View.
Now double click on Data Flow Task component, that will navigate you to Data Flow view for this selected Data Flow Task.
Now in Other Sources, select OLEDB Source and drag into Data Flow view as shown below.
Now double click on OLEDB Source component and assign a connection manager for OLEDB source.
After that, you will see a OLEDB Source Editor window opens, here you can select an existing connection manager, if you have already created else you can create a new connection manager.
Lets click on New.. button to create a new connection manager.
Once you click on New.. button, a Configure OLEDB Connection Manager window opens.
After that, just click on New.. button.
You will see a Connection Manger window opens, specify server name, authentication mode and database name.
You can also verify the connection by clicking on Test connection button.
Once you done with this, just click on OK button.
Now you can see, a data connections is created, next click on OK button.
Once you click on OK button, you get back to very first window OLEDB Source Editor.
In Connection Manager tab, you can see a connection manager that you created is auto selected in OLEDB connection Manager dropdown.
Leave a Data access mode as table or view as default.
Next, select a table TableA from dropdown select name of the table or view.
In Columns tab, you can rename output column name, or remove column by unchecking them from Available External Columns.
After that in Error Output Tab, you can specified the behavior of the component in case of failure, you can specified any one option as follows:
- Ignore Failure: It ignores any failure while reading rows from source and the package will continue executing even any error occurred.
- Redirect Row: It redirects the failed rows to other component which is connected with the error precedence constraints.
- Fail component: It stops the execution of package in case of any failure.
Here we go with default selected value that is Fail component.
Next click on OK button.
Once you click on OK button, you will see a OLEDB connection manager configured successfully.
Lets right click on it and select rename and give it a name OLEDB Data Source 1.
Lets take another OLEDB Source, just go to Other Sources then select OLEDB Source and drag it into Data Flow view.
Now double Click on OLEDB Source component and assign a connection manager for this.
You will see a same window OLEDB Source Editor opens, but this time you will notice a OLEDB connection manger dropdown list is auto filled with an existing connection manager, and this is the same connection manager that you created OLEDB Source earlier.
As both TableA, and TableB are available in same SQL Server instance and database, so we can use an existing connection connection manger for this OLEDB source but here you need to select TableB in Select name of table or view dropdown.
Next, follow the same steps for Columns and ErrorOutput Tab, after that click on OK button.
In case, if you want to create a new connection manager for this OLEDB source, might be your are using different SQL Server instance or database then you can click on New.. button and follow the same step as we already did earlier.
Next, give it name as OLEDB Source 2.
So far, we have successfully configured two input Datasets.
Next, we will take Sort transformation component, as Merge transformation takes sorted input dataset only.
The Sort transformation sorts input data in ascending or descending order and copies the sorted data to the transformation output.
In Toolbox pane, under Common, select Sort and drag it two times into Data flow view, as we need sort data of OLEDB Source 1 and OLEDB Source 2.
After that give them name Sort1 and Sort2.
Next, connect OLEDB Source 1 with Sort1 and OLEDB Source 2 with Sort 2.
For this, Just drag the arrow from OLEBD Source 1 and connect with Sort 1, same do with OLEDB source 2 and connect the arrow with Sort 2 as shown below.
Now double click on Sort 1 component to specify the column to sort, and their sort type and sort order.
Lets sort the rows by Gender in ascending order and Country in descending order.
Just select the Gender and Country column from Available Input columns, to change the sorting type you can select sorting type either ascending or descending from Sort Type dropdown.
If the Remove rows with duplicate sort values option is checked then sort creates a single entry for all duplicates rows, based on the specified string comparison options.
If unchecked then transformation copies duplicate rows to the transformation output.
After that just click OK button.
Now double click on Sort 2 and repeat the same steps as we did for Sort 1.
Once you done with this, it will look like as shown below.
Now we have sorted the input datasets, now we will take a Merge transformation which combines these two sorted datasets into a single dataset.
Lets drag the Merge component into Data Flow view
Now connect the Sort1 arrow with Merge.
Once you connect Sort 1 with Merge, immediately a Input Output Selection window opens.
Select Merge Input 1 from dropdown.
Then click on OK button.
Now do the same step with Sort 2, connect arrow to Merge. But this time Merge input 2 is selected automatically by the SSIS.
Now we need to insert merge transformation output into Destination SQL Server table, for this lets OLE DB destination into Data Flow and configure it to destination SQL Server.
Now double click on OLEDB Destination to configure it.
You can see, our destination table is TableC, so the merge transformation data will be inserted into this table.
After that click on OK button.
Now package is ready, time to run the package.
Lets right click on Data flow view, and select Execute Task from context menu.
And you can see, package run successfully.
You can also notice that, there are 5 rows are moved from OLEDB source1 and 4 rows are moved from OLEDB Source2, then the rows are sorted by Sort 1 and Sort 2 then the sorted rows are Merged and finally 9 rows are loaded to OLEDB Destination.
Lets quickly verify the destination table TableC, and you can see there are total 9 rows in TableC that is 5 rows from TableA, and 4 rows from TableB.
Also rows are sorted by Gender in ascending order and Country in descending order.
557 total views, 6 views today