The Aggregate transformation applies aggregate operations such as maximum, minimum, average, sum, and count to column values and copies the results to the transformation output. Besides aggregate functions, the transformation provides the GROUP BY clause, which you can use to specify groups to aggregate across.
The Aggregate transformation supports the following operations:
|Operation|Description|
|---|---|
|Group by|Divides datasets into groups. Columns of any data type can be used for grouping. For more details, see SQL Server GROUP BY.|
|Sum|Sums the values in a column. Only columns with numeric data types can be summed. For more details, see SQL Server SUM.|
|Average|Returns the average of the values in a column. Only columns with numeric data types can be averaged. For more details, see SQL Server AVG.|
|Count|Returns the number of items in a group. For more details, see SQL Server COUNT.|
|Count distinct|Returns the number of unique non-null values in a group.|
|Minimum|Returns the minimum value in a group. It can be used only with numeric, date, and time data types. For more details, see SQL Server MIN.|
|Maximum|Returns the maximum value in a group. It can be used only with numeric, date, and time data types. For more details, see SQL Server MAX.|
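These operations correspond to the standard SQL aggregate functions. As a minimal sketch, the following Python snippet uses SQLite's in-memory database in place of SQL Server; the table name and sample figures are invented for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server here; the Sales table and its
# data are made up purely to demonstrate the aggregate operations.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (Region TEXT, Amount REAL)")
con.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("East", 100.0), ("East", 200.0), ("West", 50.0), ("West", 50.0)],
)

# One row per group (Group by), with Sum, Average, Count,
# Count distinct, Minimum, and Maximum computed for each group.
rows = con.execute("""
    SELECT Region,
           SUM(Amount), AVG(Amount), COUNT(*),
           COUNT(DISTINCT Amount), MIN(Amount), MAX(Amount)
    FROM Sales
    GROUP BY Region
    ORDER BY Region
""").fetchall()
for row in rows:
    print(row)
```

Each output row carries the group key followed by the aggregate values, which is exactly the shape the Aggregate transformation writes to its output.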
Note that if a column contains large numeric values or has special precision requirements, you can use the IsBig property, which you can set on output columns to invoke special handling of big or high-precision numbers.
If a column value may exceed 4 billion, or a precision beyond the float data type is required, set IsBig to 1.
Setting the IsBig property to 1 affects the output of the Aggregate transformation in the following ways:
- The DT_R8 data type is used instead of the DT_R4 data type.
- Count results are stored as the DT_UI8 data type.
- Distinct count results are stored as the DT_UI4 data type.
Aggregate Transformation Basic Mode
Here we have a sample table named SalesTrack, which is our source table. As you can see, the table contains total sales by executive and region.
Our requirement is to create an SSIS package that performs an aggregation on the source data and loads the total sales by region into the table SalesByRegion.
As you can see, there are no records in the table as of now.
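Before building the package, it helps to see the aggregation it will perform. The sketch below reproduces the same logic with SQLite standing in for SQL Server; the column names and sample figures are assumptions for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server; the column layout and the
# sample rows are invented to mirror the SalesTrack scenario.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE SalesTrack (Executive TEXT, Region TEXT, TotalSales INTEGER)"
)
con.execute("CREATE TABLE SalesByRegion (Region TEXT, TotalSales INTEGER)")
con.executemany(
    "INSERT INTO SalesTrack VALUES (?, ?, ?)",
    [("Alice", "North", 1000), ("Bob", "North", 500), ("Carol", "South", 750)],
)

# Group by on Region plus Sum on TotalSales, loaded into the
# destination table -- the same work the SSIS package will do.
con.execute("""
    INSERT INTO SalesByRegion (Region, TotalSales)
    SELECT Region, SUM(TotalSales) FROM SalesTrack GROUP BY Region
""")
result = con.execute("SELECT * FROM SalesByRegion ORDER BY Region").fetchall()
print(result)
```

The destination ends up with one row per region, which is what we will verify at the end of the walkthrough.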
Let's create a new SSIS project, or create an SSIS package in an existing project, as shown below.
Right-click the SSIS Packages folder, then select New SSIS Package.
After that, give it an appropriate name, such as AggregateTransformationBasic.dtsx.
Go to the Toolbox pane, select the Data Flow Task, and drag it into the Control Flow.
Now double-click the Data Flow Task; it navigates you to the Data Flow view.
Next, go to the Toolbox and, under Other Sources, select the OLE DB Source component and drag it into the Data Flow.
After that, double-click the OLE DB Source component to configure its connection manager.
An OLE DB Source Editor window opens. Here you can select an existing connection manager, if you have already created one, or create a new one.
Let's click the New... button to create a new connection manager.
Once you click the New... button, a Configure OLE DB Connection Manager window opens; click the New... button again.
A Connection Manager window opens. Select your SQL Server instance name, authentication mode, and database name.
You can also verify the connection by clicking the Test Connection button.
Once you are done, click the OK button.
You can now see that a connection manager has been created; click the OK button.
The connection manager you created now appears in the OLE DB connection manager dropdown box.
For Data access mode, leave the default value, Table or view.
Next, select the table name. You can also see a data preview by clicking the Preview... button.
In the Columns tab, you can rename output columns and remove any column by unchecking it.
In the Error Output tab, you can specify the behavior of the component in case of failure, choosing one of the following options:
- Ignore failure: Ignores any failure while reading rows from the source; the package continues executing even if an error occurs.
- Redirect row: Redirects failed rows to another component connected via the error output path.
- Fail component: Stops the execution of the package in case of any failure.
We go with the default value, Fail component.
After that, click the OK button.
Now drag the Aggregate transformation from the Toolbox into the Data Flow and connect the path: drag the OLE DB Source output arrow to the Aggregate transformation as shown below.
Double-click the Aggregate transformation to configure the properties used to perform group by operations and to calculate aggregate values.
An Aggregate Transformation Editor window opens. Here you select the required columns by checking them in the Available Input Columns list.
As we want to see the total sales by region, check the Region and Sales columns; once checked, those column names appear in the Input Column field in the lower pane.
The lower pane contains the following fields:
Input Column
Select from the list of available input columns in the OLE DB Source.
Output Alias
Allows you to give an alias for each column. The default is the name of the input column. You can choose any unique, descriptive name.
Operation
The operations supported by the Aggregate transformation, already discussed above.
Comparison Flags
If you choose Group by, use the check boxes to control how the transformation performs the comparison.
Count Distinct Scale
Optionally specify the approximate number of distinct values that the aggregation can write. By default, the value of this option is Unspecified.
If both CountDistinctScale and CountDistinctKeys are specified, CountDistinctKeys takes precedence.
Count Distinct Keys
Optionally specify the exact number of distinct values that the aggregation can write. If both CountDistinctScale and CountDistinctKeys are specified, CountDistinctKeys takes precedence.
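Whatever hints you supply, Count distinct still returns the number of unique non-null values in each group. This behavior matches SQL's COUNT(DISTINCT ...), as the following SQLite sketch shows; the data is invented for the example.

```python
import sqlite3

# SQLite illustration of Count vs. Count distinct semantics.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE T (Val INTEGER)")
con.executemany("INSERT INTO T VALUES (?)", [(1,), (1,), (2,), (None,)])

# COUNT(*) counts every row, including the NULL one.
count_all = con.execute("SELECT COUNT(*) FROM T").fetchone()[0]

# COUNT(DISTINCT Val) counts unique non-null values only: {1, 2}.
count_distinct = con.execute("SELECT COUNT(DISTINCT Val) FROM T").fetchone()[0]
print(count_all, count_distinct)
```

Here four rows yield a distinct count of two, because the duplicate and the NULL are both excluded.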
Next, connect the Aggregate transformation output to an OLE DB Destination, then double-click the OLE DB Destination component to configure the destination server into which the data will be inserted.
Once you double-click the OLE DB Destination component, an OLE DB Destination Editor window opens.
Here we use the existing connection manager that we created earlier, then select the destination table, SalesByRegion.
In the Mappings tab, map the Available Input Columns to the Available Destination Columns.
In the Error Output tab, we go with the default value, Fail component.
After that, click the OK button.
Now our package is ready; let's execute it.
Right-click the Data Flow container, then select Execute Task from the context menu as shown below.
You can see the package executed successfully. Let's quickly check the destination table.
You can see that the total-sales-by-region rows have been inserted into the table. Since we used the Group by operation on Region and Sum on Sales, the sales data is grouped by region.
So far we have seen how to use the Sum aggregation; similarly, you can use the other operations such as Maximum, Minimum, Count, and Average as per your requirement.