Implementing a Pivot Transformation Using Data Flow Activity – Azure Data Factory
Using Azure Data Factory you can create a data flow for Pivot transformation. Pivot allows you to aggregate the results and rotate rows into columns, bascially transform data from row-level to columnar data.
It creates a multiple columns from the unique row values of a single column avaibale in your source data.
Before create a pivot transformation, first we need to prepare a source data, as you can see in below screenshot , we have already a blob file ItemDetails.txt uploaded in Azure Blob Storage container.
If you do not know, how to upload a file to blob storage, you can refer post – Upload File to Azure Storage Container.
As you can see below screenshot, the blob file which is our Source, or you can say input file that contains data for ID, Item, Item_Qty and Year.
Now our requirement is to perform pivot transformation on data in order to convert row data into columnar data means to create multiple columns from the unique row values of Year column, and values for those columns would be a total sum of Item Quantity for Individual Items.
At last the pivoted data is loaded to Target that is would be a SQL Table.
Implementing the pivot tansformation using Azure Data factory
Go to Azure Data factory Resource, If you do not have created Azure Data Factory, can refer post – Create An Azure Data Factory.
Here, we have already an Azure Data Factory named as oneazuredatafactory, In azure data factory page click on Author & Monitor tab to launch an Azure Data Factory Portal as shown in below screenshot.
After clicking, It navigates you to Azure Data Factory Portal(https://adf.azure.com/).
Now we will create a pipeline, a pipeline that is a logical grouping of activities that together perform a task. The activities in a pipeline define actions to perform on your data.
Click on Author button, now select Pipelines, then click on New PipeLine as shown below.
Now give a name to Pipeline, named as Load Pivot Data to SQL.
After that we will create a Data Flow also known as Mapping data flows, are visually designed data transformations in Azure Data Factory.
Before creating a Data flow first turn on the Data Flow debug mode as shown below.
When you turn on Data flow debug mode then you can interactively watch the data shape transform while you develop and debug your data flows.
Now, to create a data flow, Go to Activities, then expand Move & transform tab, Next select Data Flow and drag it into screen.
You will see Adding data flow screen appears, check on Create new data flow and select Data flow tab.
After that click on OK button as shown below.
Once you click on Ok button, It asks you to provide name for Dataflow and Source details.
Lets provide the details for Dataflow and Source as following.
Provide a name for source file as blobfile, Next Source type leave it as default, set to Dataset.
Now Select Dataset, as source file is blob storage so select a dataset Azure blob storage for source file, and click on Continue Button.
Once you click on Continue button, it asks for selecting format type for source data. As our source data file is .txt format so select DelimitedText format and click on Continue button.
After that a Set properties page appears, here you need to select your source file path, and add linked service, basically this linked service is a Azure blob storage linked service which refers to Azure blob Dataset.
If you do not know how to create linked service for blob storage you can refer a post – Azure data Factory Load blob Storage to SQL
Once you provide all details accordingly as shown below. Click on Ok Button.
After clicking, dataset details are completed. Now In source tab, you can configured other settings as per you requirement.
You also have an option to test connections to make sure that dataset is configured correctly with linked services.
If you want to see the preview of your source file data, Go to Data preview tab there you can see source file data as shown below.
Now we will Add Pivot transformation to Source, for this click on plus (+) button and select Pivot from lists as shown below.
Now you will see, a default selected pivot settings tab in you screen, For using pivot transformation you need to specify values to three inputs that is Group by , Pivot key, Pivoted columns.
Lets see provide values for each one by one .
Here you need to provide the name of columns to Group By, based on these columns data is aggregated for the columns that you will provide in pivoted columns, and the output data will group all rows with the same group by values into one row.
Now in pivot key here you will provide the column whose row values get pivoted into new columns.
We have selected a Year column for the Pivot key.
In the Value section you can enter specific row values to be pivoted, that means if you do not want to see all the Year in columns then you can provide specific Year in values. so Only the row values entered in this section will be pivoted.
If Null value is enabled that will create a pivoted column for the null values in the column.
For each unique pivot key value which are converted to columns, generates an aggregated row value for each group.
You can also create multiple columns per pivot key and each pivot column must contain at least one aggregate function.
Column name pattern It allows you to format the column name of each pivot column so the outputted column name will be a combination of the pivot key value, column prefix and optional prefix, suffice, middle characters.
Column arrangement It allows you to choose how you want the columns to be ordered, If you generate more than one pivot column per pivot key,
Column prefix This is an optional if you only have one pivoted column. In case, If you generate more than one pivot column per pivot key, you need to enter a column prefix for each column.
In the last box that is Pivot expression, Click on that. A visual expression builder page appears.
Here you need to select a values for Pivot Key columns, if you want sum of Item Quantity then provide
aggregate function sum on Item_Qty Column, also convert this column to integer if require.
After clicking on Save and finish button, You can see how pivoted data would look like in Data Preview tab, and you can see Pivot generates new column names dynamically based on unique row values for Year.
Now we will add these new columns into the metadata that can be referenced later in your data flow, Just click on Map Drifted as shown below.
Now we will add Sink, Sink transformation is added once data transformation is finished then write the data to destination store, and determines the shape and location of the data you want to write to.
Every data flow must have at least one sink transformation.
After that provide details for Sink transformation, here you need to specify dataset which refers to the Azure SQL Database linked service as our target is Azure SQL Database.
If you do not know how to create linked services for Azure SQL, again you can refer same as post – Azure data Factory Load blob Storage to SQL
After clicking on continue button, It asks you to provide name for dataset and linked service for dataset.
Next select table to load pivoted data, select Create new table, Provide Schema, and table name.
After that click on OK button
Now go back to Pipline and click on debug to see it in action.
When you debug the Pipline, you will get a notification once it is completed successfully as shown below.
Now go and Login to Azure SQL Database, and check whether the pivoted data is loaded into table or not.
Run the following SQL Statement in Query editor.
SELECT * FROM [dbo].[BlobPivotedData]
As you can see data is loaded into table and now we have only three rows which are unique Item names and all the unique rows value for Years are created as columns and values for each year columns is a sum of Item_Qty. This the is pivoting which creates multiple columns from the unique row values of a single .