Implementing Azure data Factory ETL – Load blob Storage to Azure SQL Database
According to Microsfot, Azure Data Factory is Azure’s cloud ETL service for scale-out serverless data integration and data transformation. It offers a code-free UI for intuitive authoring and single-pane-of-glass monitoring and management.
Here we will see how to Load a Blob Storage to Azue SQL database table using Azure Data Factory.
As you can see in below screenshot, Here we have already a blob file EmployeeInfo.txt uploaded in Azure Blob Storage container.
If you do not know, how to upload a file to blob storage, you can refer post – Upload File to Azure Storage Container.
CREATE TABLE dbo.EmployeeInfo( EmpId INT NOT NULL PRIMARY KEY, EmpName VARCHAR (50) NOT NULL, EmpCode VARCHAR(10) NOT NULL, Department VARCHAR(50) )
Once you click on Run, it creates a new table is database, you can verify same by clicking on Tables folder, and there you can table EmployeeInfo.
Now go to Azure Data factory Resource, If you do not have created Azure Data Factory, can refer post – Create An Azure Data Factory.
Here, we have already an Azure Data Factory named as oneazuredatafactory, In azure data factory page click on Author & Monitor tab to launch an Azure Data Factory Portal as shown in below screenshot.
After clicking, It navigates you to Azure Data Factory Portal(https://adf.azure.com/).
Now click on Author button, select Pipelines, then click on New PipeLine.
A pipeline is a logical grouping of activities that together perform a task. The activities in a pipeline define actions to perform on your data.
After clicking on New Pipeline, give it a name as Blob Storage To SQL.
Now we will create a linked service, Linked Service basically is used to link your data store to the data factory.
Linked services are like as connection strings, which define the connection information needed for Data Factory to connect to external resources.
To copy data from Blob storage to a Azure SQL Database, we need to create two linked services: Azure Storage and Azure SQL Database.
Then, create two datasets: Azure Blob dataset which refers to the Azure Storage linked service and Azure SQL Table dataset which refers to the Azure SQL Database linked service.
Lets create a linked service, Go to Manage tab then click on +new button.
once you click on Button, a New Linked Service Page opens there search for Azure blob in search bar.
Next select Azure Blob Storage and click on Continue button.
After that provide a details for Linked service for azure blob storage as shown below.
Specify a name for linked service, Azure subscription plan, here we have selected Authentication method as Account key just for demo purpose, but the most secure Authentication Method is Managed Identity.
At last you can verify connection and hit on create button.
Now, time to create a Dataset. Click on Datasets , then click on New dataset .
After clicking New data Sets, Search for Azure blob storage and click on Continue.
Then select a format for blob Storage, as we have a text file so we have selected format as CSV Delimitedtext. Next click on Continue.
Once you click on Continue, A Set Properties page opens. now provide name for dataset, then select linkedservice, select blob storage file path as shown below.
Once you select the file, then select first row as header (If your file having a first row as headers) and click on Ok Button.
Once you click on OK button, Dataset is created.
Now you can see a connection settings, there you can see linkedsevice that you provided for this, can set file delimiter for rows and columns also.
You can also see file preview and verify the connection clicking on test connection.
Now we need to connect Azure SQL, for this we need to create another linked service.
So, click on Manage Tab, a Linked Services Page opens, there click on +New button.
After clicking, a New Linked Service page opens, search for azure sql in search bar.
Then select Azure SQL Database and click on Continue button.
Once you click on continue button, a New linked Service(Azure SQL Database) Page opens.
Provide required details, select Authentication Type, here we have selected as SQL Authentication but the most secure is Service principle or Managed Identity.
At last verify test connection and click on Create button as shown below.
Once you click on Create button, Linked service is created successfully, same you can see on linked services page.
Now we have two linked service one for Azure Blob Storage and second for Azure SQL Database.
Now we will create another Dataset – Azure SQL Database, named as AzureSqltable.
Go to Datasets, Click on New Dataset, a New dataset page opens.
Now search for Azure SQL in search bar, select Azure SQL Database form list. Next click on Continue.
After clicking, a set Properties page opens, Provide a required details for Dataset, and click on OK button.
Once you click on Ok button, a Dataset is created as shown in below screenshot.
Now we will create an Activities to perform action on Data. Go back to pipelines, and in Activities section, click on copy data activity, and drag and drop it into screen.
Named it as copy Data blob to SQL as shown in General tab.
Now click on Source tab, select souse file that is dataset – EmployeeInfo_txt.
Now go to Sink tab, select Data set –AzureSqltable.
Now, just click on Debug button to check if everything is working fine or not.
And you can see a Notification message on screen as succeeded, that means activity running fine.
Lets verify the same on SQL side, whether the data is copied to SQL table or not.
Go back to Azure Portal, then login to SQL Database.
Lets select the records from table employeeInfo, and you can see data from blob storage to SQL table are loaded successfully.
As you have verified that activity is working fine then you can also publish this.
Just go back to azure data factory portal and hit on Publish button.
Once it is published successfully, you will get a notification as shown below.