Create Filter Activity in Azure Data Factory

The Filter activity is used in a pipeline to apply a filter expression to an input array. It outputs only the items that match the expression, so that subsequent activities can work with the filtered data.
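Conceptually, the Filter activity behaves like a list comprehension: it evaluates a condition against each element of the input array and keeps the elements for which the condition is true, in their original order. The Python sketch below models only that behavior. It is not Azure Data Factory code, and `filter_items` is a hypothetical helper.

```python
from typing import Any, Callable, List


def filter_items(items: List[Any], condition: Callable[[Any], bool]) -> List[Any]:
    """Return the elements of `items` for which `condition` is true, in their original order."""
    return [item for item in items if condition(item)]


# Example: keep only the even numbers from an input array.
evens = filter_items([1, 2, 3, 4, 5, 6], lambda n: n % 2 == 0)
print(evens)  # [2, 4, 6]
```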

Let's look at an example that uses the Filter activity to select only the .txt files from an input source.




In this example, our blob storage container holds three files: two .txt files and one .xlsx file.

We will use the Filter activity to select only the .txt files from blob storage.

Log in to the Azure Data Factory portal, click the Author tab, and create a new pipeline.

Provide a name and description for the pipeline. The description is optional.

Next, go to the Activities pane, search for the Get Metadata activity, and drag it to the pipeline canvas.

We will use the Get Metadata activity to get a list of the files available in the input folder.

Click the Settings tab to edit the activity's details.

Select a dataset, or create a new one with the New button.

When you click the New button, you are asked to select the data store that holds your data.

Because the data is stored in blob storage, we select Azure Blob Storage.

Next, select the file format of your data. In our case, it is a text file.

Now, create a linked service, which defines the connection to the data source.

Click the New button to create the linked service.

Now, provide only the input folder path; do not specify a file name.
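For reference, a dataset configured this way stores a folder location with no file name. The sketch below builds such a definition as a Python dictionary and prints it as JSON. The dataset, linked service, container, and folder names (`InputFolderDataset`, `AzureBlobStorage1`, `mycontainer`, `Input`) are hypothetical, and the `DelimitedText` type is an assumption for the text-file format chosen above; compare it with the JSON definition of your own dataset.

```python
import json

# Hypothetical dataset definition: the location names a container and folder
# but deliberately has no "fileName", so the dataset points at the whole folder.
dataset = {
    "name": "InputFolderDataset",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference",
        },
        # Assumed format type for a text file.
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "mycontainer",
                "folderPath": "Input",
            },
        },
    },
}

print(json.dumps(dataset, indent=2))
```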

Then click OK.

So far, we have configured the dataset. Now, provide the details for the Field list.

Click the New button.

Select Child items from the drop-down list. The Child items field returns the list of subfolders and files in the given folder.
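With Child items selected, the Get Metadata activity output contains, among other properties, a `childItems` array with one entry per file or subfolder, each carrying a `name` and a `type` (`File` or `Folder`). The Python sketch below mirrors that shape for the three files in this example, together with an activity definition that requests the field. The file names and the dataset name `InputFolderDataset` are hypothetical.

```python
import json

# Hypothetical Get Metadata activity definition requesting the childItems field.
get_metadata_activity = {
    "name": "Get Metadata1",
    "type": "GetMetadata",
    "typeProperties": {
        "dataset": {"referenceName": "InputFolderDataset", "type": "DatasetReference"},
        "fieldList": ["childItems"],
    },
}

# Hypothetical output: two .txt files and one .xlsx file in the input folder.
get_metadata_output = {
    "childItems": [
        {"name": "file1.txt", "type": "File"},
        {"name": "file2.txt", "type": "File"},
        {"name": "report.xlsx", "type": "File"},
    ]
}

print(json.dumps(get_metadata_activity, indent=2))
print([child["name"] for child in get_metadata_output["childItems"]])
```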

Now, we will add a Filter activity to the pipeline.




In the Activities pane, search for Filter and drag the Filter activity to the pipeline canvas. Then connect the Get Metadata activity to the Filter activity.

Now, select the Items field and then select the Add dynamic content link to open the dynamic content editor pane.

In the dynamic content editor, select the input array to be filtered. Here, we select the Get Metadata1 activity output.

You can see that it adds the following expression to the pipeline expression builder:

@activity('Get Metadata1').output
To complete the expression, add .childItems at the end:
@activity('Get Metadata1').output.childItems
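The `.childItems` suffix matters because `output` on its own is an object, while the Items field expects an array. The Python sketch below models the two expressions as dictionary lookups on a hypothetical activity output; it illustrates the navigation only and is not how Azure Data Factory evaluates expressions internally.

```python
# Hypothetical run outputs, keyed by activity name.
activity_outputs = {
    "Get Metadata1": {
        "childItems": [
            {"name": "file1.txt", "type": "File"},
            {"name": "file2.txt", "type": "File"},
            {"name": "report.xlsx", "type": "File"},
        ],
    }
}

# Models @activity('Get Metadata1').output: a whole object (dict), not an array.
whole_output = activity_outputs["Get Metadata1"]

# Models @activity('Get Metadata1').output.childItems: the array to filter.
items = activity_outputs["Get Metadata1"]["childItems"]

print(type(whole_output).__name__, type(items).__name__)  # dict list
```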

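After that, click OK.

Next, select the Condition field and use the Add dynamic content link to enter a condition that keeps only .txt files, for example:

@endsWith(item().name, '.txt')

Here, item() refers to the current element of the Items array. Then click OK.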
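Behind the designer, the pipeline stores the Filter activity as JSON with an `items` expression, a `condition` expression, and a dependency on the Get Metadata activity. The sketch below builds that shape in Python. The activity name `Filter1` and the `endsWith` condition are assumptions for this example, not values taken from the steps above.

```python
import json

# Hypothetical Filter activity definition. "dependsOn" reflects the connection
# drawn from Get Metadata1 to this activity on the canvas.
filter_activity = {
    "name": "Filter1",
    "type": "Filter",
    "dependsOn": [
        {"activity": "Get Metadata1", "dependencyConditions": ["Succeeded"]}
    ],
    "typeProperties": {
        # Items: the array produced by Get Metadata.
        "items": {
            "value": "@activity('Get Metadata1').output.childItems",
            "type": "Expression",
        },
        # Condition (assumed): keep entries whose name ends with .txt.
        "condition": {
            "value": "@endsWith(item().name, '.txt')",
            "type": "Expression",
        },
    },
}

print(json.dumps(filter_activity, indent=2))
```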

Let's validate the pipeline. Click the Validate button.

You can see that the pipeline has been validated.

Now, publish all the changes to the pipeline.

Let's execute the pipeline. Click the Debug button.

You can see that the pipeline executed successfully.

Let's look at the output of the Filter activity.

In the grid, hover over the Filter activity row. You will see two icons: one for the input and one for the output.




Click the Output icon to open the Output screen, which shows the output of the Filter activity. To see the complete output, maximize the screen.

Now you can see the complete output of the Filter activity.

There are three files in blob storage, and the Filter activity has kept only the two .txt files. You can also see the names of the filtered files.

This output is correct: of the three files stored in blob storage, two are .txt files and one is an .xlsx file, and we wanted only the .txt files.
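The Filter activity output reports how many items it received, how many passed the condition, and the passing items themselves, in the `ItemsCount`, `FilteredItemsCount`, and `Value` properties. The Python sketch below reproduces that shape for the three-file example; the file names are hypothetical, and the `.txt` check stands in for whatever condition your pipeline uses.

```python
import json

# Hypothetical childItems from Get Metadata: three files, two of them .txt.
child_items = [
    {"name": "file1.txt", "type": "File"},
    {"name": "file2.txt", "type": "File"},
    {"name": "report.xlsx", "type": "File"},
]

# Keep files whose name ends with .txt (a stand-in for the Filter condition).
value = [item for item in child_items if item["name"].endswith(".txt")]

# Shape modeled on the Filter activity output shown in the Output screen.
filter_output = {
    "ItemsCount": len(child_items),
    "FilteredItemsCount": len(value),
    "Value": value,
}

print(json.dumps(filter_output, indent=2))
```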

We have now filtered the required files using the Filter activity. You can add further activities to the pipeline that use the output of the Filter activity.
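A common next step is a ForEach activity that iterates over the filtered files by reading the Filter output's `Value` array. The sketch below shows that wiring as a Python dictionary. The names `ForEach1` and `Filter1` are hypothetical, and the inner `activities` list is left empty for you to fill with, for example, a Copy activity.

```python
import json

# Hypothetical ForEach activity that runs after Filter1 and loops over its output.
foreach_activity = {
    "name": "ForEach1",
    "type": "ForEach",
    "dependsOn": [{"activity": "Filter1", "dependencyConditions": ["Succeeded"]}],
    "typeProperties": {
        # Value holds the items that passed the Filter condition.
        "items": {"value": "@activity('Filter1').output.Value", "type": "Expression"},
        # Inner activities go here; inside them, @item().name is the current file name.
        "activities": [],
    },
}

print(json.dumps(foreach_activity, indent=2))
```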
