Posted by William Chen

Comparing the use of wildcards in the Copy Activity of Azure Data Factory with the Get Metadata activity for managing multiple file copies

In Azure Data Factory (ADF), both the Copy Activity with wildcards (*.*) and the Get Metadata activity (file list retrieval) can copy or move multiple files. However, they operate differently and suit different scenarios.

1. Copy Activity with Wildcard *.*

  • Purpose: Automatically copies multiple files from a source to a destination using wildcards.
  • Use Case: Used when you want to move, copy, or process multiple files in bulk that match a specific pattern (e.g., all .csv files or any file in a folder).
  • Wildcard Support: The wildcard characters (* for any characters, ? for a single character) help in defining a set of files to be copied. For example:
    • *.csv will copy all .csv files in the specified folder.
    • file*.json will copy all files starting with file and having a .json extension.
  • Bulk Copy: Enables copying multiple files without manually specifying each one.
  • Common Scenarios:
    • Copy all files from one folder to another, filtering based on extension or name pattern.
    • Copy files that were uploaded on a specific date, assuming the date is part of the file name.
  • Automatic File Handling: ADF will automatically copy all files matching the pattern in a single operation.

Key Benefit: Efficient for bulk file transfers with minimal configuration. You don’t need to explicitly get the file list; it uses wildcards to copy all matching files.

Example Scenario:

  • You want to copy all .csv files from a folder in Blob Storage to a Data Lake without manually listing them.
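The wildcard patterns above behave like standard glob matching, which can be sketched locally with Python's fnmatch module. This is a conceptual illustration only, with hypothetical file names, not a call into ADF itself:

```python
# Conceptual sketch: mimic ADF wildcard file selection with Python's fnmatch.
# The file names below are hypothetical examples, not from a real storage account.
from fnmatch import fnmatch

files = ["sales.csv", "inventory.csv", "file1.json", "file2.json", "readme.txt"]

# "*.csv" selects every CSV in the folder, as the Copy Activity wildcard would.
csv_files = [f for f in files if fnmatch(f, "*.csv")]
print(csv_files)  # ['sales.csv', 'inventory.csv']

# "file*.json" selects files starting with "file" and ending in ".json".
json_files = [f for f in files if fnmatch(f, "file*.json")]
print(json_files)  # ['file1.json', 'file2.json']
```

In the real pipeline, the equivalent pattern goes into the wildcard file path of the Copy Activity's source dataset; no listing or looping step is needed.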

2. Get Metadata Activity (File List Retrieval)

  • Purpose: Retrieves a list of files in a folder, which you can then process individually or use for conditional logic.
  • Use Case: Used when you need to explicitly obtain the list of files in a folder to apply custom logic, processing each file separately (e.g., for-looping over them).
  • No Wildcard Support: The Get Metadata activity does not use wildcards directly. Instead, it returns all the files (or specific child items) in a folder. If filtering by name or type is required, additional logic is necessary (e.g., using expressions or filters in subsequent activities).
  • Custom Processing: After retrieving the file list, you can perform additional steps like looping over each file (with the ForEach activity) and copying or transforming them individually.
  • Common Scenarios:
    • Retrieve all files in a folder and process each one in a custom way (e.g., run different processing logic depending on the file name or type).
    • Check for specific files, log them, or conditionally process based on file properties (e.g., last modified time).
  • Flexible Logic: Since you get a list of files, you can apply advanced logic, conditions, or transformations for each file individually.

Key Benefit: Provides explicit control over how each file is processed, allowing dynamic processing and conditional handling of individual files.

Example Scenario:

  • You retrieve a list of files in a folder, loop over them, and process only files that were modified today or have a specific file name pattern.
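That scenario, Get Metadata returning child items and a ForEach with conditional logic deciding which files to process, can be sketched in plain Python. The item names, dates, and the shape of the metadata records are hypothetical stand-ins for what the activity returns:

```python
# Conceptual sketch: simulate Get Metadata (childItems) followed by a ForEach
# with an If Condition. The entries and dates below are hypothetical.
from datetime import date

# Roughly what a file listing plus per-file metadata retrieval yields:
child_items = [
    {"name": "orders_20240101.csv", "lastModified": date(2024, 1, 1)},
    {"name": "orders_20240102.csv", "lastModified": date(2024, 1, 2)},
    {"name": "notes.txt",           "lastModified": date(2024, 1, 2)},
]

today = date(2024, 1, 2)  # pretend "today" for the example

# ForEach + If Condition equivalent: process only CSVs modified today.
to_process = [
    item["name"]
    for item in child_items
    if item["name"].endswith(".csv") and item["lastModified"] == today
]
print(to_process)  # ['orders_20240102.csv']
```

In ADF, the filtering would be done with pipeline expressions inside the ForEach (or a Filter activity), and each surviving item would drive its own Copy or transformation activity.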

Side-by-Side Comparison

| Feature | Copy Activity (Wildcard *.*) | Get Metadata Activity (File List Retrieval) |
|---|---|---|
| Purpose | Copies multiple files matching a wildcard pattern. | Retrieves a list of files from a folder for custom processing. |
| Wildcard Support | Yes (*.*, *.csv, file?.json, etc.). | No; retrieves all items from the folder (no filtering by pattern). |
| File Selection | Automatically selects files based on the wildcard pattern. | Retrieves the entire list of files, then requires a filter for specific file selection. |
| Processing Style | Bulk copying based on file patterns. | Custom logic or per-file processing using the ForEach activity. |
| Use Case | Simple, fast copying of multiple files matching a pattern. | When you need more control over each file (e.g., looping, conditional processing). |
| File Count Handling | Automatically processes all matching files in one step. | Returns a list of all files in the folder; each file can be processed individually. |
| Efficiency | Efficient for bulk transfer; handles all matching files in one operation. | More complex; requires looping through files for individual actions. |
| Post-Processing Logic | No looping required; processes files in bulk. | Requires a ForEach activity to iterate over the file list for individual processing. |
| Common Scenarios | Copy all files with a .csv extension; move files with a specific prefix or suffix. | Retrieve all files and apply custom logic to each one; check file properties (e.g., last modified date). |
| Control Over Individual Files | Limited; bulk operation for all files matching the pattern. | Full control over each file, allowing dynamic actions (e.g., conditional processing, transformations). |
| File Properties Access | No access to specific file properties during the copy operation. | Access to file properties such as size and last modified date through metadata retrieval. |
| Execution Time | Fast for copying large sets of files matching a pattern. | Slower, since each file is processed individually in a loop. |
| Use of Additional Activities | Often works independently, without further processing steps. | Typically combined with ForEach, If Condition, or other control activities for custom logic. |
| Scenarios to Use | Copying all files in a folder that match a certain extension (e.g., *.json); moving large numbers of files with minimal configuration. | Checking file properties before processing; dynamic file processing (e.g., applying transformations based on file name or type). |

When to Use Each:

  • Copy Activity with Wildcard:
    • Use when you want to copy multiple files in bulk and don’t need to handle each file separately.
    • Best for fast, simple file transfers based on file patterns.
  • Get Metadata Activity with File List:
    • Use when you need explicit control over each file or want to process files individually (e.g., with conditional logic).
    • Ideal when you need to loop through files, check properties, or conditionally process files.

Please do not hesitate to contact me at William . chen @ mainri.ca if you have any questions.

(remove all spaces from the email address 😊)


