Azure Data Factory and Synapse Analytics


Create External Data Sources in Synapse Serverless SQL
Step-by-step guide to creating an External Data Source using the Synapse Studio UI or a script. more …
ADF or ASA lookup filter Modified date query for SharePoint Online List
This article focuses on Lookup activity queries for a SharePoint Online List in ADF or ASA, filtering on modified date, type, whether an item is the current version, and more. more …
Metadata-driven full solution for Azure Data Factory or Synapse to incrementally copy data from SharePoint Online and sink it to ADLS Gen2
Provides a fully metadata-driven solution for using Azure Data Factory (ADF) or Synapse Analytics (ASA) to incrementally copy multiple data sets in one run from SharePoint Online (SPO) and sink them to ADLS Gen2. more …
ADF activities failure vs pipeline failure and pipeline error handling logical mechanism
This article discusses what the pipeline status will be (failure or success) when activities fail within a pipeline, and the logical mechanism of pipeline error handling. more ...
Comparing the use of wildcards in the Copy Activity of Azure Data Factory with the Get Metadata activity for managing multiple file copies
Both the Copy Activity with wildcards (*.*) and the Get Metadata activity for retrieving a file list can be used to copy or move multiple files. more …
ADF or ASA lookup Delta Table in Databricks Unity Catalog
Using ADF to look up data from a Delta table in Databricks’ Unity Catalog involves several steps. more …
Comparative Analysis of Linked Services in Azure Data Factory and Azure Synapse Analytics
In ADF or ASA, some of the Linked Services you create are similar but serve slightly different purposes and have some key differences, e.g.
“REST Linked Services” & “HTTP Linked Services”
“Linked Services to Databricks” & “Linked Services to Databricks Delta Lake”
Azure Data Factory or Synapse Copy Activity with File System
In Azure Data Factory (ADF) or Synapse, using Copy Activity with a File System as a source or sink is common when dealing with on-premises more …
Get Metadata activity in ADF or ASA
The Get Metadata activity in Azure Data Factory (ADF) or Azure Synapse Analytics (ASA) is used to retrieve metadata about a file, folder, or database. This activity is particularly useful when you need to dynamically determine properties like file name, size, structure, or existence and use them in subsequent pipeline activities.
more …
Building Slowly Changing Dimensions Type 2 in Azure Data Factory and Synapse
Within the context of enterprise data warehousing, the effective management of historical data is essential for supporting informed business decision-making. Slowly Changing Dimension (SCD) Type 2 is a widely adopted technique for addressing changes in data over time.
more …
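The SCD Type 2 pattern can be illustrated with a minimal sketch: expire the current version of a changed row and append a new one. Column names (`cust_id`, `start_date`, `end_date`, `is_current`) are hypothetical; a real pipeline implements this with data flow transformations rather than Python.

```python
from datetime import date

def scd2_apply(dimension, incoming, key, today=None):
    """Close the current version and append a new one when attributes change."""
    today = today or date.today().isoformat()
    # assumes exactly one current row exists for this business key
    current = next(r for r in dimension if r[key] == incoming[key] and r["is_current"])
    if all(current[k] == v for k, v in incoming.items()):
        return dimension                       # no change -> nothing to do
    current["is_current"] = False              # expire the old version
    current["end_date"] = today
    new_row = {**incoming, "start_date": today, "end_date": None, "is_current": True}
    return dimension + [new_row]

dim = [{"cust_id": 1, "city": "Oslo", "start_date": "2020-01-01",
        "end_date": None, "is_current": True}]
dim = scd2_apply(dim, {"cust_id": 1, "city": "Bergen"}, key="cust_id",
                 today="2024-06-01")
```

The history survives: the Oslo row is closed with an end date, and the Bergen row becomes the current version.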
Dynamic ETL Mapping in Azure Data Factory/Synapse Analytics: Source-to-Target Case Study Implementation (1)
Scenario: the source schema may have new or missing columns, or the destination schema may have columns with different names or might lack columns to accommodate new incoming source fields.
- Source: CSV, Schema varies between executions (columns may be missing, reordered, or new).
- Destination: SQL database.
- Requirement: Dynamically handle variations in the source data, map data to a consistent destination schema, and gracefully manage missing columns by assigning default values to them. Add new columns to the target database table if they appear in the incoming source data.
- Problem: emp_ID and dep_id are missing from the source data;
column names are not exactly the same, name <–> emp_Name;
the target DB table does not have the column “state”
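The mapping logic the pipeline needs can be sketched in plain Python. The rename map and default values below are hypothetical, taken only from the scenario above (name <–> emp_Name; emp_ID and dep_id missing; new source column "state"):

```python
# Hypothetical rename map and target schema from the scenario above
RENAME_MAP = {"name": "emp_Name"}                      # source -> target name
TARGET_DEFAULTS = {"emp_ID": -1, "emp_Name": None, "dep_id": -1}

def conform(row: dict) -> dict:
    """Map one source row with a varying schema onto the target schema."""
    renamed = {RENAME_MAP.get(k, k): v for k, v in row.items()}
    out = {}
    for col, default in TARGET_DEFAULTS.items():
        out[col] = renamed.get(col, default)           # default for missing cols
    for col, val in renamed.items():                   # new source columns such as
        if col not in out:                             # "state" are carried along,
            out[col] = val                             # mirroring an ALTER TABLE
    return out

row = conform({"name": "Ann", "state": "WA"})          # emp_ID, dep_id missing
```

In the real pipeline the same decisions are made with dynamic column mappings and a schema-comparison step, but the row-level outcome is the same.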
Change Data Capture with Azure Data Factory and Synapse Analytics
ADF's new top-level feature Change Data Capture (CDC) is the most effective way to read only the source data that has changed since the last time the pipeline ran, rather than querying the entire dataset on each run.
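The underlying idea is a high-water-mark filter: keep a checkpoint from the last run and read only rows modified after it. The column and variable names here are invented for illustration; ADF's CDC feature manages this checkpoint for you.

```python
rows = [
    {"id": 1, "modified": "2024-01-01"},
    {"id": 2, "modified": "2024-03-10"},
    {"id": 3, "modified": "2024-05-02"},
]

last_run = "2024-02-01"                 # checkpoint persisted between runs
changed = [r for r in rows if r["modified"] > last_run]
new_checkpoint = max(r["modified"] for r in rows)   # stored for the next run
```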
Data Flow
Azure Data Factory (ADF) Mapping Data Flow offers a variety of transformations to design and implement complex data transformation workflows.
Visual Summary of Categories
| Category | Transformations |
|---|---|
| Schema Modifiers | Select, Alter Row, Derived Column, Pivot, Unpivot, Surrogate Key |
| Row Modifiers | Filter, Conditional Split, Sort, Window |
| Joins | Join, Lookup |
| Aggregations | Aggregate |
| Data Quality | Assert, Exists |
| Set Operations | Union, Intersect, Except |
| Flow Controls | Cache, Alter Row |
| Source and Sink | Source, Sink |
| Data Processing | Flatten, Cross Join |
Aggregate Transformation
The Aggregate transformation in Azure Data Factory (ADF) Data Flows is a powerful tool for performing calculations on groups of data. It’s analogous to the GROUP BY clause in SQL, allowing you to summarize data based on one or more grouping columns.
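Conceptually the Aggregate transformation does what this small Python sketch does: group on one column and compute aggregate expressions per group. The data and column names are invented; in ADF you configure the grouping columns and aggregates in the data flow UI.

```python
from collections import defaultdict

rows = [
    {"region": "East", "amount": 100},
    {"region": "East", "amount": 50},
    {"region": "West", "amount": 70},
]

# Equivalent of: SELECT region, SUM(amount), COUNT(*) ... GROUP BY region
totals = defaultdict(lambda: {"total": 0, "orders": 0})
for r in rows:
    g = totals[r["region"]]
    g["total"] += r["amount"]      # SUM(amount)
    g["orders"] += 1               # COUNT(*)

result = {k: dict(v) for k, v in totals.items()}
```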
Alter Row Transformation
The Alter Row Transformation is used to set row-level policies for data being written to a sink. This transformation is particularly useful when you are working with slowly changing dimensions (SCD) or when you need to synchronize data between source and sink systems.
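The row-policy idea can be sketched as follows: each row is tagged with the write action the sink should apply. The conditions mirror the kind of expressions you would write in the Alter Row settings; the field names and `sink_keys` lookup are invented for illustration.

```python
def alter_row_policy(row: dict, sink_keys: set) -> str:
    """Tag a row with the write policy a sink would apply."""
    if row.get("deleted"):
        return "delete"          # source marked the row as deleted
    if row["id"] in sink_keys:
        return "update"          # key already exists in the sink
    return "insert"              # new key -> insert

sink_keys = {1, 2}               # keys currently present in the sink
rows = [{"id": 1}, {"id": 3}, {"id": 2, "deleted": True}]
policies = [alter_row_policy(r, sink_keys) for r in rows]
```

This is exactly the kind of tagging that makes SCD and source/sink synchronization work: the sink then executes each row as an insert, update, or delete.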
Using Exists Transformation for Data Comparison in Azure Data Factory/Synapse
The Exists transformation is a row filtering transformation that checks whether your data exists in another source or stream. The output stream includes all rows in the left stream that either exist or don’t exist in the right stream.
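In relational terms, "Exists" is a left semi-join and "Doesn't exist" is an anti-join. A minimal sketch with invented data:

```python
left = [{"id": 1}, {"id": 2}, {"id": 3}]
right_ids = {2, 3, 4}            # key values present in the right stream

# "Exists": left rows whose key appears in the right stream (semi-join)
exists = [r for r in left if r["id"] in right_ids]
# "Doesn't exist": left rows with no match in the right stream (anti-join)
not_exists = [r for r in left if r["id"] not in right_ids]
```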