Azure Data Factory or Synapse lookup Delta table in Databricks Unity Catalog

Using Azure Data Factory (ADF) to look up data from a Delta table in Databricks’ Unity Catalog involves several steps. Here’s a high-level overview of how you can accomplish this:

Step 1: Set Up Your Environment

  1. Azure Databricks Workspace: Ensure you have an Azure Databricks workspace set up and that your Delta table is registered in the Unity Catalog.
  2. Azure Data Factory Instance: Create or use an existing Azure Data Factory instance.

Step 2: Create an Azure Databricks Delta Lake Linked Service

Management > Linked Services > Azure Databricks Delta Lake

There are 3 sort of Authentication method. Choose either of them.

  • Access token
  • System- assigned managed Identity
  • User-assigned managed identity

There are some required information have to fill in .

Fill in information

subscription, Databricks workspace, workspace resource ID, Cluster ID etc. Depend on what you chosen

subscription ID, Databricks workspace name, Databricks Workspace URL, Databricks resource ID, those information can be find out from

Azure Portal > Databricks > overview

Workspace resource ID Patten

/subscriptions/<your Subscription ID>/resourceGroups/<Your Managed Resource Group>/providers/Microsoft.Databricks/workspaces/<your Databricks service name>

To find out “Existing cluster ID”

databricks workspace > compute >
find out which cluster/computer you are using. e.g.

if you are using other type clusters, such as all-purpose cluster, you can

open your databricks workspace >

look at the workspace URL, the URL paten looks this

https://<your ADB workspace>.azuredatabricks.net/compute/clusters/1003-175247-ewmllvcs?o=...........

after ../clusters/ <here is your cluster ID> . (highlight)

Step 3: Create a Dataset for the Delta Table

We have:

  • Unity catalog called “dev”;
  • Schema (Database) called test;
  • Table name is testing

Using above created Databricks delta lake linkedservices –

Step 4: Use the Lookup Activity

In Azure Data Factory or Synapse, choose “Lookup” activity, using the azure databricks delta lake dataset above created.

we can use “table” or “query“.

if we use “query”, we are able to dynamically generate the query script. for example,

query: SELECT * FROM my_database.my_table WHERE condition = ‘value’

Dynamic content

@concat('select * from '
, pipeline().parameters.Environment
, '.twedi.sharepointlog'
)

Step 5: Use the Lookup Output

You can reference the output of the Lookup activity in subsequent activities within the pipeline. For example, you might want to use it in a conditional activity or to pass parameters to another activity.

Conclusion

Following these steps will allow you to successfully use Azure Data Factory’s Lookup activity to query a Delta table from Databricks Unity Catalog. If you run into specific issues or have further questions, feel free to ask!