Using Azure Data Factory (ADF) to look up data from a Delta table in Databricks’ Unity Catalog involves several steps. Here’s a high-level overview of how you can accomplish this:
Step 1: Set Up Your Environment
- Azure Databricks Workspace: Ensure you have an Azure Databricks workspace set up and that your Delta table is registered in the Unity Catalog.
- Azure Data Factory Instance: Create or use an existing Azure Data Factory instance.
Step 2: Create an Azure Databricks Delta Lake Linked Service
Management > Linked Services > Azure Databricks Delta Lake
There are 3 sort of Authentication method. Choose either of them.
- Access token
- System- assigned managed Identity
- User-assigned managed identity
There are some required information have to fill in .
Fill in information
subscription, Databricks workspace, workspace resource ID, Cluster ID etc. Depend on what you chosen
subscription ID, Databricks workspace name, Databricks Workspace URL, Databricks resource ID, those information can be find out from
Azure Portal > Databricks > overview
Workspace resource ID Patten
/subscriptions/<your Subscription ID>/resourceGroups/<Your Managed Resource Group>/providers/Microsoft.Databricks/workspaces/<your Databricks service name>
To find out “Existing cluster ID”
databricks workspace > compute >
find out which cluster/computer you are using. e.g.
if you are using other type clusters, such as all-purpose cluster, you can
open your databricks workspace >
look at the workspace URL, the URL paten looks this
https://<your ADB workspace>.azuredatabricks.net/compute/clusters/1003-175247-ewmllvcs?o=...........
after ../clusters/ <here is your cluster ID> . (highlight)
Step 3: Create a Dataset for the Delta Table
We have:
- Unity catalog called “dev”;
- Schema (Database) called test;
- Table name is testing
Using above created Databricks delta lake linkedservices –
Step 4: Use the Lookup Activity
In Azure Data Factory or Synapse, choose “Lookup” activity, using the azure databricks delta lake dataset above created.
we can use “table” or “query“.
if we use “query”, we are able to dynamically generate the query script. for example,
query: SELECT * FROM my_database.my_table WHERE condition = ‘value’
Dynamic content
@concat('select * from '
, pipeline().parameters.Environment
, '.twedi.sharepointlog'
)
Step 5: Use the Lookup Output
You can reference the output of the Lookup activity in subsequent activities within the pipeline. For example, you might want to use it in a conditional activity or to pass parameters to another activity.
Conclusion
Following these steps will allow you to successfully use Azure Data Factory’s Lookup activity to query a Delta table from Databricks Unity Catalog. If you run into specific issues or have further questions, feel free to ask!