System – mainri

dbutils: widgets

In Databricks notebooks, dbutils.widgets provide a way to create interactive controls like dropdowns, text inputs, and multi-selects. These widgets make your notebooks more interactive by allowing users to input parameters that can drive the notebook’s behavior without editing the code itself.

Types of Widgets

Text Box (dbutils.widgets.text): Allows users to input free-form text.
Dropdown (dbutils.widgets.dropdown): Presents a dropdown menu with predefined options.
Combobox (dbutils.widgets.combobox): A combination of a text box and a dropdown, allowing users to either select from a list or enter a new value.
Multi-Select (dbutils.widgets.multiselect): Allows users to select multiple options from a dropdown list.

Common dbutils.widgets Commands

Create a Text Box

dbutils.widgets.text(“input_text”, “default_value”, “Text Input”)

#SQL

CREATE WIDGET TEXT tableName DEFAULT ‘customers’

“input_text”: The name of the widget (used to retrieve the value).
“default_value”: Default value shown when the widget is created.
“Text Input”: Label shown next to the widget in the notebook UI.

Create a Dropdown

dbutils.widgets.dropdown(“dropdown”, “option1”, [“option1”, “option2”, “option3”], “Dropdown Label”)

#SQL

CREATE WIDGET DROPDOWN country DEFAULT ‘USA’ CHOICES [‘USA’, ‘UK’, ‘India’]

"dropdown": The name of the widget.
"option1": Default selected option.
["option1", "option2", "option3"]: List of options.
"Dropdown Label": Label for the dropdown.

Create a Combobox

dbutils.widgets.combobox(“combobox”, “option1”, [“option1”, “option2”, “option3”], “Combobox Label”)

Create a Multi-Select

dbutils.widgets.multiselect(“multi_select”, “option1”, [“option1”, “option2”, “option3”], “Multi-Select Label”)

#SQL

CREATE WIDGET MULTISELECT status DEFAULT ‘active’ CHOICES [‘active’, ‘inactive’, ‘pending’]

Retrieving Widget Values

value = dbutils.widgets.get(“widget_name”)
print(f”Selected value: {value}”)

#SQL

SELECT * FROM ${getArgument(‘tableName’)}
WHERE country = ‘${getArgument(‘country’)}’
AND status IN (${getArgument(‘status’)})

Cation, In sql, use getArgument.

Removing Widgets

Remove a Single Widget

dbutils.widgets.remove(“widget_name”)

#SQL

— Remove widgets when no longer needed
REMOVE WIDGET widget_name

Remove All Widgets

dbutils.widgets.removeAll()

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

DBFS: Access ADLS or Blob using Service Principle with Oauth2

There are several ways to connect Azure Data Lake Storage (ADLS) Gen2 or Blob to Databricks. Each method offers different levels of security, flexibility, and complexity. Such as

Mount ADLS to DBFS
Service Principal with OAuth 2.0
Direct access with ABFS URI (Azure Blob File System)
Azure Active Directory (AAD) Passthrough
SAS Token (Shared Access Signature)
Managed Identity (MSI)
Access Key Authentication

We have discussed “mount”, Account Key, SAS token at dbutils: mount, using Account Key or SAS to access adls or blob

Today we focus on using Service principal with OAuth2.0 to access ADLS or Blob.

Access ADLS Gen2 using Service Principal with OAuth 2.0

To access Azure Data Lake Storage (ADLS) or Blob Storage using a service principal with OAuth2 in Azure Databricks, follow these steps. This approach involves authenticating using a service principal’s credentials, such as the client ID, tenant ID, and client secret.

Previously, we discussed how to create Service Principle, Register an application on Azure Entra ID (former Active Directory), generate client-secret-value. To get client.id and client-secret-value please review this article.

if we have save the client-secret-value at databricks’ “secret-scopes”, please review the article “dbutils: Secrets and Secret Scopes“.

Define service principal credentials


# Define service principal credentials
client_id = "<your-client-id>"
tenant_id = "<your-tenant-id>"

# It's recommended to store the secret securely in a secret scope
client_secret = dbutils.secrets.get(scope = "<scope-name>", key = "<client-secret-key>")

OAuth2 configuration for service principal


# OAuth2 configuration for service principal
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<client-id>",
  "fs.azure.account.oauth2.client.secret": "<client-secret>",
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

Mount ADLS to DBFS


#Mount ADLS to DBFS
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/adls",
  extra_configs = configs
)

full example here.

# define variable
TenantID="b4e0dd6XXXXXXXXXXXXXXXX99ec4b8e"
clientID="72baeXXXXXXXXXXXXXX659ea547"
Secret-scope-name="my_scopes"
secrets-key = "my_secret_key"
storage_account_name = "asamainriadls"
container = "mainri-asa-file-system"
my_mount_point = "/mnt/mainriadlsfilesystem"

#get secret
my_secret = dbutils.secrets.get(scope=Secret-scope-name, key=secrets-key)

# OAuth2 configuration for service principal
configs = {
"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": f"{ClientID}",
"fs.azure.account.oauth2.client.secret": f"{my_secret}",
"fs.azure.account.oauth2.client.endpoint": f”https://login.microsoftonline.com/{TenantID}/oauth2/token”
}

In fact, we are able to directly access to ADLS/blob now.

# in fact,  we are able to directly access adls/blob now.
#adls
df = spark.read.csv(f"abfs://{container}@{storage_account_name }.dfs.windows.core.net/input/test.csv")

display(df)

# blob
display(spark.read.csv(f"abfs://{container}@{storage_account_name }.dfs.windows.core.net/input/test.csv"))

Of course, we are able to mount the ADLS to DBFS if we like; but it’s not necessary at this moment for demo.


# mount to DBFS
dbutils.fs.mount(
  source = f"abfss://{container} @{storage_account_name}.dfs.core.windows.net/",
  mount_point = my_mount_point,
  extra_configs = configs
)

#Mount Blob to DBFS
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.blob.core.windows.net/",
  mount_point = "/mnt/adls",
  extra_configs = configs
)

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

Comparison: “Linked Services to databricks” and “Linked Services to databricks delta lake”

In Azure Data Factory (ADF) or Azure Synapse Analytics, when you create Linked Services, both “Databricks” and “Databricks Delta Lake” are available as options. Here’s the key difference:

Key Differences:

Databricks Linked Service is for connecting to the compute environment (jobs, notebooks) of Databricks.
Databricks Delta Lake Linked Service is for connecting directly to Delta Lake data storage (tables/files).

Here’s a side-by-side comparison between Databricks and Databricks Delta Lake Linked Services in Azure Data Factory (ADF):

Feature	Databricks Linked Service	Databricks Delta Lake Linked Service
Purpose	Connect to an Azure Databricks workspace to run jobs or notebooks.	Connect to Delta Lake tables within Azure Databricks.
Primary Use Case	Run notebooks, Python/Scala/Spark scripts, and perform data processing tasks on Databricks.	Read/write data from/to Delta Lake tables for data ingestion or extraction.
Connection Type	Connects to the compute environment of Databricks (notebooks, clusters, jobs).	Connects to data stored in Delta Lake format (structured data files).
Data Storage	Not focused on specific data formats; used for executing Databricks jobs.	Specifically used for interacting with Delta Lake tables (backed by Parquet files).
ACID Transactions	Does not inherently support ACID transactions (although Databricks jobs can handle them in notebooks).	Delta Lake supports ACID transactions (insert, update, delete) natively.
Common Activities	– Running Databricks notebooks. – Submitting Spark jobs. – Data transformation using PySpark, Scala, etc.	– Reading from or writing to Delta Lake. – Ingesting or querying large datasets with Delta Lake’s ACID support.
Input/Output	Input/output via Databricks notebooks, clusters, or jobs.	Input/output via Delta Lake tables/files (with versioning and schema enforcement).
Data Processing	Focus on data processing (ETL/ELT) using Databricks compute power.	Focus on data management within Delta Lake storage layer, including handling updates and deletes.
When to Use	– When you need to orchestrate and run Databricks jobs for data processing.	– When you need to read or write data specifically stored in Delta Lake. – When managing big data with ACID properties.
Integration in ADF Pipelines	Execute Databricks notebook activities or custom scripts in ADF pipelines.	Access Delta Lake as a data source/destination in ADF pipelines.
Supported Formats	Any format depending on the jobs or scripts running in Databricks.	Primarily deals with Delta Lake format (which is based on Parquet).

dbutils: Databricks File System, dbutils

Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage.

Databricks recommends that you store data in mounted object storage rather than in the DBFS root. The DBFS root is not intended for production customer data.

DBFS root is the default file system location provisioned for a Databricks workspace when the workspace is created. It resides in the cloud storage account associated with the Databricks workspace

Databricks dbutils

**dbutils** is a set of utility functions provided by Databricks to help manage and interact with various resources in a Databricks environment, such as files, jobs, widgets, secrets, and notebooks. It is commonly used in Databricks notebooks to perform tasks like handling file systems, retrieving secrets, running notebooks, and controlling job execution.

Dbutils.help()

credentials: DatabricksCredentialUtils -> Utilities for interacting with credentials within notebooks
data: DataUtils -> Utilities for understanding and interacting with datasets (EXPERIMENTAL)
fs: DbfsUtils -> Manipulates the Databricks filesystem (DBFS) from the console
jobs: JobsUtils -> Utilities for leveraging jobs features
library: LibraryUtils -> Utilities for session isolated libraries
meta: MetaUtils -> Methods to hook into the compiler (EXPERIMENTAL)
notebook: NotebookUtils -> Utilities for the control flow of a notebook (EXPERIMENTAL)
preview: Preview -> Utilities under preview category
secrets: SecretUtils -> Provides utilities for leveraging secrets within notebooks
widgets: WidgetsUtils -> Methods to create and get bound value of input widgets inside notebooks

1. dbutils.fs (File System Utilities)

dbutils.fs.help()

dbutils.fs provides utilities to interact with various file systems, like DBFS (Databricks File System), Azure Blob Storage, and others, similarly to how you would interact with a local file system.

List Files:

dbutils.fs.ls(“/mnt/”)

Mount Azure Blob Storage:


dbutils.fs.mount(
    source = "wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point = "/mnt/myblobstorage",
    extra_configs = {"<key>": "<value>"}
)

For Azure Blob: wasbs://
For Azure Data Lake Gen2: abfss://
For S3: s3a://

Unmount

dbutils.fs.unmount("/mnt/myblobstorage")

Copy Files:

dbutils.fs.cp("/mnt/source_file.txt", "/mnt/destination_file.txt")

Remove Files:

dbutils.fs.rm("/mnt/myfolder/", True)  # True to remove recursively

Move Files:

dbutils.fs.mv("/mnt/source_file.txt", "/mnt/destination_file.txt")

dbutils.secrets (Secret Management)

dbutils.secrets is used to retrieve secrets stored in Databricks Secret Scopes. This is essential for securely managing sensitive data like passwords, API keys, or tokens.

dbutils.secrets.help()

Get a Secret:

my_secret = dbutils.secrets.get(scope="my-secret-scope", key="my-secret-key")

List Secrets:

dbutils.secrets.list(scope="my-secret-scope")

List Secret Scopes:

dbutils.secrets.listScopes()

dbutils.widgets (Parameter Widgets)

dbutils.notebook provides functionality to run one notebook from another and pass data between notebooks. It’s useful when you want to build modular pipelines by chaining multiple notebooks.

dbutils.widgets.help()

Run Another Notebook:

dbutils.notebook.run("/path/to/other_notebook", 60, {"param1": "value1", "param2": "value2"})

Runs another notebook with specified timeout (in seconds) and parameters. You can pass parameters as a dictionary.

Exit a Notebook:

dbutils.notebook.exit("Success")

Exits the notebook with a status message or value.

Return Value from a Notebook:

result = dbutils.notebook.run("/path/to/notebook", 60, {"param": "value"})
print(result)

dbutils.jobs (Job Utilities)

dbutils.jobs helps with tasks related to job execution within Databricks, such as getting details about the current job or task.

dbutils.jobs.help()

Get Job Run Information

job_info = dbutils.jobs.taskValues.get(job_id="<job_id>", task_key="<task_key>")

dbutils.library

Manages libraries within Databricks, like installing and updating them (for clusters).

dbutils.library.installPyPI("numpy")

Example

# Mount Azure Blob Storage using dbutils.fs
dbutils.fs.mount(
    source = "wasbs://mycontainer@myaccount.blob.core.windows.net",
    mount_point = "/mnt/mydata",
    extra_configs = {"fs.azure.account.key.myaccount.blob.core.windows.net": "<storage-key>"}
)

# List contents of the mount
display(dbutils.fs.ls("/mnt/mydata"))

# Get a secret from a secret scope
db_password = dbutils.secrets.get(scope="my-secret-scope", key="db-password")

# Create a dropdown widget to choose a dataset
dbutils.widgets.dropdown("dataset", "dataset1", ["dataset1", "dataset2", "dataset3"], "Choose Dataset")

# Get the selected dataset value
selected_dataset = dbutils.widgets.get("dataset")
print(f"Selected dataset: {selected_dataset}")

# Remove all widgets after use
dbutils.widgets.removeAll()

# Run another notebook and pass parameters
result = dbutils.notebook.run("/path/to/notebook", 60, {"input_param": "value"})
print(result)

Magic Command

list

Aspect	%fs (Magic Command)	%sh (Magic Command)	dbutils.fs (Databricks Utilities)	os.<> (Python OS Module)
Example Usage	%fs ls /databricks-datasets	%sh ls /tmp	dbutils.fs.ls(“/databricks-datasets”)	import os os.listdir(“/tmp”)
Cloud Storage Mounts	Can access mounted cloud storage paths.	No, unless the cloud storage is accessible from the driver node.	Can mount and access external cloud storage (e.g., S3, Azure Blob) to DBFS.	No access to mounted DBFS or cloud storage.
Use Case	Lightweight access to DBFS for listing, copying, removing files.	Execute system-level commands from notebooks.	Programmatic, flexible access to DBFS and cloud storage.	Access files and environment variables on the local node.

Unity Catalog: Data Access Control with Databricks Unity Catalog

This article explains how to control access to data and other objects in Unity Catalog.

Principals

Entities that can be granted permissions (e.g., users, groups, or roles).

Example: A user like alice@company.com or a group like DataEngineers can be considered principals.

Privileges

The specific rights or actions that a principal can perform on a securable object.

SELECT: Read data from a table or view.
INSERT: Add data to a table.
UPDATE: Modify existing data.
DELETE: Remove data.
ALL PRIVILEGES: Grants all possible actions.

Example: GRANT SELECT ON TABLE transactions TO DataScientists;

Securable Objects

The resources or entities (e.g., databases, tables, schemas) on which permissions are applied.

Catalogs (logical collections of databases).
Schemas (collections of tables or views within a catalog).
Tables (structured data in rows and columns).
Views, Functions, External Locations, etc.

Example: In Unity Catalog, the catalog named main, a schema like sales_db, and a table called transactions are all securable objects.

Concept	Principals	Privileges	Securable Objects
Definition	Entities that can be granted permissions (e.g., users, groups, or roles).	The specific rights or actions that a principal can perform on a securable object.	The resources or entities (e.g., databases, tables, schemas) on which permissions are applied.
Examples	– Users (e.g., alice, bob) – Groups (e.g., DataEngineers) – Service Principals	– SELECT (read data) – INSERT (write data) – ALL PRIVILEGES (full access)	– Catalog – Schema – Table – External Location
Scope	Defines who can access or perform actions on resources.	Defines what actions are allowed for principals on securable objects.	Defines where privileges apply (i.e., what resources are being accessed).
Roles in Security Model	Principals represent users, groups, or roles that need permissions to access objects.	Privileges are permissions or grants that specify the actions a principal can perform.	Securable objects are the data resources and define the scope of where privileges are applied.
Granularity	Granularity depends on the level of access required for individual users or groups.	Granular permissions such as SELECT, INSERT, UPDATE, DELETE, or even specific column-level access.	Granular levels of objects from the entire catalog down to individual tables or columns.
Hierarchy	– Principals can be individual users, but more commonly, groups or roles are used to simplify management.	– Privileges can be granted at various levels (catalog, schema, table) and can be inherited from parent objects.	– Securable objects are structured hierarchically: catalogs contain schemas, which contain tables, etc.
Management	– Principals are typically managed by identity providers (e.g., Azure Entra ID, Databricks users, Active Directory).	– Privileges are managed through SQL commands like GRANT or REVOKE in systems like Unity Catalog.	– Securable objects are resources like catalogs, schemas, and tables that need to be protected with permissions.
Databricks Example	– User: databricks-user – Group: DataScientists	– GRANT SELECT ON TABLE sales TO DataScientists`;	– Catalog: main – Schema: sales_db – Table: transactions

Side by side Comparison

Securable objects in Unity Catalog are hierarchical, and privileges are inherited downward. The highest level object that privileges are inherited from is the catalog. This means that granting a privilege on a catalog or schema automatically grants the privilege to all current and future objects within the catalog or schema.

Show grants on objects in a Unity Catalog metastore

Catalog Explorer

In your Azure Databricks workspace, click Catalog.
Select the object, such as a catalog, schema, table, or view.
Go to the Permissions tab.

SQL

Run the following SQL command in a notebook or SQL query editor. You can show grants on a specific principal, or you can show all grants on a securable object.

SHOW GRANTS  [principal]   ON  <securable-type> <securable-name>

For example, the following command shows all grants on a schema named default in the parent catalog named main:

SHOW GRANTS ON SCHEMA main.default;

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

Appendix:

MS: Manage privileges in Unity Catalog

MS: Unity Catalog privileges and securable objects

Unity Catalog in Databricks

Unity Catalog is a fine-grained data governance solution for data present in a Data Lake for managing data governance, access control, and centralizing metadata across multiple workspaces. Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Azure Databricks workspaces. It brings a new layer of data management and security to your Databricks environment

Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Azure Databricks workspaces.

Key features of Unity Catalog include

Define once, secure everywhere: Unity Catalog offers a single place to administer data access policies that apply across all workspaces.
Standards-compliant security model: Unity Catalog’s security model is based on standard ANSI SQL and allows administrators to grant permissions in their existing data lake using familiar syntax, at the level of catalogs, schemas (also called databases), tables, and views.
Built-in auditing and lineage: Unity Catalog automatically captures user-level audit logs that record access to your data. Unity Catalog also captures lineage data that tracks how data assets are created and used across all languages.
Data discovery: Unity Catalog lets you tag and document data assets, and provides a search interface to help data consumers find data.
System tables (Public Preview): Unity Catalog lets you easily access and query your account’s operational data, including audit logs, billable usage, and lineage.

Unity Catalog object model

The hierarchy of database objects in any Unity Catalog metastore is divided into three levels, represented as a three-level namespace (catalog.schema.table-etc)

Metastore

The metastore is the top-level container for metadata in Unity Catalog. It registers metadata about data and AI assets and the permissions that govern access to them. For a workspace to use Unity Catalog, it must have a Unity Catalog metastore attached.

Object hierarchy in the metastore

In a Unity Catalog metastore, the three-level database object hierarchy consists of catalogs that contain schemas, which in turn contain data and AI objects, like tables and models.

Level one: Catalogs are used to organize your data assets and are typically used as the top level in your data isolation scheme.

Level two: Schemas (also known as databases) contain tables, views, volumes, AI models, and functions.

Level three: Volumes, Tables, Views, Functions, Models (AI models packaged with MLflow)

Working with database objects in Unity Catalog

Working with database objects in Unity Catalog is very similar to working with database objects that are registered in a Hive metastore, with the exception that a Hive metastore doesn’t include catalogs in the object namespace. You can use familiar ANSI syntax to create database objects, manage database objects, manage permissions, and work with data in Unity Catalog. You can also create database objects, manage database objects, and manage permissions on database objects using the Catalog Explorer UI.

Granting and revoking access to database objects

You can grant and revoke access to securable objects at any level in the hierarchy, including the metastore itself. Access to an object implicitly grants the same access to all children of that object, unless access is revoked.


GRANT CREATE TABLE ON SCHEMA mycatalog.myschema TO `finance-team`;

Using sp_MSforeachdb to Search for Objects Across All Databases

You are working on a project that requires migrating legacy stuffs from old environment to a new one, and requirement says upgrade business logic to latest. Unfortunately, not enough and clearly documents for you to refer. You did not even know an object where it is since there are so many databases resident in the same server, so many tables, so many views, stored procedure, user defined functions… etc. It is hard to find out the legacy business logics.

sp_MSforeachdb

The sp_MSforeachdb procedure is an undocumented procedure that allows you to run the same command against all databases. There are several ways to get creative with using this command and we will cover these in the examples below. This can be used to select data, update data and even create database objects. You can use the sp_MSforeachdb stored procedure to search for objects by their name across all databases:

EXEC sp_MSforeachdb
'USE [?];
SELECT ''?'' AS DatabaseName, name AS ObjectName
, type_desc AS ObjectType
, Type_desc as object_Desc
, Create_date
, Modify_date
FROM sys.objects
WHERE name = ''YourObjectName'';';

Replace ‘YourObjectName’ with the actual name of the object you’re searching for (table, view, stored procedure, etc.).

The type_desc column will tell you the type of the object (e.g., USER_TABLE, VIEW, SQL_STORED_PROCEDURE, etc.).

For example, find out the “tb” prefix objects

EXEC sp_MSforeachdb
'USE [?];
SELECT ''?'' AS DatabaseName
, name AS objectName
, type_desc as object_Desc
, create_date, Modify_date
FROM sys.objects
WHERE name like ''tb%'';'
go

Alternative – Loop Through Databases Using a Cursor

If sp_MSforeachdb is not available, you can use a cursor to loop through each database and search for the object:

DECLARE @DBName NVARCHAR(255);
DECLARE @SQL NVARCHAR(MAX);

DECLARE DB_Cursor CURSOR FOR
SELECT name
FROM sys.databases
WHERE state = 0;  -- Only look in online databases

OPEN DB_Cursor;
FETCH NEXT FROM DB_Cursor INTO @DBName;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @SQL = 
    'USE [' + @DBName + ']; 
     IF EXISTS (SELECT 1 FROM sys.objects WHERE name = ''YourObjectName'')
     BEGIN
         SELECT ''' + @DBName + ''' AS DatabaseName, name AS ObjectName, type_desc AS ObjectType
         FROM sys.objects
         WHERE name = ''YourObjectName'';
     END';

    EXEC sp_executesql @SQL;

    FETCH NEXT FROM DB_Cursor INTO @DBName;
END;

CLOSE DB_Cursor;
DEALLOCATE DB_Cursor;

You can modify the query to search for specific object types, such as tables or stored procedures:

Find the table

You can use the sp_MSforeachdb system stored procedure to search for the table across all databases.

EXEC sp_MSforeachdb 
'USE [?]; 
 SELECT ''?'' AS DatabaseName, name AS TableName 
 FROM sys.tables 
 WHERE name = ''YourTableName'';';

If sp_MSforeachdb is not enabled or available, you can use a cursor to loop through all databases and search for the table:

DECLARE @DBName NVARCHAR(255);
DECLARE @SQL NVARCHAR(MAX);

DECLARE DB_Cursor CURSOR FOR
SELECT name
FROM sys.databases
WHERE state = 0;  -- Only look in online databases

OPEN DB_Cursor;
FETCH NEXT FROM DB_Cursor INTO @DBName;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @SQL = 
    'USE [' + @DBName + ']; 
     IF EXISTS (SELECT 1 FROM sys.tables WHERE name = ''YourTableName'')
     BEGIN
         SELECT ''' + @DBName + ''' AS DatabaseName, name AS TableName
         FROM sys.tables
         WHERE name = ''YourTableName'';
     END';

    EXEC sp_executesql @SQL;

    FETCH NEXT FROM DB_Cursor INTO @DBName;
END;

CLOSE DB_Cursor;
DEALLOCATE DB_Cursor;

Find the View

You can use the sp_MSforeachdb system stored procedure to search for the view across all databases.

EXEC sp_MSforeachdb 
'USE [?]; 
 SELECT ''?'' AS DatabaseName, name AS TableName 
 FROM sys.views
 WHERE name = ''YourTableName'';';

Find the Stored Procedure

EXEC sp_MSforeachdb 
'USE [?]; 
 SELECT ''?'' AS DatabaseName, name AS ProcedureName 
 FROM sys.procedures 
 WHERE name = ''YourProcedureName'';';

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

Create External Data Sources in Synapse Serverless SQL

An external data source in Synapse serverless SQL is typically used to reference data stored outside of the SQL pool, such as in Azure Data Lake Storage (ADLS) or Blob Storage. This allows you to query data directly from these external sources using T-SQL.

There are different ways to create external data source. Using Synapse Studio UI, coding etc. the easiest way is to leverage Synapse Studio UI. But we had better know how to use code to create it since in some cases we have to use this way.

Here’s how to create an external data source in Synapse serverless SQL

Using Synapse Studio UI to create External Data Source

Create Lake Database

Open Synapse Studio

On the left side, select Data portal > workspace

Fill in the properties:

Create external table from data lake

Double clicks the Lake Database you just created.

in the Lake Database tag, click “+ Table”

fill in the detail information:

Continue to configure the table properyies

Adjust Table properties

Adjust column other properties, or add even more columns, such as data type, description, Nullability, Primary Key, set up partition create relationship …… etc.

Repeat the above steps to create even more tables to meet your business logic need, or create relationship if need.

Script to create an External Data Source

Step 1:

1. Connect to Serverless SQL Pool:

Open Synapse Studio, go to the “Data” hub, and connect to your serverless SQL pool.

2. Create the External Data Source:

Use the following T-SQL script to create an external data source that points to your Azure Data Lake Storage (ADLS) or Blob Storage:

CREATE EXTERNAL DATA SOURCE MyExternalDataSource
WITH (
LOCATION = ‘https://<your-storage-account-name>.dfs.core.windows.net/<your-filesystem-name>‘,
CREDENTIAL = <your-credential-name>
);

Replace <your-storage-account-name>, <your-filesystem-name>, and <your-credential-name> with the appropriate values:

LOCATION: The URL of your Azure Data Lake Storage (ADLS) or Blob Storage.
CREDENTIAL: The name of the database credential used to access the storage. (You may need to create this credential if it doesn’t already exist.)

Step 2:

If you don’t have a credential yet, create one as follows:

1. Create a Credential:

CREATE DATABASE SCOPED CREDENTIAL MyStorageCredential
WITH IDENTITY = ‘SHARED ACCESS SIGNATURE’,
SECRET = ”;

Replace <your-SAS-token> with your Azure Storage Shared Access Signature (SAS) token.

2. Create an External Table or Query the External Data

After setting up the external data source, you can create external tables or directly query data:

Create an External Table:

You can create an external table that maps to the data in your external storage:

CREATE EXTERNAL TABLE MyExternalTable (
Column1 INT,
Column2 NVARCHAR(50),
Column3 DATETIME
)
WITH (
LOCATION = ‘/path/to/data.csv’,
DATA_SOURCE = MyExternalDataSource,
FILE_FORMAT = MyFileFormat — You need to define a file format
);

Query the External Data

You can also directly query the data without creating an external table:

SELECT *
FROM OPENROWSET(
BULK ‘/path/to/data.csv’,
DATA_SOURCE = ‘MyExternalDataSource’,
FORMAT = ‘CSV’,
FIELDTERMINATOR = ‘,’,
ROWTERMINATOR = ‘\n’
) AS MyData;

Create and Use a File Format (Optional)

If you are querying structured files (like CSV, Parquet), you might need to define a file format:

CREATE EXTERNAL FILE FORMAT MyFileFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR = ‘,’, STRING_DELIMITER = ‘”‘)
);

Summary

By following these steps, you should be able to connect to and query your external data sources using the serverless SQL pool in Synapse. Let me know if you need further assistance!

Create an external data source in Synapse serverless SQL to point to your external storage.
Create a database scoped credential if necessary to access your storage.
Create an external table or directly query data using OPENROWSET.
Define a file format if working with structured data like CSV or Parquet.

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

Day 10: Workflows in Azure Purview

Workflow in Purview

In Azure Purview, a workflow refers to a set of predefined, automated tasks that streamline and manage certain processes, such as approving data access requests or managing metadata curation.

Workflows in Azure Purview help automate data governance operations and improve collaboration across teams. They can include tasks like approving the registration of new data sources, assigning data stewards, or managing access to certain datasets.

Key Features of Workflows in Azure Purview:

Data Curation and Approval Processes
Workflows can be used to automate the process of curating and approving metadata changes or lineage extraction requests.
Automated Notifications
Users can receive notifications when tasks like data registration or classification have been completed or need approval.
Customizable Steps
Workflows are customizable, allowing we to define multiple stages of approval or validation depending on the organization’s governance policies.
Access Management
We can define workflows to manage how access requests are handled for sensitive data.
Approval Requests
For example, if someone registers a new data source, the workflow might require approval from a Purview Collection Administrator before proceeding.

Next step: Day 11 –

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

Day 11: Purview Access Control and Management

This article describes collections and access management for your account in the Microsoft Purview governance portal.

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

Next Step: Day 11: