dbutils: notebook run(), exit() and pass parameters

In Databricks, dbutils.notebook provides a set of utilities that allow you to interact with notebooks programmatically. This includes running other notebooks, exiting a notebook with a result, and managing notebook workflows.

Parent notebook passes parameters to child notebook

run()

dbutils.notebook.run()

run(path: String, timeoutSeconds: int, arguments: Map): String -> This method runs a notebook and returns its exit value

The dbutils.notebook.run() function allows you to run another notebook from your current notebook. It also allows you to pass parameters to the called child notebook and capture the result of the execution.

  • notebook_path: The path to the notebook you want to run. This can be a relative or absolute path.
  • timeout_seconds: How long to wait before timing out. If the notebook does not complete within this time, an error will occur.

    In other words, if the notebook completes before the timeout, it proceeds as normal, returning the result. However, if the notebook exceeds the specified timeout duration, the notebook run is terminated, and an error is raised.

  • arguments: A dictionary of parameters to pass to the called notebook. The called notebook can access these parameters via dbutils.widgets.get().
Parent notebook
# Define parameters to pass to the child notebook
params = {
  "param1": "value1",
  "param2": "value2"
}


# Run the child notebook and capture its result

result = dbutils.notebook.run("/Users/your-email@domain.com/child_notebook", 60, params)

# Print the result returned from the child notebook
print(f"Child notebook result: {result}")


A parent notebook calls/runs its child notebook from Python only; dbutils.notebook.run() cannot be used from SQL.

In the child notebook, you can retrieve the passed parameters using dbutils.widgets.get():

Child notebook
param1 = dbutils.widgets.get("param1")
param2 = dbutils.widgets.get("param2")

print(f"Received param1: {param1}")
print(f"Received param2: {param2}")

#SQL

-- Use the widget values in a query
SELECT * FROM my_table WHERE column1 = '${getArgument("param1")}' AND column2 = '${getArgument("param2")}';

Child notebook returns values to parent notebook

When a parent notebook runs/calls a child notebook using dbutils.notebook.run(), the child notebook can return a single value (a string) to the parent using dbutils.notebook.exit(). The parent notebook can capture this return value for further processing.

Key Points:

  • The value returned by dbutils.notebook.exit() must be a string.
  • The parent notebook captures this return value when calling dbutils.notebook.run().

exit()

dbutils.notebook.help() displays help for the notebook utilities.

dbutils.notebook.exit(value: String): void 

dbutils.notebook.exit() exits a notebook with a result.

The dbutils.notebook.exit() function is used to terminate the execution of a notebook and return a value to the calling notebook.

After this is executed, all cells below it are skipped and will not run.

#cell1
var = "hello"
print (var)

#cell2
var2 = "world"
dbutils.notebook.exit(var2)

#cell3
var3 = "good news"
print(var3)

Parent notebook uses child notebook returned value

Parent Notebook
#parent notebook
# Call the child notebook and pass any necessary parameters 
result = dbutils.notebook.run("/Notebooks/child_notebook", 60, {"param1": "some_value"})

#use the child notebook returned value 
print(f"I use the Returned result: {result}")



# Use the result for further logic 
if result == "Success": 
     print("The child notebook completed successfully!") 
else: 
     print("The child notebook encountered an issue.")

Child notebook
#child Notebook
# Simulate some processing (e.g., a query result or a status) 
result_value = "Success" 

# Return the result to the parent notebook 
dbutils.notebook.exit(result_value)

Handling Complex Return Values

Since dbutils.notebook.exit() only returns a string, if you need to return a more complex object (like a dictionary or a list), you need to serialize it to a string format (like JSON) and then deserialize it in the parent notebook.

Child Notebook:

import json

# Simulate a complex return value (a dictionary)
result = {"status": "Success", "rows_processed": 1234}

# Convert the dictionary to a JSON string and exit
dbutils.notebook.exit(json.dumps(result))

Parent Notebook:

import json

# Run the child notebook
result_str = dbutils.notebook.run("/Notebooks/child_notebook", 60, {"param1": "some_value"})

# Convert the returned JSON string back into a dictionary
result = json.loads(result_str)

# Use the values from the result
print(f"Status: {result['status']}")
print(f"Rows Processed: {result['rows_processed']}")

Summary:

  • You can call child notebooks from a parent notebook using Python (dbutils.notebook.run()), but not with SQL directly.
  • You can pass parameters using widgets in the child notebook.
  • In Python, it is recommended to use dbutils.widgets.get("parameterName"); getArgument("parameterName") still works but dbutils.widgets.get() is preferred.
  • In SQL, use getArgument("parameterName") in the child notebook.
  • Results can be returned to the parent notebook using dbutils.notebook.exit().

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

dbutils: widgets

In Databricks notebooks, dbutils.widgets provide a way to create interactive controls like dropdowns, text inputs, and multi-selects. These widgets make your notebooks more interactive by allowing users to input parameters that can drive the notebook’s behavior without editing the code itself.

Types of Widgets

  • Text Box (dbutils.widgets.text): Allows users to input free-form text.
  • Dropdown (dbutils.widgets.dropdown): Presents a dropdown menu with predefined options.
  • Combobox (dbutils.widgets.combobox): A combination of a text box and a dropdown, allowing users to either select from a list or enter a new value.
  • Multi-Select (dbutils.widgets.multiselect): Allows users to select multiple options from a dropdown list.

Common dbutils.widgets Commands

Create a Text Box

dbutils.widgets.text("input_text", "default_value", "Text Input")

#SQL

CREATE WIDGET TEXT tableName DEFAULT 'customers'

  • “input_text”: The name of the widget (used to retrieve the value).
  • “default_value”: Default value shown when the widget is created.
  • “Text Input”: Label shown next to the widget in the notebook UI.

Create a Dropdown

dbutils.widgets.dropdown("dropdown", "option1", ["option1", "option2", "option3"], "Dropdown Label")

#SQL

CREATE WIDGET DROPDOWN country DEFAULT 'USA' CHOICES SELECT * FROM (VALUES ('USA'), ('UK'), ('India'))

  • "dropdown": The name of the widget.
  • "option1": Default selected option.
  • ["option1", "option2", "option3"]: List of options.
  • "Dropdown Label": Label for the dropdown.

Create a Combobox

dbutils.widgets.combobox("combobox", "option1", ["option1", "option2", "option3"], "Combobox Label")

Create a Multi-Select

dbutils.widgets.multiselect("multi_select", "option1", ["option1", "option2", "option3"], "Multi-Select Label")

#SQL

CREATE WIDGET MULTISELECT status DEFAULT 'active' CHOICES SELECT * FROM (VALUES ('active'), ('inactive'), ('pending'))

Retrieving Widget Values

value = dbutils.widgets.get("widget_name")
print(f"Selected value: {value}")

#SQL

SELECT * FROM ${getArgument("tableName")}
WHERE country = '${getArgument("country")}'
AND status IN (${getArgument("status")})

Caution: in SQL, use getArgument() to read widget values.

Removing Widgets

Remove a Single Widget

dbutils.widgets.remove("widget_name")

#SQL

-- Remove widgets when no longer needed
REMOVE WIDGET widget_name

Remove All Widgets

dbutils.widgets.removeAll()
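
Putting it together, here is a hedged end-to-end sketch of the widget lifecycle; the widget names, table, and column are illustrative:

# Create widgets
dbutils.widgets.text("table_name", "customers", "Table Name")
dbutils.widgets.dropdown("country", "USA", ["USA", "UK", "India"], "Country")

# Read the current values
table_name = dbutils.widgets.get("table_name")
country = dbutils.widgets.get("country")

# Use them to drive a query (assumes the table exists)
df = spark.sql(f"SELECT * FROM {table_name} WHERE country = '{country}'")
display(df)

# Clean up when finished
dbutils.widgets.removeAll()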


DBFS: Access ADLS or Blob using Service Principal with OAuth 2.0

There are several ways to connect Azure Data Lake Storage (ADLS) Gen2 or Blob Storage to Databricks. Each method offers a different level of security, flexibility, and complexity, such as:

  • Mount ADLS to DBFS
  • Service Principal with OAuth 2.0
  • Direct access with ABFS URI (Azure Blob File System)
  • Azure Active Directory (AAD) Passthrough
  • SAS Token (Shared Access Signature)
  • Managed Identity (MSI)
  • Access Key Authentication

We have already discussed mounting with an Account Key or SAS token in "dbutils: mount, using Account Key or SAS to access adls or blob".

Today we focus on using a service principal with OAuth 2.0 to access ADLS or Blob.

Access ADLS Gen2 using Service Principal with OAuth 2.0

To access Azure Data Lake Storage (ADLS) or Blob Storage using a service principal with OAuth2 in Azure Databricks, follow these steps. This approach involves authenticating using a service principal’s credentials, such as the client ID, tenant ID, and client secret.

Previously, we discussed how to create a service principal: register an application in Microsoft Entra ID (formerly Azure Active Directory) and generate a client secret value. To get the client ID and client secret value, please review that article.

If you have saved the client secret value in a Databricks secret scope, please review the article "dbutils: Secrets and Secret Scopes".

Define service principal credentials

# Define service principal credentials
client_id = "<your-client-id>"
tenant_id = "<your-tenant-id>"

# It's recommended to store the secret securely in a secret scope
client_secret = dbutils.secrets.get(scope = "<scope-name>", key = "<client-secret-key>")  

OAuth2 configuration for service principal

# OAuth2 configuration for service principal
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<client-id>",
  "fs.azure.account.oauth2.client.secret": "<client-secret>",
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}
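
If you only need direct abfss:// access without mounting, the same OAuth settings can also be applied as Spark session configs. A minimal sketch, assuming placeholder account, client, tenant, and secret names:

# Apply the OAuth settings per storage account on the Spark session
storage_account = "<storage-account-name>"

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", "<client-id>")
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net",
               dbutils.secrets.get(scope="<scope-name>", key="<client-secret-key>"))
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               "https://login.microsoftonline.com/<tenant-id>/oauth2/token")

# Now abfss:// paths on this storage account can be read directly
df = spark.read.csv(f"abfss://<container-name>@{storage_account}.dfs.core.windows.net/path/to/file.csv")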

Mount ADLS to DBFS

#Mount ADLS to DBFS
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/adls",
  extra_configs = configs
)

A full example follows:

# define variables
TenantID = "b4e0dd6XXXXXXXXXXXXXXXX99ec4b8e"
clientID = "72baeXXXXXXXXXXXXXX659ea547"
secret_scope_name = "my_scopes"
secret_key = "my_secret_key"
storage_account_name = "asamainriadls"
container = "mainri-asa-file-system"
my_mount_point = "/mnt/mainriadlsfilesystem"

# get secret
my_secret = dbutils.secrets.get(scope=secret_scope_name, key=secret_key)

# OAuth2 configuration for service principal
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": clientID,
  "fs.azure.account.oauth2.client.secret": my_secret,
  "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{TenantID}/oauth2/token"
}


In fact, we are able to directly access ADLS/Blob now, provided the OAuth settings have also been applied as Spark configs (see the sketch above).

# directly access ADLS/Blob
# adls
df = spark.read.csv(f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/input/test.csv")

display(df)

# blob
display(spark.read.csv(f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/input/test.csv"))

Of course, we can also mount the ADLS container to DBFS if we like, but it is not necessary for this demo.


# mount to DBFS
dbutils.fs.mount(
  source = f"abfss://{container}@{storage_account_name}.dfs.core.windows.net/",
  mount_point = my_mount_point,
  extra_configs = configs
)

# Mount Blob Storage to DBFS (the ABFS driver also uses the dfs endpoint)
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/blob",
  extra_configs = configs
)
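
Once mounted, the data can be read through the mount point like any other DBFS path. A quick sketch, reusing my_mount_point and the assumed input/test.csv file from the example above:

# Read the same file through the mount point instead of the abfss:// URI
df_mounted = spark.read.csv(f"{my_mount_point}/input/test.csv", header=True)
display(df_mounted)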


dbutils: mount, using Account Key or SAS to access adls or blob

We can mount Azure Data Lake Storage (ADLS), Azure Blob Storage, or other compatible storage to Databricks using dbutils.fs.mount(), with either an account key or a SAS token for authentication.

mount()

dbutils.fs.help("mount")

Here’s the general syntax:


dbutils.fs.mount(
  source = "<storage-url>",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>":dbutils.secrets.get(scope="<scope-name>", key="<key-name>")})

<storage-url>


Blob:
storage_url = f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net"


ADLS:
storage_url = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/"

<conf-key>


Blob:
conf_key = f"fs.azure.account.key.{storage_account_name}.blob.core.windows.net"


ADLS:
conf_key = f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net"

List Mounts:

dbutils.fs.help("mounts")

To check all mounted points, you can use:


dbutils.fs.mounts()

unmount()

dbutils.fs.help("unmount")


dbutils.fs.unmount("/mnt/<mount-name>")

refreshMounts()

Forces all machines in the cluster to refresh their mount cache, ensuring they receive the most recent information.

dbutils.fs.help("refreshMounts")


dbutils.fs.refreshMounts()

updateMount()


dbutils.fs.updateMount(
  source = "<new-storage-url>",
  mount_point = "/mnt/<existing-mount-point>",
  extra_configs = {"<conf-key>": dbutils.secrets.get(scope="<scope-name>", key="<key-name>")})

Mount storage

You can get the access key from the Azure Portal under Storage account > Security + networking > Access keys.

For example:

Mounting an Azure Data Lake (adls) Gen2 Storage to DBFS

Set up your storage account details:

  • Storage URL: Use the appropriate URL for your data, e.g.,
    abfss://<file-system>@<storage-account>.dfs.core.windows.net/ for ADLS Gen2.
  • Mount point: Choose a directory in the Databricks file system /mnt/ to mount the storage.
  • Extra configs: You usually provide your credentials here, often through a secret scope.

Mount the ADLS storage:


storage_account_name = "<your-storage-account-name>"
container_name = "<your-container-name>"
mount_point = "/mnt/<your-mount-name>"

# Use a secret scope to retrieve the account key
configs = {"fs.azure.account.key." + storage_account_name + ".dfs.core.windows.net": dbutils.secrets.get(scope = "<scope-name>", key = "<key-name>")}

# Perform the mount
dbutils.fs.mount(
  source = f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/",
  mount_point = mount_point,
  extra_configs = configs)
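
To confirm the mount worked, you can list its contents and check the mount table. A quick sketch using the mount_point defined above:

# List files under the new mount point
display(dbutils.fs.ls(mount_point))

# The mount should also appear in the mount table
for m in dbutils.fs.mounts():
    if m.mountPoint == mount_point:
        print(m.mountPoint, "->", m.source)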

 Mount Azure Blob Storage to DBFS

We can mount Azure Blob Storage either by Account Key or SAS key.


dbutils.fs.mount(
  source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {"<conf-key>": "<account-key>"}
)

In case of an account key, <conf-key> is
fs.azure.account.key.<storage-account-name>.blob.core.windows.net

In case of a SAS (shared access signature) token, <conf-key> is
fs.azure.sas.<container-name>.<storage-account-name>.blob.core.windows.net
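
For completeness, here is a hedged sketch of a SAS-based Blob mount; the account, container, mount name, and secret scope/key are placeholders, and the SAS token is assumed to be stored in a secret scope rather than hard-coded:

storage_account_name = "<storage-account-name>"
container_name = "<container-name>"

# SAS token retrieved from a secret scope
sas_token = dbutils.secrets.get(scope="<scope-name>", key="<sas-token-key>")

dbutils.fs.mount(
  source = f"wasbs://{container_name}@{storage_account_name}.blob.core.windows.net",
  mount_point = "/mnt/<mount-name>",
  extra_configs = {f"fs.azure.sas.{container_name}.{storage_account_name}.blob.core.windows.net": sas_token}
)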


dbutils: Secrets and Secret Scopes

Secret Scopes:

A secret scope is a boundary within which secrets are stored. We can create secret scopes to securely manage access to sensitive data and credentials.

Each scope can contain multiple secrets (key-value pairs) like tokens or passwords.

Secrets:

Secrets are the actual values you want to store securely (e.g., API tokens, connection strings). You can store secrets in specific scopes.

Types of Secret Scopes in Databricks

  • Databricks-Backed Secret Scopes: Stored directly in Databricks and managed entirely within the platform.
  • Azure Key Vault-Backed Secret Scopes: Secrets are stored in Azure Key Vault and managed externally but accessed from within Databricks.

Common dbutils.secrets Commands:

  • dbutils.secrets.listScopes(): Lists all the available secret scopes.
  • dbutils.secrets.list(“scope-name”): Lists all the secrets within a given scope.
  • dbutils.secrets.get(“scope-name”, “key”): Retrieves the secret for the given scope and key.

#Lists all the available secret scopes
dbutils.secrets.listScopes()

#Lists all the secrets within a given scope
dbutils.secrets.list("scope-name")

# Retrieves the secret for the given scope and key
my_secret = dbutils.secrets.get(scope="my_scope", key="my_secret_key")

An example in a notebook, together with its response, is shown later.

Creating and Managing Secret Scopes

To create and manage secret scopes, you can either use the Databricks CLI or Databricks UI, depending on the permissions and environment you’re working with.

Create an Azure Key Vault-backed secret scope

1: Go to https://<databricks-instance>/#secrets/createScope.

Replace <databricks-instance> with the workspace URL of your Azure Databricks deployment. This URL is case sensitive (scope in createScope must be uppercase).

e.g. https://adb-44260********40553.13.azuredatabricks.net/#secrets/createScope

2: Enter the name of the secret scope.

Secret scope names are case insensitive

3. Manage Principal 

Use the Manage Principal drop-down to specify whether All Users have MANAGE permission for this secret scope or only the Creator of the secret scope (that is to say, you).

4. DNS Name and Resource ID

DNS Name, for example, https://mainri-key-vault.vault.azure.net/  

These properties are available from the Settings > Properties tab of an Azure Key Vault in your Azure portal.

 5. Click the Create button.

The secret scope named mainri_secret_scopes has been added.

Manage secrets in this scope in Azure KeyVault with manage principal = creator

Create an Azure Key Vault-backed secret

Nothing special here; follow the normal process to create a secret in Azure Key Vault, so it is omitted.

If you need, please review my previous article “Create and Using Key Vault in Azure Ecosystem”.

Create Secret scopes using Databricks CLI


# Create a secret scope (Databricks CLI)
databricks secrets create-scope --scope <scope-name> --initial-manage-principal users

# Add a secret to an existing scope (Databricks CLI)
databricks secrets put --scope <existing-scope-name> --key <secret-key>

Use Secrets


dbutils.secrets.listScopes()
dbutils.secrets.list('mainri_secret_scopes')
dbutils.secrets.get(scope="mainri_secret_scopes", key="creator")
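
As an illustration of consuming a retrieved secret, here is a hedged sketch that passes it to a JDBC read; the server, database, table, user, and secret key are hypothetical:

# Retrieve a database password from the secret scope
db_password = dbutils.secrets.get(scope="mainri_secret_scopes", key="db-password")

# Use it in a JDBC read (connection details are hypothetical)
df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
      .option("dbtable", "dbo.my_table")
      .option("user", "my_user")
      .option("password", db_password)
      .load())

Note that printing a secret in a notebook shows [REDACTED]; the value is only usable programmatically.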


Appendix

Using Key Vault services in Azure Ecosystem

dbutils: Databricks File System, dbutils

Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage.

Databricks recommends that you store data in mounted object storage rather than in the DBFS root. The DBFS root is not intended for production customer data.

DBFS root is the default file system location provisioned for a Databricks workspace when the workspace is created. It resides in the cloud storage account associated with the Databricks workspace.
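
A quick hedged sketch of interacting with DBFS paths (the file and directory names are illustrative):

# Write a small file to DBFS, read it back, then list the directory
dbutils.fs.put("/tmp/demo/hello.txt", "Hello, DBFS!", True)
print(dbutils.fs.head("/tmp/demo/hello.txt"))
display(dbutils.fs.ls("/tmp/demo"))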

Databricks dbutils

dbutils is a set of utility functions provided by Databricks to help manage and interact with various resources in a Databricks environment, such as files, jobs, widgets, secrets, and notebooks. It is commonly used in Databricks notebooks to perform tasks like handling file systems, retrieving secrets, running notebooks, and controlling job execution.

dbutils.help()

  • credentials: DatabricksCredentialUtils -> Utilities for interacting with credentials within notebooks
  • data: DataUtils -> Utilities for understanding and interacting with datasets (EXPERIMENTAL)
  • fs: DbfsUtils -> Manipulates the Databricks filesystem (DBFS) from the console
  • jobs: JobsUtils -> Utilities for leveraging jobs features
  • library: LibraryUtils -> Utilities for session isolated libraries
  • meta: MetaUtils -> Methods to hook into the compiler (EXPERIMENTAL)
  • notebook: NotebookUtils -> Utilities for the control flow of a notebook (EXPERIMENTAL)
  • preview: Preview -> Utilities under preview category
  • secrets: SecretUtils -> Provides utilities for leveraging secrets within notebooks
  • widgets: WidgetsUtils -> Methods to create and get bound value of input widgets inside notebooks

1. dbutils.fs (File System Utilities)

dbutils.fs.help()

dbutils.fs provides utilities to interact with various file systems, like DBFS (Databricks File System), Azure Blob Storage, and others, similarly to how you would interact with a local file system.

List Files:

dbutils.fs.ls("/mnt/")

Mount Azure Blob Storage:


dbutils.fs.mount(
    source = "wasbs://<container>@<storage-account>.blob.core.windows.net",
    mount_point = "/mnt/myblobstorage",
    extra_configs = {"<key>": "<value>"}
)

Unmount

dbutils.fs.unmount("/mnt/myblobstorage")

Copy Files:

dbutils.fs.cp("/mnt/source_file.txt", "/mnt/destination_file.txt")

Remove Files:

dbutils.fs.rm("/mnt/myfolder/", True)  # True to remove recursively

Move Files:

dbutils.fs.mv("/mnt/source_file.txt", "/mnt/destination_file.txt")

dbutils.secrets (Secret Management)

dbutils.secrets is used to retrieve secrets stored in Databricks Secret Scopes. This is essential for securely managing sensitive data like passwords, API keys, or tokens.

dbutils.secrets.help()

Get a Secret:

my_secret = dbutils.secrets.get(scope="my-secret-scope", key="my-secret-key")

List Secrets:

dbutils.secrets.list(scope="my-secret-scope")

List Secret Scopes:

dbutils.secrets.listScopes()

dbutils.notebook (Notebook Utilities)

dbutils.notebook provides functionality to run one notebook from another and pass data between notebooks. It’s useful when you want to build modular pipelines by chaining multiple notebooks.

dbutils.notebook.help()

Run Another Notebook:

dbutils.notebook.run("/path/to/other_notebook", 60, {"param1": "value1", "param2": "value2"})

Runs another notebook with specified timeout (in seconds) and parameters. You can pass parameters as a dictionary.

Exit a Notebook:

dbutils.notebook.exit("Success")

Exits the notebook with a status message or value.

Return Value from a Notebook:

result = dbutils.notebook.run("/path/to/notebook", 60, {"param": "value"})
print(result)

dbutils.jobs (Job Utilities)

dbutils.jobs helps with tasks related to job execution within Databricks, such as getting details about the current job or task.

dbutils.jobs.help()

Get values shared between tasks in a job run

job_info = dbutils.jobs.taskValues.get(taskKey="<task_key>", key="<key>", default="", debugValue="")
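
Task values are intended for passing small values between tasks of the same job run. A hedged sketch, where the task key and value name are hypothetical:

# In an upstream task (task key "ingest"): publish a value
dbutils.jobs.taskValues.set(key="rows_processed", value=1234)

# In a downstream task: read it back; debugValue is used for interactive runs
rows = dbutils.jobs.taskValues.get(taskKey="ingest", key="rows_processed", default=0, debugValue=0)
print(rows)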

dbutils.library

Manages libraries within Databricks, like installing and updating them (for clusters).

dbutils.library.installPyPI("numpy")
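
Note that dbutils.library.installPyPI is deprecated on recent Databricks Runtime versions; the %pip magic command is the usual replacement for notebook-scoped libraries:

%pip install numpy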

Example

# Mount Azure Blob Storage using dbutils.fs
dbutils.fs.mount(
    source = "wasbs://mycontainer@myaccount.blob.core.windows.net",
    mount_point = "/mnt/mydata",
    extra_configs = {"fs.azure.account.key.myaccount.blob.core.windows.net": "<storage-key>"}
)

# List contents of the mount
display(dbutils.fs.ls("/mnt/mydata"))

# Get a secret from a secret scope
db_password = dbutils.secrets.get(scope="my-secret-scope", key="db-password")

# Create a dropdown widget to choose a dataset
dbutils.widgets.dropdown("dataset", "dataset1", ["dataset1", "dataset2", "dataset3"], "Choose Dataset")

# Get the selected dataset value
selected_dataset = dbutils.widgets.get("dataset")
print(f"Selected dataset: {selected_dataset}")

# Remove all widgets after use
dbutils.widgets.removeAll()

# Run another notebook and pass parameters
result = dbutils.notebook.run("/path/to/notebook", 60, {"input_param": "value"})
print(result)

Magic Commands

Comparison of %fs, %sh, dbutils.fs, and the Python os module:

  • %fs (magic command): Example usage: %fs ls /databricks-datasets. Can access mounted cloud storage paths. Use case: lightweight access to DBFS for listing, copying, and removing files.
  • %sh (magic command): Example usage: %sh ls /tmp. No access to mounted cloud storage unless it is accessible from the driver node. Use case: execute system-level commands from notebooks.
  • dbutils.fs (Databricks Utilities): Example usage: dbutils.fs.ls("/databricks-datasets"). Can mount and access external cloud storage (e.g., S3, Azure Blob) through DBFS. Use case: programmatic, flexible access to DBFS and cloud storage.
  • os.<> (Python OS module): Example usage: import os; os.listdir("/tmp"). No access to mounted DBFS or cloud storage. Use case: access files and environment variables on the local driver node.
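
To make the comparison concrete, here is a hedged sketch that lists the same directory with dbutils.fs and with the Python os module (paths are illustrative; %fs and %sh are cell magics, so they cannot be mixed into the same Python cell):

# dbutils.fs works against DBFS and mounted cloud storage
display(dbutils.fs.ls("/databricks-datasets"))

# The os module only sees the driver's local filesystem;
# on most clusters DBFS is also exposed locally under /dbfs
import os
print(os.listdir("/dbfs/databricks-datasets"))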