DBFS: Access ADLS or Blob using Service Principle with Oauth2

There are several ways to connect Azure Data Lake Storage (ADLS) Gen2 or Blob to Databricks. Each method offers different levels of security, flexibility, and complexity. Such as

  • Mount ADLS to DBFS
  • Service Principal with OAuth 2.0
  • Direct access with ABFS  URI (Azure Blob File System)
  • Azure Active Directory (AAD) Passthrough
  • SAS Token (Shared Access Signature)
  • Managed Identity (MSI)
  • Access Key Authentication

 We have discussed “mount”, Account Key, SAS token at dbutils: mount, using Account Key or SAS to access adls or blob

Today we focus on using Service principal with OAuth2.0 to access ADLS or Blob.

Access ADLS Gen2 using Service Principal with OAuth 2.0

To access Azure Data Lake Storage (ADLS) or Blob Storage using a service principal with OAuth2 in Azure Databricks, follow these steps. This approach involves authenticating using a service principal’s credentials, such as the client ID, tenant ID, and client secret.

Previously, we discussed how to create Service Principle, Register an application on Azure Entra ID (former Active Directory), generate client-secret-value. To get client.id and client-secret-value  please review this article.

if we have save the client-secret-value at databricks’ “secret-scopes”, please review the article “dbutils: Secrets and Secret Scopes“.

Define service principal credentials

# Define service principal credentials
client_id = "<your-client-id>"
tenant_id = "<your-tenant-id>"

# It's recommended to store the secret securely in a secret scope
client_secret = dbutils.secrets.get(scope = "<scope-name>", key = "<client-secret-key>")  

OAuth2 configuration for service principal

# OAuth2 configuration for service principal
configs = {
  "fs.azure.account.auth.type": "OAuth",
  "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id": "<client-id>",
  "fs.azure.account.oauth2.client.secret": "<client-secret>",
  "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token"
}

Mount ADLS to DBFS

#Mount ADLS to DBFS
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mount_point = "/mnt/adls",
  extra_configs = configs
)

full example here.

# define variable
TenantID="b4e0dd6XXXXXXXXXXXXXXXX99ec4b8e"
clientID="72baeXXXXXXXXXXXXXX659ea547"
Secret-scope-name="my_scopes"
secrets-key = "my_secret_key"
storage_account_name = "asamainriadls"
container = "mainri-asa-file-system"
my_mount_point = "/mnt/mainriadlsfilesystem"

#get secret
my_secret = dbutils.secrets.get(scope=Secret-scope-name, key=secrets-key)

# OAuth2 configuration for service principal
configs = {
"fs.azure.account.auth.type": "OAuth",
"fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
"fs.azure.account.oauth2.client.id": f"{ClientID}",
"fs.azure.account.oauth2.client.secret": f"{my_secret}",
"fs.azure.account.oauth2.client.endpoint": f”https://login.microsoftonline.com/{TenantID}/oauth2/token”
}


In fact, we are able to directly access to ADLS/blob now.

# in fact,  we are able to directly access adls/blob now.
#adls
df = spark.read.csv(f"abfs://{container}@{storage_account_name }.dfs.windows.core.net/input/test.csv")

display(df)

# blob
display(spark.read.csv(f"abfs://{container}@{storage_account_name }.dfs.windows.core.net/input/test.csv"))

Of course, we are able to mount the ADLS to DBFS if we like; but it’s not necessary at this moment for demo.


# mount to DBFS
dbutils.fs.mount(
  source = f"abfss://{container} @{storage_account_name}.dfs.core.windows.net/",
  mount_point = my_mount_point,
  extra_configs = configs
)

#Mount Blob to DBFS
dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.blob.core.windows.net/",
  mount_point = "/mnt/adls",
  extra_configs = configs
)

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

Create Service Principle, Register an application on Azure Entra ID (former Active Directory)

A Service Principal in Azure is an identity used by applications, services, or automated tools to access specific Azure resources. It’s tied to an Azure App Registration and is used for managing permissions and authentication.

The Microsoft identity platform performs identity and access management (IAM) only for registered applications. Whether it’s a client application like a ADF or Synapse, Wen Application or mobile app, or it’s a web API that backs a client app, registering establishes a trust relationship between your application and the identity provider, the Microsoft identity platform.

This article is talking on registering an application in the Microsoft Entra admin center. I outline the registration procedure step by step.

Summary steps:

  1. Navigate to Azure Entra ID (Azure Active Directory)
  2. Create an App Registration
  3. Generate Client Secret,
    note down Important the Application (client) ID and Directory (tenant) ID,  Client-Secret-value.
  4. Using the Service Principle – Assign Roles to the Service Principal
    Navigate to the Azure resource (e.g., Storage Account, Key Vault, SQL Database) you want your Service Principal to access.

Step by Step Demo

Register a new Application on Azure Entra ID (formerly called Azure Active Directory), get an Application ID and Client Secret value.

Azure Portal >> Azure Entra ID (formerly called Azure Active Directory) 

(1)  Copy Tenant ID.

We need this Tenant ID later.

(2) App Registration

(3) Copy Application ID. We will use it later

(4) Create Client Secret

Generate a new client Secret,

(5) copy the Client Secret Value

Copy client-secret-value, we need it later.

Cause: the Client Secret Value you HAVE TO COPY IT RIGHT NOW! IMMEDIATELY copy NOW. And put it to a secure place. Since the Value WILL NOT reappear anymore. IMOPRTANT!

(6) Using the Service Principle – Assign Roles to the Service Principal

Assign Roles to the Service Principal

Now, assign permissions to your Service Principal so it can access specific Azure resources:

  1. Navigate to the Azure resource (e.g., Storage Account, Key Vault, SQL Database) you want your Service Principal to access.
  2. Go to Access Control (IAM).
  3. Click Add and choose Add role assignment.
  4. Choose a role (e.g., Contributor, Reader, or a custom role).
  5. Search for your App Registration by its name and select it.
  6. Save

We have finished all at Azure Entra ID (Former Azure Active Directory) 

Please do not hesitate to contact me if you have any questions at william . chen @mainri.ca

(remove all space from the email account 😊)

Appendix: Microsoft: Register an application with the Microsoft identity platform