Unity Catalog: Data Access Control with Databricks Unity Catalog

This article explains how to control access to data and other objects in Unity Catalog.

Principals

Entities that can be granted permissions (e.g., users, groups, or roles).

Example: A user like alice@company.com or a group like DataEngineers can be considered principals.

Privileges

The specific rights or actions that a principal can perform on a securable object.

  • SELECT: Read data from a table or view.
  • INSERT: Add data to a table.
  • UPDATE: Modify existing data.
  • DELETE: Remove data.
  • ALL PRIVILEGES: Grants all possible actions.

Example: GRANT SELECT ON TABLE transactions TO DataScientists;

Securable Objects

The resources or entities (e.g., databases, tables, schemas) on which permissions are applied.

  • Catalogs (logical collections of databases).
  • Schemas (collections of tables or views within a catalog).
  • Tables (structured data in rows and columns).
  • Views, Functions, External Locations, etc.

Example: In Unity Catalog, the catalog named main, a schema like sales_db, and a table called transactions are all securable objects.

Side-by-side comparison

| Concept | Principals | Privileges | Securable Objects |
| --- | --- | --- | --- |
| Definition | Entities that can be granted permissions (e.g., users, groups, or roles). | The specific rights or actions that a principal can perform on a securable object. | The resources or entities (e.g., databases, tables, schemas) on which permissions are applied. |
| Examples | Users (e.g., alice, bob); groups (e.g., DataEngineers); service principals | SELECT (read data); INSERT (write data); ALL PRIVILEGES (full access) | Catalog; schema; table; external location |
| Scope | Defines who can access or perform actions on resources. | Defines what actions are allowed for principals on securable objects. | Defines where privileges apply (i.e., what resources are being accessed). |
| Role in security model | Principals represent users, groups, or roles that need permissions to access objects. | Privileges are permissions or grants that specify the actions a principal can perform. | Securable objects are the data resources and define the scope of where privileges are applied. |
| Granularity | Granularity depends on the level of access required for individual users or groups. | Granular permissions such as SELECT, INSERT, UPDATE, DELETE, or even column-level access. | Granular levels of objects, from the entire catalog down to individual tables or columns. |
| Hierarchy | Principals can be individual users, but more commonly groups or roles are used to simplify management. | Privileges can be granted at various levels (catalog, schema, table) and can be inherited from parent objects. | Securable objects are structured hierarchically: catalogs contain schemas, which contain tables, etc. |
| Management | Principals are typically managed by identity providers (e.g., Microsoft Entra ID, Databricks users, Active Directory). | Privileges are managed through SQL commands like GRANT or REVOKE in systems like Unity Catalog. | Securable objects are resources like catalogs, schemas, and tables that need to be protected with permissions. |
| Databricks example | User: databricks-user; group: DataScientists | GRANT SELECT ON TABLE sales TO DataScientists; | Catalog: main; schema: sales_db; table: transactions |

Securable objects in Unity Catalog are hierarchical, and privileges are inherited downward. The highest level object that privileges are inherited from is the catalog. This means that granting a privilege on a catalog or schema automatically grants the privilege to all current and future objects within the catalog or schema.
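
For instance, a single catalog-level grant can stand in for many table-level grants. A minimal sketch, reusing the hypothetical main catalog and DataScientists group from the examples above:

-- Granted once at the catalog level, these privileges are inherited by every
-- current and future schema and table inside main.
GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG main TO `DataScientists`;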

Show grants on objects in a Unity Catalog metastore

Catalog Explorer

  1. In your Azure Databricks workspace, click Catalog.
  2. Select the object, such as a catalog, schema, table, or view.
  3. Go to the Permissions tab.

SQL

Run the following SQL command in a notebook or SQL query editor. You can show grants on a specific principal, or you can show all grants on a securable object.

SHOW GRANTS [ principal ] ON <securable-type> <securable-name>;

For example, the following command shows all grants on a schema named default in the parent catalog named main:

SHOW GRANTS ON SCHEMA main.default;
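
To narrow the output to a single principal, name it before ON. A minimal sketch, reusing the hypothetical user from earlier:

-- Shows only the grants that alice@company.com holds on this schema.
SHOW GRANTS `alice@company.com` ON SCHEMA main.default;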

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)

Appendix:

MS: Manage privileges in Unity Catalog

MS: Unity Catalog privileges and securable objects

Unity Catalog: Catalogs and Schemas

A catalog is the primary unit of data organization in the Azure Databricks Unity Catalog data governance model. It is the first layer in Unity Catalog's three-level namespace (catalog.schema.table). Catalogs contain schemas, which in turn can contain tables, views, volumes, models, and functions. Catalogs are registered in a Unity Catalog metastore in your Azure Databricks account.

Catalogs

Organize my data into catalogs

Each catalog should represent a logical unit of data isolation and a logical category of data access, allowing an efficient hierarchy of grants to flow down to schemas and the data objects that they contain. 

Catalogs therefore often mirror organizational units or software development lifecycle scopes. You might choose, for example, to have a catalog for production data and a catalog for development data, or a catalog for non-customer data and one for sensitive customer data.

Data isolation using catalogs

Each catalog typically has its own managed storage location to store managed tables and volumes, providing physical data isolation at the catalog level. 

Catalog-level privileges

Grants on any Unity Catalog object are inherited by the children of that object; owning a catalog therefore gives broad control over everything it contains.

Catalog types

  • Standard catalog: the typical catalog, used as the primary unit to organize your data objects in Unity Catalog. 
  • Foreign catalog: a Unity Catalog object that is used only in Lakehouse Federation scenarios.

Default catalog

If your workspace was enabled for Unity Catalog automatically, the pre-provisioned workspace catalog is specified as the default catalog. A workspace admin can change the default catalog as needed.

Workspace-catalog binding

If you use workspaces to isolate user data access, you might want to use workspace-catalog bindings. Workspace-catalog bindings enable you to limit catalog access by workspace boundaries. 

Create catalogs

Requirements: you must be an Azure Databricks metastore admin or have the CREATE CATALOG privilege on the metastore.

To create a catalog, you can use Catalog Explorer, a SQL command, the REST API, the Databricks CLI, or Terraform. When you create a catalog, two schemas (databases) are automatically created: default and information_schema.

Catalog Explorer

  • Log in to a workspace that is linked to the metastore.
  • Click Catalog.
  • Click the Create Catalog button.
  • On the Create a new catalog dialog, enter a Catalog name and select the catalog Type that you want to create:
    • Standard catalog: a securable object that organizes data and AI assets that are managed by Unity Catalog. For all use cases except Lakehouse Federation and catalogs created from Delta Sharing shares.
    • Foreign catalog: a securable object that mirrors a database in an external data system using Lakehouse Federation.
    • Shared catalog: a securable object that organizes data and other assets that are shared with you as a Delta Sharing share. Creating a catalog from a share makes those assets available for users in your workspace to read.

SQL

Standard catalog

CREATE CATALOG [ IF NOT EXISTS ] <catalog-name>
   [ MANAGED LOCATION '<location-path>' ]
   [ COMMENT <comment> ];

  • <catalog-name>: A name for the catalog.
  • <location-path>: Optional but strongly recommended; a concrete sketch follows.
    e.g. 'abfss://my-container-name@storage-account-name.dfs.core.windows.net/finance' or 'abfss://my-container-name@storage-account-name.dfs.core.windows.net/finance/product'
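
Putting the syntax together, here is a minimal sketch with hypothetical names (a finance catalog backed by the container path above):

-- Creates a catalog whose managed tables and volumes are stored under the
-- given ADLS Gen2 path instead of the metastore root storage.
CREATE CATALOG IF NOT EXISTS finance
  MANAGED LOCATION 'abfss://my-container-name@storage-account-name.dfs.core.windows.net/finance'
  COMMENT 'Catalog for finance data';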
Shared catalog

CREATE CATALOG [ IF NOT EXISTS ] <catalog-name>
USING SHARE <provider-name>.<share-name>
[ COMMENT <comment> ];

Foreign catalog

CREATE FOREIGN CATALOG [IF NOT EXISTS] <catalog-name> USING CONNECTION <connection-name>
OPTIONS [(database '<database-name>') | (catalog '<external-catalog-name>')];

  • <catalog-name>: Name for the catalog in Azure Databricks.
  • <connection-name>: The connection object that specifies the data source, path, and access credentials.
  • <database-name>: Name of the database you want to mirror as a catalog in Azure Databricks. Not required for MySQL, which uses a two-layer namespace. For Databricks-to-Databricks Lakehouse Federation, use catalog '<external-catalog-name>' instead.
  • <external-catalog-name>: Databricks-to-Databricks only: Name of the catalog in the external Databricks workspace that you are mirroring.
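
A concrete sketch of the syntax above, assuming a hypothetical PostgreSQL connection named postgres_connection that was already created for Lakehouse Federation:

-- Mirrors the external 'sales' database as a catalog in Azure Databricks.
CREATE FOREIGN CATALOG IF NOT EXISTS pg_sales USING CONNECTION postgres_connection
OPTIONS (database 'sales');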

Schemas

A schema is a child of a catalog and can contain tables, views, volumes, models, and functions. Schemas provide more granular categories of data organization than catalogs.

Preconditions

  • Have a Unity Catalog metastore linked to the workspace where you perform the schema creation
  • Have the USE CATALOG and CREATE SCHEMA data permissions on the schema’s parent catalog
  • To specify an optional managed storage location for the tables and volumes in the schema, an external location must be defined in Unity Catalog, and you must have the CREATE MANAGED STORAGE privilege on the external location.

Create a schema

To create a schema in Unity Catalog, you can use Catalog Explorer or SQL commands.

To create a schema in Hive metastore, you must use SQL commands.

Catalog Explorer

  • Log in to a workspace that is linked to the Unity Catalog metastore.
  • Click Catalog.
  • In the Catalog pane on the left, click the catalog you want to create the schema in.
  • In the detail pane, click Create schema.
  • Give the schema a name and add any comment that would help users understand the purpose of the schema.
  • (Optional) Specify a managed storage location. Requires the CREATE MANAGED STORAGE privilege on the target external location. See Specify a managed storage location in Unity Catalog and Managed locations for schemas.
  • Click Create.
  • Grant privileges on the schema. See Manage privileges in Unity Catalog.
  • Click Save.

SQL


CREATE { DATABASE | SCHEMA } [ IF NOT EXISTS ] <catalog-name>.<schema-name>
    [ MANAGED LOCATION '<location-path>' | LOCATION '<location-path>' ]
    [ COMMENT <comment> ]
    [ WITH DBPROPERTIES ( <property-key> = <property-value> [ , ... ] ) ];
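
A concrete sketch with hypothetical names, creating the sales_db schema used in earlier examples inside the main catalog:

-- MANAGED LOCATION is optional; without it, the schema inherits the storage
-- location of its parent catalog (or the metastore root).
CREATE SCHEMA IF NOT EXISTS main.sales_db
  MANAGED LOCATION 'abfss://my-container-name@storage-account-name.dfs.core.windows.net/sales_db'
  COMMENT 'Schema for sales data';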


Appendix:

MS: What are catalogs in Azure Databricks?

MS: What are schemas in Azure Databricks?

Unity Catalog: Create Metastore and Enabling Unity Catalog in Azure

A metastore is the top-level container for data in Unity Catalog. A Unity Catalog metastore registers metadata about securable objects (such as tables, volumes, external locations, and shares) and the permissions that govern access to them.

Each metastore exposes a three-level namespace (catalog.schema.table) by which data can be organized. You must have one metastore for each region in which your organization operates. 
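
In practice, the namespace means every table is addressed by catalog, schema, and table name. A minimal sketch with the hypothetical names used throughout this series:

-- Fully qualified three-level reference: catalog.schema.table.
SELECT * FROM main.sales_db.transactions LIMIT 10;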

Microsoft states that Databricks began enabling new workspaces for Unity Catalog automatically on November 9, 2023, with a gradual rollout across accounts. Otherwise, follow the instructions in this article to create a metastore in your workspace region.

Preconditions

Before we begin

1. Microsoft Entra ID Global Administrator

The first Azure Databricks account admin must be a Microsoft Entra ID Global Administrator at the time that they first log in to the Azure Databricks account console:

https://accounts.azuredatabricks.net

Upon first login, that user becomes an Azure Databricks account admin and no longer needs the Microsoft Entra ID Global Administrator role to access the Azure Databricks account.

2. Premium tier

The Databricks workspace pricing tier must be Premium.

3. The same region

The Databricks workspace must be in the same region as the ADLS Gen2 storage account. Each region allows only one metastore.

Manually create a metastore and enable Unity Catalog

  1. Create an ADLS Gen2 storage account (if you do not already have one).
    Create a storage account and container to store managed table and volume data at the metastore level; this container will be the root storage for the Unity Catalog metastore.
  2. Create an Access Connector for Azure Databricks.
  3. Grant the "Storage Blob Data Contributor" role to the Access Connector for Azure Databricks on the ADLS Gen2 storage account.
  4. Enable Unity Catalog by creating a metastore and assigning it to the workspace.

Step-by-step demo

1. Check the Entra ID role

To check whether you have the Microsoft Entra ID Global Administrator role:

Azure Portal > Microsoft Entra ID > Roles and administrators

In this demo, I am a Global Administrator.

2. Create a container for the metastore

Create a container at the root of ADLS Gen2.

Since we have already created an ADLS Gen2 account, move directly to creating a container at the root of the storage account.

3. Create an Access Connector for Databricks

If one was not created automatically when you created the Azure Databricks service, create one manually:

Azure portal > Access Connector for Azure Databricks

Once all required fields are filled in, the new access connector is created.

4. Grant Storage Blob Data Contributor to the access connector

Assign the "Storage Blob Data Contributor" role to the "Access Connector for Azure Databricks" just created:

Azure Portal > ADLS storage account > Access Control (IAM) > Add role assignment

5. Create a metastore

If you are an account admin, you can log in to the account console; otherwise, ask your account admin for help.

Before you begin to create a metastore, make sure:

  • You are an Azure Databricks account admin (see precondition 1 above). The first account admin can assign users in the Microsoft Entra ID tenant as additional account admins, who can themselves assign more account admins; additional account admins do not require specific roles in Microsoft Entra ID.
  • The workspaces that you attach to the metastore are on the Azure Databricks Premium plan.
  • If you want to set up metastore-level root storage, you have permission to create the required resources in your Azure tenant (the ADLS Gen2 storage account and access connector created in the earlier steps).

Log in to the Azure Databricks account console:

https://accounts.azuredatabricks.net/

Azure Databricks account console > Catalog > Create metastore

  • Select the same region for your metastore.
    You will only be able to assign workspaces in this region to this metastore.
  • Container name and path
    The pattern is:
    <container-name>@<storage-account-name>.dfs.core.windows.net/<path>
    For this demo I used:
    mainri-databricks-unitycatalog-metastore-eastus2@asamainriadls.dfs.core.windows.net/
  • Access connector ID
    The pattern is:
    /subscriptions/<sub-id>/resourceGroups/<rg-name>/providers/Microsoft.Databricks/accessConnectors/<connector-name>

Find the Access connector ID:

Azure portal > Access Connector for Azure Databricks

For this demo I used:
/subscriptions/9348XXXXXXXXXXX6108d/resourceGroups/mainri/providers/Microsoft.Databricks/accessConnectors/unity-catalog-access-connector-Premiu

Enable Unity Catalog

Assign to workspace

To enable an Azure Databricks workspace for Unity Catalog, you assign the workspace to a Unity Catalog metastore using the account console:

  1. As an account admin, log in to the account console.
  2. Click Catalog.
  3. Click the metastore name.
  4. Click the Workspaces tab.
  5. Click Assign to workspace.
  6. Select one or more workspaces. You can type part of the workspace name to filter the list.
  7. Scroll to the bottom of the dialog, and click Assign.
  8. On the confirmation dialog, click Enable.

Account console > Catalog > select the metastore > Workspaces tab > Assign to workspace > Assign

Validate that Unity Catalog is enabled

Open the workspace; you can see that the metastore has been assigned to it.

We have now successfully created the metastore and enabled Unity Catalog.


Unity Catalog in Databricks

Unity Catalog is a fine-grained data governance solution for data in a data lake, managing data governance and access control and centralizing metadata across multiple workspaces. Unity Catalog provides centralized access control, auditing, lineage, and data discovery capabilities across Azure Databricks workspaces, bringing a new layer of data management and security to your Databricks environment.

Key features of Unity Catalog include

  • Define once, secure everywhere: Unity Catalog offers a single place to administer data access policies that apply across all workspaces.
  • Standards-compliant security model: Unity Catalog’s security model is based on standard ANSI SQL and allows administrators to grant permissions in their existing data lake using familiar syntax, at the level of catalogs, schemas (also called databases), tables, and views.
  • Built-in auditing and lineage: Unity Catalog automatically captures user-level audit logs that record access to your data. Unity Catalog also captures lineage data that tracks how data assets are created and used across all languages.
  • Data discovery: Unity Catalog lets you tag and document data assets, and provides a search interface to help data consumers find data.
  • System tables (Public Preview): Unity Catalog lets you easily access and query your account’s operational data, including audit logs, billable usage, and lineage.

Unity Catalog object model

The hierarchy of database objects in any Unity Catalog metastore is divided into three levels, represented as a three-level namespace (catalog.schema.table).

Metastore

The metastore is the top-level container for metadata in Unity Catalog. It registers metadata about data and AI assets and the permissions that govern access to them. For a workspace to use Unity Catalog, it must have a Unity Catalog metastore attached.

Object hierarchy in the metastore

In a Unity Catalog metastore, the three-level database object hierarchy consists of catalogs that contain schemas, which in turn contain data and AI objects, like tables and models.

Level one: Catalogs are used to organize your data assets and are typically used as the top level in your data isolation scheme.

Level two: Schemas (also known as databases) contain tables, views, volumes, AI models, and functions.

Level three: Volumes, Tables, Views, Functions, Models (AI models packaged with MLflow)

Working with database objects in Unity Catalog

Working with database objects in Unity Catalog is very similar to working with database objects that are registered in a Hive metastore, with the exception that a Hive metastore doesn’t include catalogs in the object namespace. You can use familiar ANSI syntax to create database objects, manage database objects, manage permissions, and work with data in Unity Catalog. You can also create database objects, manage database objects, and manage permissions on database objects using the Catalog Explorer UI.
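
For instance, creating a table looks just like standard ANSI DDL, with a catalog qualifier in front. A minimal sketch with hypothetical names:

-- Identical to Hive-metastore-era DDL except for the leading catalog level.
CREATE TABLE main.sales_db.transactions (
  id BIGINT,
  amount DECIMAL(10, 2),
  created_at TIMESTAMP
);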

Granting and revoking access to database objects

You can grant and revoke access to securable objects at any level in the hierarchy, including the metastore itself. Access to an object implicitly grants the same access to all children of that object, unless access is revoked.


GRANT CREATE TABLE ON SCHEMA mycatalog.myschema TO `finance-team`;
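
Revoking follows the same shape. A minimal sketch against the same hypothetical schema and group:

-- Removes only this specific grant; other privileges the group holds remain.
REVOKE CREATE TABLE ON SCHEMA mycatalog.myschema FROM `finance-team`;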


Configuring Azure Entra ID Authentication in Azure SQL Database

Azure SQL Database can be integrated with Azure Entra ID to provide identity and access management. With this integration, users can sign in to Azure SQL Database using their Azure Entra ID credentials, enabling a centralized and secure way to manage database access.

Register the SQL Server in Azure Entra ID

Enable Azure Entra ID Admin

Register your SQL Server (or SQL Database) with Azure Entra ID by setting an Entra ID admin:

Azure Portal > find the SQL Server that you want to configure >

Settings > Microsoft Entra ID (Active Directory admin)

Assign Users/Groups

You can assign Azure Entra ID users or groups to specific roles within the SQL Database, such as db_owner, db_datareader, or db_datawriter.

Then, Click Save to apply the changes.

Configure Azure Entra ID Authentication in Azure SQL Database

Connect to SQL Database using Azure Entra ID

You can connect to your Azure SQL Database using Azure Entra ID by selecting the “Azure Active Directory – Universal with MFA support” authentication method in tools like SQL Server Management Studio (SSMS).

Assign Roles to Azure Entra ID Users

Use a SQL query to assign roles to Azure Entra ID users or groups. For example:

CREATE USER [your_username@yourdomain.com] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datareader ADD MEMBER [your_username@yourdomain.com];

This command creates an Azure Entra ID user in your SQL Database and adds them to the db_datareader role.
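
The same pattern works for an Entra ID group rather than a single user. A minimal sketch with a hypothetical group name:

-- Creates a database user mapped to an Entra ID group and grants write access.
CREATE USER [DataEngineers] FROM EXTERNAL PROVIDER;
ALTER ROLE db_datawriter ADD MEMBER [DataEngineers];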

Set Up Role-Based Access Control (RBAC)

You can manage permissions through Azure Entra ID roles and assign these roles to your SQL Database resources.

Assign Roles via Azure Portal

Azure portal > your SQL Database > Access control (IAM) > Add role assignment.

Choose the appropriate role, such as "SQL DB Contributor", and assign it to the desired Azure Entra ID user or group.

Considerations

  • No Password Management: Since authentication is managed via Azure Entra ID, there’s no need to manage passwords directly within the database.
  • Integration with Conditional Access: This allows you to enforce compliance requirements, such as requiring MFA or ensuring connections only come from specific locations.


Using Key Vault services in Azure Ecosystem

Azure Key Vault safeguards encryption keys and secrets like certificates, connection strings, and passwords. 

A key vault defines a security boundary for stored secrets. It allows you to securely store service or application credentials, like passwords and access keys, as secrets. All secrets in your key vault are encrypted with a software key. When you use Key Vault, you no longer need to store security information in your applications, which eliminates the need to make that information part of the code.

What is a secret in Key Vault?

In Key Vault, a secret is a name-value pair of strings. Secret names must be 1-127 characters long, contain only alphanumeric characters and dashes, and must be unique within a vault. A secret value can be any UTF-8 string up to 25 KB in size.

Vault authentication and permissions

Developers usually only need Get and List permissions for a development-environment vault. Some engineers need full permissions to change and add secrets when necessary.

For apps, often only Get permissions are required. Some apps might require List, depending on the technique the app uses to read secrets from the vault.

In this article, we will focus on two things: setting up secrets in Key Vault, and having an application retrieve secrets that were saved in Key Vault.

Create a Key Vault and store secrets

Creating a vault requires no initial configuration. You can start adding secrets immediately. After you have a vault, you can add and manage secrets from any Azure administrative interface, including the Azure portal, the Azure CLI, and Azure PowerShell. When you set up your application to use the vault, you need to assign the correct permissions to it.

Create a Key Vault service

To create Azure Key Vault service, you can follow the steps.

From the Azure Portal, search for "Key Vault".

Click "Key Vault".

Fill in all properties.

Click Review + create. That's all. Quite simple, right?

Create secrets and save in Key Vault

There are two ways to create a secret and save it in Key Vault, depending on the vault's permission model:

  • Access control, i.e. Identity and Access Management (IAM)
  • Access Policies

Using Access Control (IAM) to create a secret

From Key Vault > Access Control (IAM) > Add role assignment

In most cases, if you create and save secrets in Key Vault for your users to use, you only need to add the "Key Vault Secrets User" role assignment.

Click "Next" and select a member or group.

Pay attention here: if your organization has multiple instances of the same service, for example different teams independently using different ADF instances, make sure you add exactly the right service instance to the role assignment.

Once it's done, check the access.

Create a Secret

From "Key Vault" > "Objects" > "Secrets" > "+ Generate/Import"

Fill in all properties and click "Create".

The secret name and value are created. That's all.

If you want to update the secret, simply click the key and follow the UI guide; you will not miss it.

Click the version that you want to update, update the content, and apply it.

That’s all.

Using Access Policies to create a secret

There is another way, "Access Policies", to create a secret.

Select the permissions you want under Key permissions, Secret permissions, and Certificate permissions.

If you create a secret for users to use in their applications or other Azure services, granting "Get" and "List" under Secret permissions is usually enough. Otherwise, check the Microsoft official documentation.

Under the Principal selection pane, enter the name of the user, app, or service principal in the search field and select the appropriate result.

Using Azure SQL Database as an example

Caution: when you add a principal, make sure you select the right service instance. Especially if you are acting as an infrastructure administrator and your organization has multiple teams independently using different service instances (e.g., different Synapse workspaces), select the correct instance. I have been asked to help troubleshoot this scenario a few times: the Azure admin says he has added policies to the key vault, but the user still cannot access it. A funny mistake was made; he had added an ADF to the key vault policies, but unfortunately that ADF is NOT the one team A uses; team B is using it. 🙂

Back on the Access policies page, verify that your access policy is listed.

Create secret key and value

We discussed this above, so there is no need to repeat it.

Done!

Using secrets that were saved in Key Vault

Using secrets usually involves two major scenarios: direct use, or a REST API call to retrieve the saved secret value.

Let's use Data Factory as an example.

Scenario 1: direct use

For example, when you create a linked service to connect to Azure SQL Database,

you have to make sure that the Key Vault access policies include this ADF, with Get and List permissions.

One more example: a Synapse workspace using Key Vault.

Once again, make sure your Synapse workspace has the access policies ("Key Vault Secrets User" role, with Get and List).

Scenario 2: calling the Key Vault REST API to use a secret

Many engineers want to call Key Vault to retrieve a secret value for a certain purpose. For example, a Synapse pipeline that gets a SharePoint Online list, or files residing in a SharePoint library, needs an application secret value to build the query string. Normally the application's secret value is saved in Key Vault, so in this case you have to make an HTTP call to Key Vault.

Get a specified secret from a given key vault.
The GET operation is applicable to any secret stored in Azure Key Vault. This operation requires the secrets/get permission.

GET {vaultBaseUrl}/secrets/{secret-name}/{secret-version}?api-version=7.4

For this example:

{vaultBaseUrl}: https://mainri-key-vault.vault.azure.net

{secret-name}: mySecretName

{secret-version}: can be skipped; if you skip it, the latest version is used.

Look at the pipeline

This web activity calls Azure Key Vault to retrieve secret value.

URL: https://mainri-key-vault.vault.azure.net/secrets/mySecretName?api-version=7.4

Method: GET

Authentication: System-assigned managed identity

Resource: https://vault.azure.net

Caution: ?api-version=7.4 is very important; you have to add it to the URL.

All done.


Appendix:

MS document: Get Secret

Day 10: Workflows in Azure Purview

Workflow in Purview

In Azure Purview, a workflow refers to a set of predefined, automated tasks that streamline and manage certain processes, such as approving data access requests or managing metadata curation.

Workflows in Azure Purview help automate data governance operations and improve collaboration across teams. They can include tasks like approving the registration of new data sources, assigning data stewards, or managing access to certain datasets.

Key Features of Workflows in Azure Purview:

  1. Data Curation and Approval Processes
    Workflows can be used to automate the process of curating and approving metadata changes or lineage extraction requests.
  2. Automated Notifications
    Users can receive notifications when tasks like data registration or classification have been completed or need approval.
  3. Customizable Steps
    Workflows are customizable, allowing us to define multiple stages of approval or validation depending on the organization's governance policies.
  4. Access Management
    We can define workflows to manage how access requests are handled for sensitive data.
  5. Approval Requests
    For example, if someone registers a new data source, the workflow might require approval from a Purview Collection Administrator before proceeding.

Next step: Day 11 – Purview Access Control and Management


Day 11: Purview Access Control and Management

This article describes collections and access management for your account in the Microsoft Purview governance portal.



Day 9: Managed attributes in Data Map

With "Managed Attributes" we can add our own attributes (and groups of attributes) and give data stewards the ability to improve the content of the data catalog. In this article we will:

  • create a new attribute group
  • create a new attribute
  • learn more about the available field types
  • assign and manage attributes for your data assets

Create a new attribute group

Create an attribute group if there is no attribute group yet.

Purview studio > Data Map > Managed attributes > New attribute group

Fill in the fields.

Create a new attribute

Fill in the fields.
For Field group, there are several options that can be selected.

For Applicable asset types, many out-of-the-box options are available.

Now the new attributes are created.

In the managed attribute management experience, managed attributes can't be deleted, only expired. Expired attributes can't be applied to any assets and are, by default, hidden in the user experience. Once an attribute is created, it cannot be changed; you can only mark it as expired and create a new, updated one.

Add value for managed attribute

Once a managed attribute has been created, you'll need to add a value for it on each of your assets. You can add values to your assets by:

  1. Search for your data asset in the Microsoft Purview Data Catalog
  2. On the overview for your asset, you should see the managed attributes section with all attributes that have values. (You can see attributes without values by using the Show attributes without a value toggle.)
  3. Select the Edit button.

Under Managed attributes, add values for each of your attributes.

If any attributes are Required, you will not be able to save until you've added a value for each required attribute.

Now the managed attribute is added.

Summary

Managed attribute: a set of user-defined attributes that provide business or organization-level context to an asset. A managed attribute has a name and a value; for example, 'Department' is an attribute name and 'Finance' is its value.

Attribute group: a grouping of managed attributes that allows for easier organization and consumption.


Next step: Day 10 – Collections access control and management

Day 8 – Data Lineage, Extract SQL, ADF and Synapse Pipeline Lineage

Microsoft Purview provides an overview of data lineage in the Data Catalog. It also details how data systems can integrate with the catalog to capture lineage of data.

Lineage is represented visually to show data moving from source to destination, including how the data was transformed, which is especially valuable given the complexity of most enterprise data environments.

Microsoft Purview supports lineage for views and stored procedures from Azure SQL Database. While lineage for views is supported as part of scanning, you will need to turn on the Lineage extraction toggle to extract stored procedure lineage when you’re setting up a scan.

Lineage collection

Metadata collected in Microsoft Purview from enterprise data systems is stitched together to show end-to-end data lineage. Data systems that collect lineage into Microsoft Purview are broadly categorized into the following three types:

  • Data processing systems
  • Data storage systems
  • Data analytics and reporting systems

Each system supports a different level of lineage scope.  

A data estate might include systems doing data extraction and transformation (ETL/ELT systems), analytics, and visualization. Each system captures rich static and operational metadata that describes the state and quality of the data within the system's boundary. The goal of lineage in a data catalog is to extract the movement, transformation, and operational metadata from each data system at the lowest grain possible.

The following example is a typical use case of data moving across multiple systems, where the Data Catalog would connect to each of the systems for lineage.

  • Data Factory copies data from on-prem/raw zone to a landing zone in the cloud.
  • Data processing systems like Synapse and Databricks process and transform data from the landing zone to the curated zone using notebooks.
  • Further processing of data into analytical models for optimal query performance and aggregation.
  • Data visualization systems will consume the datasets and process through their meta model to create a BI Dashboard, ML experiments and so on.

Lineage for SQL DB views

Starting 6/30/24, SQL DB metadata scans include lineage extraction for views. Only new scans will include the view lineage extraction. Lineage is extracted at all scan levels (L1/L2/L3). In the case of an incremental scan, the corresponding static lineage for tables/views is extracted for whatever metadata is scanned as part of that incremental scan.

Prerequisites for setting up a scan with Stored Procedure lineage extraction

The <Purview-Account> must be able to access the SQL Database and be a member of the db_owner role.

To check whether the account exists in the database:


SELECT name, type_desc
FROM sys.database_principals
WHERE name = 'YourUserName';

Replace ‘YourUserName’ with the actual username you’re checking for.

If the user exists, it will return the name and type (e.g., SQL_USER or WINDOWS_USER).

If it does not exist, create one.

Sign in to Azure SQL Database with your Microsoft Entra account, create a <Purview-account> user, and assign db_owner permissions to the Microsoft Purview managed identity.

You can review my previous article, Configuring Azure Entra ID Authentication in Azure SQL Database, if you are not sure how to enable Azure Entra ID login.


CREATE USER [<purview-account>] FROM EXTERNAL PROVIDER;
GO
EXEC sp_addrolemember 'db_owner', '<purview-account>';
GO

Replace <purview-account> with the actual Purview account name.
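
To verify that the role assignment took effect, you can list the members of db_owner. A minimal sketch using the standard catalog views:

-- Lists every member of db_owner; the Purview account should appear here.
SELECT r.name AS role_name, m.name AS member_name
FROM sys.database_role_members AS rm
JOIN sys.database_principals AS r ON rm.role_principal_id = r.principal_id
JOIN sys.database_principals AS m ON rm.member_principal_id = m.principal_id
WHERE r.name = 'db_owner';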

Master Key

Check whether the master key exists.

To check if the Database Master Key (DMK) exists:


SELECT * FROM sys.symmetric_keys
WHERE name = '##MS_DatabaseMasterKey##';

if the query returns a result, it means the Database Master Key already exists.

If no rows are returned, it means the Database Master Key does not exist, and you may need to create one if required for encryption-related operations.

Create a master key


CREATE MASTER KEY;
GO

Allow Azure services and resources to access this server 

Ensure that Allow Azure services and resources to access this server is enabled under networking/firewall for your Azure SQL resource.

Previously, we discussed creating a scan for Azure SQL Database in Registering Azure SQL Database and Scan in Purview; in that article, the scan was configured with "Lineage extraction" disabled.

To allow Purview to extract lineage, we need to set it to On.

Extract Azure Data Factory/Synapse pipeline lineage

When we connect an Azure Data Factory to Microsoft Purview, whenever a supported Azure Data Factory activity runs, metadata about the activity's source data, output data, and the activity itself is automatically ingested into the Microsoft Purview Data Map.

Microsoft Purview captures runtime lineage from the following Azure Data Factory activities:

  • Copy Data
  • Data Flow
  • Execute SSIS Package

If a data source has already been scanned and exists in the data map, the ingestion process will add the lineage information from Azure Data Factory to that existing source. If the source or output doesn't exist in the data map and is supported by Azure Data Factory lineage, Microsoft Purview will automatically add its metadata from Azure Data Factory into the data map under the root collection.

This can be an excellent way to monitor your data estate as users move and transform information using Azure Data Factory.

Connect to Microsoft Purview account in Data Factory

Set up authentication

Data Factory's managed identity is used to authenticate lineage push operations from Data Factory to Microsoft Purview. Grant the Data Factory's managed identity the Data Curator role on the Microsoft Purview root collection.

Purview > Management > Lineage connections > Data Factory > new

Validation: Purview > Data map > Collections > Root collection > Role assignments

Check that the ADF appears under the "Data curators" section. That's OK.

Connect ADF to Purview

In the ADF studio: Manage > Microsoft Purview, and select Connect to a Microsoft Purview account.

Once a pipeline runs successfully, its activity is captured, and the extracted lineage appears in Purview.

That's all for extracting ADF pipeline lineage.

Next step: Day 9 – Managed attributes in Data Map
