Microsoft Purview (formerly Azure Purview) is a unified data governance service from Microsoft. It helps organizations manage and govern their on-premises, multi-cloud, and software-as-a-service (SaaS) data. Its primary purpose is to provide a comprehensive understanding of an organization’s data landscape through data discovery, classification, and lineage tracking.
Before you can develop data-governance plans for usage and storage, you need to understand the data your organization uses.
Without the ability to track data from end to end, you must spend time tracing problems created by data pipelines that other teams own. If you make changes to your datasets, you can accidentally affect related reports that are business or mission critical.
Microsoft Purview is designed to address these issues and help enterprises get the most value from their existing information assets. Its catalog makes data sources easy to discover and understand by the users who manage the data.
Key Features of Microsoft Purview:
- Data Cataloging: Automatically discover data assets across your data estate and register them in a unified catalog.
- Data Lineage: Track the lineage of data to understand how it flows through different systems.
- Data Classification: Apply built-in and custom classifiers to categorize your data based on sensitivity and type.
- Business Glossary: Create and manage a business glossary to standardize terms and definitions across your organization.
- Data Insights: Gain insights into the distribution of your data, data owners, and data usage patterns.
- Integration with Azure Data Services: Integrate with services such as Azure Synapse Analytics and Power BI for seamless data governance.
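To make the classification feature more concrete, here is a minimal Python sketch of how a regex-based custom classifier might tag columns from sample values. It does not call any Purview API; the rule names, the patterns, and the 60% match threshold are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical classification rules: rule name -> regex applied to sample values.
RULES = {
    "EMAIL_ADDRESS": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "CA_POSTAL_CODE": re.compile(r"^[A-Za-z]\d[A-Za-z] ?\d[A-Za-z]\d$"),
}

def classify_column(values, min_match_ratio=0.6):
    """Return the rule names whose pattern matches enough of the sample values."""
    values = [v for v in values if v]  # ignore empty samples
    if not values:
        return []
    labels = []
    for name, pattern in RULES.items():
        hits = sum(1 for v in values if pattern.match(v))
        if hits / len(values) >= min_match_ratio:
            labels.append(name)
    return labels

# Invented sample data, one list of values per column.
sample = {
    "contact": ["a.lee@example.com", "bob@example.org", "n/a"],
    "postal": ["M5V 3L9", "K1A0B1", "H2Y 1C6"],
    "city": ["Toronto", "Ottawa", "Montreal"],
}
for column, values in sample.items():
    print(column, classify_column(values))
```

Using a match ratio rather than requiring every value to match lets a column with a few placeholder values (such as "n/a") still receive a label.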
Microsoft Purview has three main elements:
Data Map
The data map provides a structure for your data estate in Microsoft Purview, where you can map your existing data stores into groups and hierarchies.
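As a rough illustration of the "groups and hierarchies" idea, the sketch below models collections as a child-to-parent map and resolves the path of a collection from the root. The collection names are invented, and this is plain Python rather than a Purview SDK call.

```python
# Hypothetical collections: child -> parent (the root has no parent).
PARENTS = {
    "Contoso": None,
    "Finance": "Contoso",
    "Marketing": "Contoso",
    "Finance-Prod": "Finance",
}

def collection_path(name, parents=PARENTS):
    """Walk up the parent links and return the path from the root to `name`."""
    path = []
    seen = set()
    while name is not None:
        if name in seen:
            raise ValueError(f"cycle detected at {name!r}")
        seen.add(name)
        path.append(name)
        name = parents[name]
    return " > ".join(reversed(path))

print(collection_path("Finance-Prod"))
```

A path of this kind is what the asset overview refers to as the collection path.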
Data Catalog
The data catalog allows your users to browse the metadata stored in the data map so that they can find reliable data and understand its context.
Users can see where the data comes from and which experts they can contact about that data source.
The data catalog also integrates with other Azure products, like the Azure Synapse Analytics workspace, so that users can search for the data they need from the applications they need it in.
Examples of browsing the catalog by source type: Azure subscriptions, Azure Data Lake, Blob Storage, and SQL Server.
Data Estate Insights
Insights offer a high-level view into your data catalog, covering these key facets:
- Data stewardship: A report on how curated your data assets are so that you can track your governance progress.
- Catalog adoption: A report on the number of active users in your data catalog, their top searches, and your most viewed assets.
- Asset insights: A report on the data estate and source-type distribution. You can view by source type, classification, and file size. View the insights as a graph or as key performance indicators.
- Scan insights: A report that provides information on the health of your scans (succeeded, failed, or canceled).
- Glossary insights: A status report on the glossary to help users understand the distribution of glossary terms by status, and view how the terms are attached to assets.
- Classification insights: A report that shows where classified data is located. It allows security administrators to understand the types of information found in their organization’s data estate.
- Sensitivity insights: A report that focuses on sensitivity labels found during scans. Security administrators can make use of this information to ensure security is appropriate for the data estate.
Search the Microsoft Purview data catalog
From the Purview Studio home page, type keywords into the search box to search the catalog.
You can narrow the results with the filters in the left-hand panel.
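The same keyword-plus-filter search can also be expressed programmatically. The sketch below only builds a request body in the general shape of a catalog search query (keywords, a result limit, and an `and` filter). The field names, the entity type `azure_blob_path`, and the classification name are assumptions to verify against the Purview REST API reference, and nothing is sent over the network.

```python
import json

def build_search_body(keywords, entity_type=None, classification=None, limit=10):
    """Build a search request body with optional filters (assumed field names)."""
    conditions = []
    if entity_type:
        conditions.append({"entityType": entity_type})
    if classification:
        conditions.append({"classification": classification})
    body = {"keywords": keywords, "limit": limit}
    if conditions:
        # Combine the selected filters so that all of them must match.
        body["filter"] = {"and": conditions}
    return body

body = build_search_body(
    "sales",
    entity_type="azure_blob_path",              # assumed type name
    classification="MICROSOFT.PERSONAL.EMAIL",  # assumed classification name
)
print(json.dumps(body, indent=2))
```

Each filter chosen in the left-hand panel corresponds to one condition in the list, which is why the helper omits the `filter` key entirely when no filter is selected.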
Understand a single asset
Asset overview
Select an asset to see the overview. The overview displays information at a glance, including a description, asset classification, schema classification, collection path, asset hierarchy, and glossary terms.
Properties
Schema
The schema view of the asset includes more granular details about the asset, such as column names, data types, column level classifications, terms, and descriptions.
Lineage
Asset lineage gives you a clear view of how the asset is populated and where data comes from. Data lineage is broadly understood as the lifecycle that spans the data’s origin, and where it moves over time across the data estate. Data lineage is important to analysts because it enables understanding of where data is coming from, what upstream changes may have occurred, and how it flows through the enterprise data systems.
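To show what "where data comes from" means in practice, here is a small Python sketch that treats lineage as a directed graph and collects every upstream asset of a given asset with a breadth-first search. The asset names and edges are invented for illustration; Purview builds the real graph from scans and pipeline metadata.

```python
from collections import deque

# Hypothetical lineage edges: (source asset, target asset).
EDGES = [
    ("crm_db.customers", "adf_copy_customers"),
    ("adf_copy_customers", "lake/raw/customers.parquet"),
    ("lake/raw/customers.parquet", "synapse.dim_customer"),
    ("erp_db.orders", "synapse.fact_orders"),
    ("synapse.dim_customer", "powerbi.sales_report"),
    ("synapse.fact_orders", "powerbi.sales_report"),
]

def upstream(asset, edges=EDGES):
    """Return every asset that feeds `asset`, directly or indirectly."""
    # Index the edges by target so each lookup returns the direct producers.
    producers = {}
    for src, dst in edges:
        producers.setdefault(dst, []).append(src)
    found, queue = set(), deque([asset])
    while queue:
        for src in producers.get(queue.popleft(), []):
            if src not in found:
                found.add(src)
                queue.append(src)
    return sorted(found)

print(upstream("powerbi.sales_report"))
```

Reversing the index (keyed by source instead of target) would give the downstream view, which answers the question of which reports a dataset change could affect.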
Contacts
The Contacts tab lists the experts or dataset owners you can reach out to with any questions.
Related
We will discuss the topics above in more detail in upcoming articles.
Next step: Day 3: How Microsoft Purview works – Data Source, Rule Sets, and Classification
Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca
(remove all space from the email account 😊)