Locking Mechanisms in Relational Database Management Systems (RDBMS)

In relational databases, locks are essential mechanisms for managing concurrent access to data. They prevent data corruption and ensure data consistency when multiple transactions try to read or modify the same data simultaneously.

Without locks, concurrent transactions can cause several problems. For example:

  • Dirty Reads: a transaction reads data that another transaction has modified but not yet committed.
  • Lost Updates: one transaction’s update is overwritten by another transaction’s update.
  • Non-Repeatable Reads: a transaction reads the same data multiple times and gets different results because other transactions updated the data in between.
  • Phantom Reads: a transaction executes the same query multiple times and gets different result sets because other transactions inserted or deleted rows in between.
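To make the lost-update anomaly concrete, here is a minimal Python sketch. The account name and amounts are hypothetical, and the interleaving is simulated step by step: two read-modify-write transactions both read the balance before either writes, so the second write silently discards the first.

```python
# Simulate a lost update: two transactions read the same balance,
# then each writes back its own result; the later write clobbers the earlier one.
balance = {"acct": 100}

# Both transactions read the committed balance before either writes (no locks).
read_t1 = balance["acct"]          # T1 reads 100
read_t2 = balance["acct"]          # T2 reads 100

balance["acct"] = read_t1 + 50     # T1 deposits 50 -> writes 150
balance["acct"] = read_t2 - 30     # T2 withdraws 30 -> writes 70; T1's deposit is lost

print(balance["acct"])             # 70, but the correct serial result is 120
```

With an exclusive lock on the row, T2's read would have waited until T1 committed, yielding the correct serial result of 120.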

Here’s a detailed breakdown of locks in relational databases.

Types of Locks

Relational databases use various types of locks with different levels of restriction:

Shared Lock

Allows multiple read operations simultaneously. Prevents write operations until the lock is released.

Example: SELECT statements in many databases.

Exclusive Lock

Allows a single transaction to modify data. Prevents other operations (read or write) until the lock is released.

Example: UPDATE, DELETE.
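The shared/exclusive interaction can be observed with Python's built-in sqlite3 module. SQLite locks at database granularity rather than per row, so this is an illustrative sketch rather than typical RDBMS behavior, but the principle is the same: concurrent readers are allowed while a writer holds its lock, and a second writer is rejected until the first commits.

```python
import os
import sqlite3
import tempfile

# File-backed database so both connections share lock state; autocommit mode
# (isolation_level=None) gives us manual transaction control.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn1 = sqlite3.connect(path, timeout=0, isolation_level=None)
conn2 = sqlite3.connect(path, timeout=0, isolation_level=None)
conn1.execute("CREATE TABLE t (id INTEGER)")

conn1.execute("BEGIN IMMEDIATE")                 # conn1 takes the write lock
conn1.execute("INSERT INTO t VALUES (1)")

# Reads are still allowed while the writer holds its lock, and they see
# only committed data, so the uncommitted row is invisible.
print(conn2.execute("SELECT COUNT(*) FROM t").fetchone()[0])  # 0

try:
    conn2.execute("BEGIN IMMEDIATE")             # a second writer is rejected
    blocked = False
except sqlite3.OperationalError as exc:
    blocked = True
    print(exc)                                   # e.g. "database is locked"

conn1.execute("COMMIT")                          # release the write lock
conn2.execute("BEGIN IMMEDIATE")                 # now conn2 may write
conn2.execute("COMMIT")
```

With `timeout=0` the second writer fails immediately instead of waiting, which makes the lock conflict visible.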

Update Lock

Prevents deadlocks when a transaction holds a shared lock that it might later upgrade to an exclusive lock: only one transaction at a time can hold an update lock on a resource.

Intent Lock

Indicates the type of lock a transaction intends to acquire at a finer granularity level:

  • Intent Shared (IS): intends to acquire a shared lock on a lower granularity level.
  • Intent Exclusive (IX): intends to acquire an exclusive lock on a lower granularity level.
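The interaction between these lock modes is usually summarized in a compatibility matrix. The following Python sketch is a simplification (real engines add further modes, such as SIX) that checks whether a requested mode can be granted alongside a mode already held:

```python
# Lock compatibility matrix for shared (S), exclusive (X), update (U),
# intent shared (IS) and intent exclusive (IX) modes.
# True means the two locks can be held on the same resource simultaneously.
COMPAT = {
    ("IS", "IS"): True,  ("IS", "IX"): True,  ("IS", "S"): True,  ("IS", "U"): True,  ("IS", "X"): False,
    ("IX", "IS"): True,  ("IX", "IX"): True,  ("IX", "S"): False, ("IX", "U"): False, ("IX", "X"): False,
    ("S",  "IS"): True,  ("S",  "IX"): False, ("S",  "S"): True,  ("S",  "U"): True,  ("S",  "X"): False,
    ("U",  "IS"): True,  ("U",  "IX"): False, ("U",  "S"): True,  ("U",  "U"): False, ("U",  "X"): False,
    ("X",  "IS"): False, ("X",  "IX"): False, ("X",  "S"): False, ("X",  "U"): False, ("X",  "X"): False,
}

def can_grant(held, requested):
    """Return True if `requested` can be granted while `held` is in place."""
    return COMPAT[(held, requested)]

print(can_grant("S", "S"))    # True: many readers may coexist
print(can_grant("S", "X"))    # False: a writer must wait for readers
print(can_grant("IS", "IX"))  # True: intent locks rarely conflict with each other
```

Note that only one update (U) lock can be held at a time, which is exactly how it prevents the shared-to-exclusive conversion deadlock described above.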

Lock Granularity

Locks can be applied at different levels of granularity.

Row-Level Lock

Locks a specific row in a table. Provides the highest concurrency, but locking many rows can create lock-management overhead.
Example: Updating a specific record (UPDATE ... WHERE id = 1).

Page-Level Lock

Locks a data page, a fixed-size block of storage holding multiple rows. Provides a compromise between concurrency and overhead.

Table-Level Lock

Locks an entire table. Provides the lowest concurrency but minimal overhead.

Example: Prevents any modifications to the table during an operation like ALTER TABLE.

Lock Duration

Transaction Locks: Held until the transaction is committed or rolled back.

Session Locks: Held for the duration of a session.

Temporary Locks: Released immediately after the operation completes.

Deadlock Prevention and Handling

A deadlock occurs when two or more transactions are waiting for each other to release locks. Databases employ deadlock detection and resolution mechanisms to handle such situations.

Prevent Deadlocks

Avoid Mutual Exclusion
Use resources that allow shared access (e.g., shared locks for read-only operations).

Eliminate Hold and Wait
Require transactions to request all resources they need at the beginning. If any resource is unavailable, the transaction must wait without holding any resources.

Allow Preemption
If a transaction requests a resource that is held by another, the system can preempt (forcefully release) the resource from the holding transaction. The preempted transaction is rolled back and restarted.

Break Circular Wait
Impose a global ordering on resources and require transactions to request resources in that order. For example, if resources are ordered as R1, R2, R3, a transaction must request R1 before R2, and R2 before R3.
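The resource-ordering rule can be sketched in Python with threading locks standing in for database resources (the resource and transaction names are illustrative). Because both workers acquire locks in the same global order, neither can end up waiting on the other in a cycle:

```python
import threading

# Global ordering: R1 < R2 < R3. Every transaction must acquire in this order.
ORDER = {"R1": 1, "R2": 2, "R3": 3}
locks = {name: threading.Lock() for name in ORDER}
results = []

def transaction(name, needed):
    # Sort the needed resources by the global order before locking them.
    ordered = sorted(needed, key=ORDER.get)
    for res in ordered:
        locks[res].acquire()
    results.append(name)            # critical section: do the work
    for res in ordered:
        locks[res].release()

# Both transactions want R1 and R2 but request them in the imposed order,
# so the circular wait (T1 holds R1 waiting for R2 while T2 holds R2
# waiting for R1) cannot form.
t1 = threading.Thread(target=transaction, args=("T1", ["R2", "R1"]))
t2 = threading.Thread(target=transaction, args=("T2", ["R1", "R2"]))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))              # ['T1', 'T2'] -- both finished, no deadlock
```

If each transaction instead acquired its locks in the order it listed them, the two threads could deadlock; the global ordering removes that possibility.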

    Handle Deadlocks

    If deadlocks cannot be prevented, the database system must detect and resolve them. Here’s how deadlocks are typically handled:

    Deadlock Detection
    The database system periodically checks for deadlocks by analyzing the wait-for graph, which represents transactions and their dependencies on resources. If a cycle is detected in the graph, a deadlock exists.

    Deadlock Resolution
    Once a deadlock is detected, the system must resolve it by choosing a victim transaction to abort. The victim is typically selected based on criteria such as:

    • Transaction Age: Abort the newest or oldest transaction.
    • Transaction Progress: Abort the transaction that has done the least work.
    • Priority: Abort the transaction with the lowest priority.

    The aborted transaction is rolled back, releasing its locks and allowing other transactions to proceed.
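The detect-and-resolve flow above can be sketched in Python. The transaction IDs and the "abort the newest transaction" victim policy (approximated here by the highest ID) are illustrative assumptions:

```python
# Wait-for graph: edge T -> U means transaction T is waiting for a lock held by U.
def find_cycle(graph):
    """Depth-first search for a cycle; returns the transactions on it, or []."""
    visiting, visited = set(), set()

    def dfs(node, path):
        visiting.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt in visiting:                  # back edge -> cycle found
                return path[path.index(nxt):]
            if nxt not in visited:
                cycle = dfs(nxt, path)
                if cycle:
                    return cycle
        visiting.discard(node)
        visited.add(node)
        path.pop()
        return []

    for start in graph:
        cycle = dfs(start, [])
        if cycle:
            return cycle
    return []

# T1 waits for T2, T2 waits for T3, T3 waits for T1: a deadlock. T4 is idle.
wait_for = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": []}
cycle = find_cycle(wait_for)
# Victim policy (one of several possible criteria): abort the newest
# transaction, approximated here by the highest transaction ID.
victim = max(cycle) if cycle else None
print(cycle, victim)   # ['T1', 'T2', 'T3'] T3
```

Rolling back the victim removes its node (and edges) from the wait-for graph, breaking the cycle so the remaining transactions can proceed.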

    Conclusion

    Locks are crucial for ensuring data consistency and integrity in relational databases. Understanding the different types of locks, lock granularity, locking protocols, and isolation levels is essential for database developers and administrators to design and manage concurrent applications effectively.

    Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all spaces from the email address 😊)

    Overview of Commonly Used Unity Catalog and Spark SQL Management Commands

A summary of frequently used Unity Catalog and Spark SQL management commands, organized in a table.

| Category | Command | Description | Example |
|---|---|---|---|
| Catalog Management | SHOW CATALOGS | Lists all available catalogs. | SHOW CATALOGS; |
| Schema Management | SHOW SCHEMAS IN <catalog_name> | Lists schemas (databases) within a catalog. | SHOW SCHEMAS IN main; |
| | DESCRIBE SCHEMA <catalog_name>.<schema_name> | Provides metadata about a specific schema. | DESCRIBE SCHEMA main.default; |
| Table Management | SHOW TABLES IN <catalog_name>.<schema_name> | Lists all tables in a schema. | SHOW TABLES IN main.default; |
| | DESCRIBE TABLE <catalog_name>.<schema_name>.<table_name> | Displays metadata about a specific table. | DESCRIBE TABLE main.default.sales_data; |
| | SHOW PARTITIONS <catalog_name>.<schema_name>.<table_name> | Lists partitions of a partitioned table. | SHOW PARTITIONS main.default.sales_data; |
| | SHOW COLUMNS IN <catalog_name>.<schema_name>.<table_name> | Lists all columns of a table, including their data types. | SHOW COLUMNS IN main.default.sales_data; |
| | DROP TABLE <catalog_name>.<schema_name>.<table_name> | Deletes a table from the catalog. | DROP TABLE main.default.sales_data; |
| Database Management | SHOW DATABASES | Lists all databases (schemas) in the environment. | SHOW DATABASES; |
| | DESCRIBE DATABASE <database_name> | Provides metadata about a specific database. | DESCRIBE DATABASE default; |
| Data Querying | SELECT * FROM <catalog_name>.<schema_name>.<table_name> | Queries data from a table. | SELECT * FROM main.default.sales_data WHERE region = 'West'; |
| Table Creation | CREATE TABLE <catalog_name>.<schema_name>.<table_name> (<columns>) | Creates a managed table in Unity Catalog. | CREATE TABLE main.default.sales_data (id INT, region STRING, amount DOUBLE); |


    Day 9: Managed attributes in Data Map

With “Managed Attributes” we can add our own attributes (and groups of attributes) and provide data stewards with the functionality to improve the content of the data catalog. In this post, we will:

    • create a new attribute group
    • create a new attribute
    • learn more about the available field types
    • assign and manage attributes for your data assets

    Create a new attribute group

Create an attribute group if one does not already exist:

    Purview studio > Data Map > Managed attributes > New attribute group

Fill in the required fields.

    Create a new attribute

Fill in the fields.
For the field group, there are several options that can be selected.

For applicable asset types, there are many out-of-the-box options to choose from.

Now the new attributes are created.

In the managed attribute management experience, managed attributes can’t be deleted, only expired. Expired attributes can’t be applied to any assets and are hidden in the user experience by default. Once an attribute is created, it cannot be changed; you can only mark it as “expired” and create a new, updated one.

    Add value for managed attribute

    Once a managed attribute has been created, you’ll need to add a value for each of your assets. You can add values to your assets by:

    1. Search for your data asset in the Microsoft Purview Data Catalog
    2. On the overview for your asset, you should see the managed attributes section with all attributes that have values. (You can see attributes without values by using the Show attributes without a value toggle.)
    3. Select the Edit button.

4. Under Managed attributes, add values for each of your attributes.

If any attributes are Required, you will not be able to save until you’ve added a value for each of them.

Now the managed attribute has been added.

    Summary

Managed attribute: a set of user-defined attributes that provide business- or organization-level context to an asset. A managed attribute has a name and a value; for example, ‘Department’ is an attribute name and ‘Finance’ is its value.

Attribute group: a grouping of managed attributes that allows for easier organization and consumption.


    Next step: Day 10 – Collections access control and management

Day 8 – Data Lineage: Extract SQL, ADF and Synapse Pipeline Lineage

    Microsoft Purview provides an overview of data lineage in the Data Catalog. It also details how data systems can integrate with the catalog to capture lineage of data.

Lineage is represented visually, showing data moving from source to destination, including how the data was transformed along the way, which is valuable given the complexity of most enterprise data environments.

    Microsoft Purview supports lineage for views and stored procedures from Azure SQL Database. While lineage for views is supported as part of scanning, you will need to turn on the Lineage extraction toggle to extract stored procedure lineage when you’re setting up a scan.

    Lineage collection

Metadata collected in Microsoft Purview from enterprise data systems is stitched together to show end-to-end data lineage. Data systems that collect lineage into Microsoft Purview fall broadly into the following three types:

    • Data processing systems
    • Data storage systems
    • Data analytics and reporting systems

    Each system supports a different level of lineage scope.  

A data estate might include systems that perform data extraction and transformation (ETL/ELT), analytics, and visualization. Each system captures rich static and operational metadata describing the state and quality of the data within that system’s boundary. The goal of lineage in a data catalog is to extract the movement, transformation, and operational metadata from each data system at the lowest grain possible.

    The following example is a typical use case of data moving across multiple systems, where the Data Catalog would connect to each of the systems for lineage.

• Data Factory copies data from the on-prem/raw zone to a landing zone in the cloud.
• Data processing systems like Synapse and Databricks process and transform the data from the landing zone to a curated zone using notebooks.
• The data is further processed into analytical models for optimal query performance and aggregation.
• Data visualization systems consume the datasets and process them through their meta model to create BI dashboards, ML experiments, and so on.

    Lineage for SQL DB views

Starting 6/30/24, the SQL DB metadata scan includes lineage extraction for views. Only new scans include view lineage extraction. Lineage is extracted at all scan levels (L1/L2/L3). For an incremental scan, the corresponding static lineage is extracted for whatever tables/views are covered by that incremental scan.

    Prerequisites for setting up a scan with Stored Procedure lineage extraction

The <Purview-Account> managed identity must be able to access the SQL database and be a member of the db_owner role.

To check whether the account exists in the database:

    
-- Look up the database principal by name
SELECT name, type_desc
FROM sys.database_principals
WHERE name = 'YourUserName';
    

    Replace ‘YourUserName’ with the actual username you’re checking for.

    If the user exists, it will return the name and type (e.g., SQL_USER or WINDOWS_USER).

    If it does not exist, create one.

Sign in to Azure SQL Database with your Microsoft Entra account, create a <Purview-account> user, and assign db_owner permissions to the Microsoft Purview managed identity.

You can review my previous article, Configuring Azure Entra ID Authentication in Azure SQL Database, if you are not sure how to enable Azure Entra ID login.

    
-- Create a database user for the Purview managed identity
CREATE USER [purview-account] FROM EXTERNAL PROVIDER;
GO
-- Add the user to the db_owner role
EXEC sp_addrolemember 'db_owner', [purview-account];
GO
    

Replace <purview-account> with the actual Purview account name.

    Master Key

To check whether the Database Master Key (DMK) exists:

    
-- Check for an existing Database Master Key
SELECT * FROM sys.symmetric_keys
WHERE name = '##MS_DatabaseMasterKey##';
    

If the query returns a row, the Database Master Key already exists.

    If no rows are returned, it means the Database Master Key does not exist, and you may need to create one if required for encryption-related operations.

    Create a master key

    
-- In Azure SQL Database the ENCRYPTION BY PASSWORD clause is optional
CREATE MASTER KEY;
GO
    

    Allow Azure services and resources to access this server 

    Ensure that Allow Azure services and resources to access this server is enabled under networking/firewall for your Azure SQL resource.

Previously, we discussed creating a scan for Azure SQL Database in Registering Azure SQL Database and Scan in Purview; in that article, the “Lineage extraction” toggle was disabled.

To allow Purview to extract lineage, we need to set it to On.

    Extract Azure Data Factory/Synapse pipeline lineage

    When we connect an Azure Data Factory to Microsoft Purview, whenever a supported Azure Data Factory activity is run, metadata about the activity’s source data, output data, and the activity will be automatically ingested into the Microsoft Purview Data Map.

    Microsoft Purview captures runtime lineage from the following Azure Data Factory activities:

    • Copy Data
    • Data Flow
    • Execute SSIS Package

If a data source has already been scanned and exists in the data map, the ingestion process will add the lineage information from Azure Data Factory to that existing source. If the source or output doesn’t exist in the data map but is supported by Azure Data Factory lineage, Microsoft Purview will automatically add its metadata from Azure Data Factory into the data map under the root collection.

    This can be an excellent way to monitor your data estate as users move and transform information using Azure Data Factory.

    Connect to Microsoft Purview account in Data Factory

    Set up authentication

Data Factory’s managed identity is used to authenticate lineage push operations from Data Factory to Microsoft Purview. Grant the Data Factory’s managed identity the Data Curator role on the Microsoft Purview root collection.

    Purview > Management > Lineage connections > Data Factory > new

    Validation: Purview > Data map > Collection > Root collection > Role assignments >

Check that the ADF managed identity appears under the “Data Curators” section; that’s OK.

Connect ADF to Purview

In ADF Studio: Manage > Microsoft Purview, then select Connect to a Microsoft Purview account.

    We will see this

Once the pipeline runs successfully, the activity will be captured, and the extracted lineage looks like this:

That’s all for extracting ADF pipeline lineage.

    Next step: Day 9 – Managed attributes in Data Map
