Comparison of Fabric, Azure Databricks and Synapse Analytics

Microsoft Fabric vs Databricks vs Synapse

Microsoft Fabric is an all-in-one SaaS analytics platform with integrated BI.
Databricks is a Spark-based platform mainly used for large-scale data engineering and machine learning.
Synapse is an enterprise analytics service combining SQL data warehousing and big data processing.

PlatformDescription (English)
Microsoft FabricAn all-in-one SaaS data platform that integrates data engineering, data science, warehousing, real-time analytics, and BI.
Azure DatabricksA Spark-based analytics and AI platform optimized for large-scale data engineering and machine learning.
Azure Synapse AnalyticsAn analytics service combining data warehousing and big data analytics.

Architecture

  1. Microsoft Fabric: Fully integrated SaaS platform built around OneLake.
    single data lake, unified workspace, built-in Power BI
  2. Databricks: Spark-native architecture optimized for big data processing.
    Delta Lake, Spark clusters, ML workloads
  3. Synapse: Hybrid analytics platform integrating SQL data warehouse and big data tools.

Main Use Cases

PlatformBest For
FabricEnd-to-end analytics platform
DatabricksAdvanced data engineering & ML
SynapseEnterprise data warehouse

Comparation of V-order, Z-order, liquid clustering

In Microsoft Fabric, performance is everything. While V-order, Z-order, and Liquid Clustering all aim to make your queries faster, they work at different levels of the data “stack.”

V-Order

The “Inside” Optimizer.

V-order is a Microsoft-proprietary optimization that happens inside each Parquet file. It doesn’t decide which rows go into which file; instead, it rearranges data within the file to maximize compression and read speed for the VertiPaq engine (used by Power BI Direct Lake).

  • Level: File-level (Internal).
  • When: Applied during write-time (by default in Fabric).

When: Applied during write-time (by default in Fabric).

Z-Order

The “Classic” Co-locator.

Z-order is a standard Delta Lake technique. It rearranges data across files so that related information is stored together. For example, if you frequently filter by CustomerID and Date, Z-order will group those rows into the same files, allowing the engine to skip files that don’t match your filter (Data Skipping).

  • Level: Table-level (External/File Layout).
  • Constraint: If you want to change the columns you Z-order by, you must rewrite the entire table.

Liquid Clustering

The “Future” of Layout

Liquid Clustering is the successor to Z-order and Partitioning. It is incremental and flexible. It uses a tree-based algorithm to cluster data as it arrives. Unlike Z-order, you can change your clustering keys without a full data rewrite.

  • Level: Table-level (Modern Replacement for Z-order).
  • Benefit: Handles data skew automatically and works great for tables that grow rapidly.
FeatureV-OrderZ-OrderLiquid Clustering
FocusCompression inside the fileData skipping across filesDynamic/Flexible data skipping
CompatibilityWorks with Z or LiquidWorks with V-orderWorks with V-order
FlexibilityAutomaticHard (requires full rewrite)High (easy to change keys)
Best ForPower BI / Direct Lake speedStatic query patternsChanging patterns / Rapid growth

Ultra-short memory trick

V-Order optimizes column layout for BI, Z-Order optimizes filtering, and Liquid Clustering adapts data layout over time.

  • V-Order = BI scan
  • Z-Order = WHERE filter
  • Liquid Clustering = Future-proof layout

The Best Setup for 2026

In a modern Fabric project, you should almost never use Z-order anymore. Instead, use the Golden Combo:

  • V-Order (Internal): Keep this enabled to ensure Power BI is lightning fast.
  • Liquid Clustering (External): Use CLUSTER BY (ColumnName) on your large tables to handle file-level efficiency.

Fabric terminology

Learn the definitions of terms used in Microsoft Fabric, including terms specific to Fabric Data Engineering, Data Factory, Fabric Data Science, Fabric Data Warehouse, IQ, Real-Time Intelligence, and Power BI.

General terms

Capacity

Dedicated compute resources for Fabric workloads.

Capacity defines the ability of a resource to perform an activity or to produce output. Different items consume different capacity at a certain time.

Direct Lake

Query Delta tables directly without import.

Experience

An Experience in Fabric refers to a specific workload environment or toolset designed for a particular data task.

Examples of Experiences

  • Microsoft Fabric Data Engineering
  • Microsoft Fabric Data Factory
  • Microsoft Fabric Data Science
  • Microsoft Fabric Data Warehouse
  • Microsoft Fabric Power BI
  • Microsoft Fabric Real-Time Intelligence

Experiment

An Experiment in Fabric is used in data science and machine learning to track multiple model training runs and compare results.

IQ

Ontology (preview) is an item where you can define entity types, relationships, properties, and other constraints to organize data according to your business vocabulary.

Item

An item is an object that actually created. It is a set of capabilities within a workload. Users can create, edit, and delete them. Each item type provides different capabilities. For example, the Data Engineering workload includes the lakehouse, notebook, and Spark job definition items.

ExperienceItem
Data EngineeringLakehouse
Data FactoryPipeline
Data WarehouseWarehouse
Power BIReport
Real-Time AnalyticsEventstream

Lakehouse

Combines data lake storage with data warehouse features.

A lakehouse is a database built over a data lake, containing files, folders, and tables. It is used by the Apache Spark engine and SQL engine for big data processing. Lakehouses support ACID transactions when using the open-source Delta formatted tables. The lakehouse item is hosted within a unique workspace folder in Microsoft OneLake.

Liquid Clustering

Adaptive clustering mechanism for Delta tables

OneLake

A single, unified data lake for the entire Fabric tenant.

Shortcut

Shortcuts are embedded references within OneLake that point to other file store locations. They provide a way to connect to existing data without having to directly copy it.

V-order

V-Order (Vertical Order) is a column-oriented data layout optimization used in Fabric Lakehouse to improve analytic query performance, especially for Power BI Direct Lake.

A write optimization to the parquet file format that enables fast reads and provides cost efficiency and better performance. All the Fabric engines write v-ordered parquet files by default.

Workload / Experience

In Microsoft Fabric, a Workload (also called an Experience) is a functional area of the platform that provides a specific type of analytics capability.

Each workload has:

  • Its own UI
  • Its own tools
  • Its own compute behavior
  • But shares OneLake and security

Workspace

A logical container for Fabric items.

Appendix

Microsoft Fabric terminology

Introducing Fabric

What is Fabic?

Microsoft Fabric is an all-in-one, SaaS (Software as a Service) analytics platform. It combines data movement, data engineering, data science, and business intelligence into one single website. It is built on OneLake, which is like “OneDrive for data.”

Why use Microsoft Fabric?

Traditional data stacks are “fragmented”—you might use Azure Data Factory for moving data, Databricks for cleaning it, Datalake for storing it, and Power BI for seeing it.

Fabric fixes this by putting everything in one place.

  • Unified Data: Every tool (SQL, Spark, Power BI) uses the same copy of data in OneLake. No more duplicating data.
  • SaaS Simplicity: You don’t need to manage servers, clusters, or storage accounts. It’s all managed by Microsoft.
  • Direct Lake Mode: Power BI can read data directly from the lake without “importing” it, making reports incredibly fast.
  • Cost Efficiency: You pay for one pool of “Compute Capacity” and share it across all your teams.

What can Microsoft Fabric do?

It handles the entire data journey:

  • Ingest: Pull data from anywhere (SQL, AWS, Web).
  • Store: Store massive amounts of data in an open format (Delta Parquet).
  • Process: Clean and transform data using Python/Spark or SQL.
  • Analyze: Run complex queries and train Machine Learning models.
  • Visualize: Build real-time dashboards in Power BI.

Components of Fabric

Fabric is divided into “Experiences” based on what you need to do:

Component (Experience)RolePurpose
Data FactoryData EngineerETL, Pipelines, and Dataflows.
Synapse Data EngineeringSpark DeveloperHigh-scale processing using Notebooks.
Synapse Data WarehouseSQL DeveloperProfessional-grade SQL storage.
Synapse Data ScienceData ScientistBuilding AI and ML models.
Real-Time IntelligenceIoT / App DevHigh-speed streaming data.
Power BIBusiness AnalystVisualizing data for the business.
Data ActivatorOperationsAutomatic alerts (e.g., “Email me if sales drop”).

Step-by-Step: Getting Started

If you were a Data Engineer working on a project today, your workflow would look like this:

Step 1: Create a Workspace

  • Log in to Fabric. Create a Workspace.
  • Ensure it has a Fabric Capacity license attached.

Step 2: The Lakehouse (Bronze)

  • Create a Lakehouse called “Company_Data_Lake“.
  • This creates a folder in OneLake where your raw files will live.

Step 3: Data Pipeline (Ingestion

  • Use Data Factory to create a pipeline.
  • >> Data Pipeline | Source: SQL Server | Destination: Lakehouse Files | 03/2026
  • This pulls your raw data into the “Files” folder.

Step 4: Notebook (Silver/Gold)

  • Open a Notebook. Use PySpark to clean the data.
  • Example: Remove duplicates, fix date formats.
  • Save the result as a Table (Delta format).

Step 5: Power BI (Visualization)

  • Switch to your SQL Analytics Endpoint.
  • Click New Report. Power BI will use Direct Lake to show your data instantly.

Appendix

Microsoft Fabric fundamentals documentation

Technical Interview Questions and Answers

The Job Interview is the single most critical step in the job hunting process. It is the definitive arena where all of your abilities must integrate. The interview itself is not a skill you possess; it is the moment you deploy your Integrated Skill Set, blending:

  1. Hard Skills (Technical Mastery): Demonstrating not just knowledge of advanced topics, but the depth of your expertise and how you previously applied it to solve complex, real-world problems.
  2. Soft Skills (Communication & Presence): Clearly articulating strategy, managing complexity under pressure, and exhibiting the leadership presence expected of senior-level and expert-level candidates.
  3. Contextual Skills (Business Acumen): Framing your solutions within the company’s business goals and culture, showing that you understand the strategic impact of your work.

This Integrated Skill represents your first real opportunity to sell your strategic value to the employer.

What is Azure Data Factory?

Azure Data Factory is a cloud-based data integration service used to create data-driven workflows for orchestrating and automating data movement and transformation across different data stores and compute services.

Key capabilities

  • Data ingestion
  • Data orchestration
  • Data transformation
  • Scheduling and monitoring
What are the core components of ADF?

The main components are:

ComponentPurpose
PipelineLogical grouping of activities
ActivitySingle task in a pipeline
DatasetData structure pointing to data
Linked ServiceConnection to external resources
TriggerSchedule or event that starts a pipeline
Integration RuntimeCompute infrastructure used to run activities
What is a Pipeline?

A pipeline is a logical grouping of activities that together perform a task such as data ingestion or transformation.

What is Integration Runtime (IR)?

Integration Runtime is the compute infrastructure used by ADF to perform data integration tasks.

  • Azure IR Fully managed compute in Azure
  • Self-hosted IR Runs on on-premises machines
  • Azure SSIS IR Used to run SSIS packages

Types:

What is a Linked Service?

A linked service defines the connection information needed for ADF to connect to external resources, example:

  • Azure SQL Database
  • Data Lake
  • databricks
  • On-Prem database
What is a Dataset?

A dataset represents the structure of data within a data store.

Difference between Pipeline and Data Flow
FeaturePipelineData Flow
PurposeOrchestrationData transformation
ComputeOrchestration engineSpark cluster
UIActivity workflowVisual transformation
How do you handle incremental loading?

Solution 1: Watermark column
last_modified_date:
Max(Last_modified)
WHERE last_modified > last_run_time

Solution 2: CDC
transaction log
change tracking

Solution 3: File-based incremental
folder partition by date

How do you implement error handling?

Try-Catch pattern
Failure path
Retry policy

Activity
├ success → next step
└ failure → error handling pipeline

What are triggers in ADF?

Triggers are used to automatically start pipelines.

TriggerPurpose
Schedule triggerTime-based execution
Tumbling windowTime-based incremental
Event triggerStorage events
What is Tumbling Window Trigger?

A tumbling window trigger runs pipelines at fixed time intervals and processes data in discrete time windows.

How do you parameterize pipelines?

ADF supports parameters at:

  • Linked Service
  • plpeline
  • dataset
What are common ADF performance optimizations?

Parallel copy: pipeline success; pipeline duration; failure rate
Staging: use blob staging
Partitioning: split large tables

How do you monitor pipelines?

Monitoring options:
ADF Monitor UI
Azure Monitor
Log Analytics

Describe the data storage options available in Databricks.

Databricks offers several ways to store data. First, there’s the Databricks File System for storing and managing files. Then, there’s Delta Lake, an open-source storage layer that adds ACID transactions to Apache Spark, making it more reliable. Databricks also integrates with cloud storage services like AWS S3, Azure Blob Storage, and Google Cloud Storage. Plus, you can connect to a range of external databases, both relational and NoSQL, using JDBC.

What is Databricks Delta (Delta Lakehouse) and how does it enhance the capabilities of Azure Databricks?

Databricks Delta, now known as Delta Lake, is an open-source storage layer that brings ACID transactions to Apache Spark and big data workloads. It enhances Azure Databricks by providing features like:

  • ACID transactions for data reliability and consistency.
  • Scalable metadata handling for large tables.
  • Time travel for data versioning and historical data analysis.
  • Schema enforcement and evolution.
  • Improved performance with data skipping and Z-ordering
Are there any alternative solution that is similar to Delta lakehouse?

there are several alternative technologies that provide Delta Lake–style Lakehouse capabilities (ACID + schema enforcement + time travel + scalable storage + SQL engine). such as,

  • Apache Iceberg
  • Apache Hudi
  • Snowflake (Iceberg Tables / Unistore)
  • BigQuery + BigLake
  • AWS Redshift + Lake Formation + Apache Iceberg
  • Microsoft Fabric (OneLake + Delta/DQ/DLTS)
What is Delta Table ( as known as Delta Lake Table)?

Delta lake tables are tables that store data in the delta format. Delta Lake is an extension to existing data lakes,

What is Delta Live Table?

Delta Live Tables (DLT) is a framework in Azure Databricks for building reliable, automated, and scalable data pipelines using Delta Lake tables.

It simplifies ETL development by managing data dependencies, orchestration, quality checks, and monitoring automatically.

Explain how you can use Databricks to implement a Medallion Architecture (Bronze, Silver, Gold).
  1. Bronze Layer (Raw Data): Ingest raw data from various sources into the Bronze layer. This data is stored as-is, without any transformation.
  2. Silver Layer (Cleaned Data, as known Enriched layer): Clean and enrich the data from the Bronze layer. Apply transformations, data cleansing, and filtering to create more refined datasets.
  3. Gold Layer (Aggregated Data, as known Curated layer): Aggregate and further transform the data from the Silver layer to create high-level business tables or machine learning features. This layer is used for analytics and reporting.
What Is Z-Order (Databricks / Delta Lake)?

Z-Ordering in Databricks (specifically for Delta Lake tables) is an optimization technique designed to co-locate related information into the same set of data files on disk.

OPTIMIZE mytable
ZORDER BY (col1, col2);
What Is Liquid Clustering (Databricks)?

Liquid Clustering is Databricks’ next-generation data layout optimization for Delta Lake.
It replaces (and is far superior to) Z-Order.

## At creation time:
CREATE TABLE sales
CLUSTER BY (customer_id, event_date)
AS SELECT * FROM source;

## For existing tables:
ALTER TABLE sales
CLUSTER BY (customer_id, event_date);

## Trigger the actual clustering:
OPTIMIZE sales;

## Remove clustering:
ALTER TABLE sales
CLUSTER BY NONE;
What is a Dataframe, RDD, Dataset in Azure Databricks?

Dataframe refers to a specified form of tables employed to store the data within Databricks during runtime. In this data structure, the data will be arranged into two-dimensional rows and columns to achieve better accessibility.

RDD, Resilient Distributed Dataset, is a fault-tolerant, immutable collection of elements partitioned across the nodes of a cluster. RDDs are the basic building blocks that power all of Spark’s computations.

Dataset is an extension of the DataFrame API that provides compile-time type safety and object-oriented programming benefits.

What is catching and its types?

A cache is a temporary storage that holds frequently accessed data, aiming to reduce latency and enhance speed. Caching involves the process of storing data in cache memory.

What is Spark Cache / Persist (Memory Cache)

This is the standard Apache Spark feature. It stores data in the JVM Heap (Memory).

What is Databricks Disk Cache (Local Disk)

This is a Databricks-specific optimization. It automatically stores copies of remote files (Parquet/Delta) on the local NVMe SSDs of the worker nodes.

comparing .cache() and .persist()

both .cache() and .persist() are used to save intermediate results to avoid re-computing the entire lineage. The fundamental difference is that .cache() is a specific, pre-configured version of .persist().

.cache(): This is a shorthand for .persist(StorageLevel.MEMORY_ONLY). It tries to store your data in the JVM heap as deserialized objects.

.persist(): This is the more flexible version. It allows you to pass a StorageLevel to decide exactly how and where the data should be stored (RAM, Disk, or both).

How do you optimize Databricks?

When optimizing Databricks workloads, I focus on several layers. First, I optimize data layout using partitioning and Z-ordering on Delta tables. Second, I improve Spark performance by using broadcast joins, filtering data early, and caching intermediate results. Third, I tune cluster resources such as autoscaling and Photon engine. Finally, I run Delta maintenance commands like OPTIMIZE and VACUUM to manage small files and improve query performance.

Data Layout Optimization:
Partitioning:
CREATE TABLE sales
USING DELTA
PARTITIONED BY (date)

Z-Ordering:
OPTIMIZE sales
ZORDER BY (customer_id);

Liquid Clustering:

What is Photon Engine?

Photon is a high-performance query engine built in C++ that accelerates SQL queries and data processing workloads in Azure Databricks. It improves performance by using vectorized processing and optimized execution for modern hardware.

How would you secure and manage secrets in Azure Databricks when connecting to external data sources?
  1. Use Azure Key Vault to store and manage secrets securely.
  2. Integrate Azure Key Vault with Azure Databricks using Databricks-backed or Azure-backed scopes.
  3. Access secrets in notebooks and jobs using the dbutils.secrets API.
    dbutils.secrets.get(scope=”<scope-name>”, key=”<key-name>”)
  4. Ensure that secret access policies are strictly controlled and audited.
Scenario: You need to implement a data governance strategy in Azure Databricks. What steps would you take?

  • Data Classification: Classify data based on sensitivity and compliance requirements.
  • Access Controls: Implement role-based access control (RBAC) using Azure Active Directory.
  • Data Lineage: Use tools like Databricks Lineage to track data transformations and movement.
  • Audit Logs: Enable and monitor audit logs to track access and changes to data.
  • Compliance Policies: Implement Azure Policies and Azure Purview for data governance and compliance monitoring.
Scenario: You need to optimize a Spark job that has a large number of shuffle operations causing performance issues. What techniques would you use?
  1. Repartitioning: Repartition the data to balance the workload across nodes and reduce skew.
  2. Broadcast Joins: Use broadcast joins for small datasets to avoid shuffle operations.
  3. Caching: Cache intermediate results to reduce the need for recomputation.
  4. Shuffle Partitions: Increase the number of shuffle partitions to distribute the workload more evenly.
  5. Skew Handling: Identify and handle skewed data by adding salt keys or custom partitioning strategies.
Scenario: You need to migrate an on-premises Hadoop workload to Azure Databricks. Describe your migration strategy.
  • Assessment: Evaluate the existing Hadoop workloads and identify components to be migrated.
  • Data Transfer: Use Azure Data Factory or Azure Databricks to transfer data from on-premises HDFS to ADLS.
  • Code Migration: Convert Hadoop jobs (e.g., MapReduce, Hive) to Spark jobs and test them in Databricks.
  • Optimization: Optimize the Spark jobs for performance and cost-efficiency.
  • Validation: Validate the migrated workloads to ensure they produce the same results as on-premises.
  • Deployment: Deploy the migrated workloads to production and monitor their performance.

What are Synapse SQL Workspaces and how are they used?

Synapse SQL Workspaces are the environments within Azure Synapse Analytics where users can perform data querying and management tasks. They include:

  • Provisioned SQL Pools: Used for large-scale, high-performance data warehousing. Users can create and manage databases, tables, and indexes, and run complex queries.
  • On-Demand SQL Pools: Allow users to query data directly from Azure Data Lake without creating a dedicated data warehouse. It is ideal for interactive and exploratory queries.
 What is the difference between On-Demand SQL Pool and Provisioned SQL Pool?

The primary difference between On-Demand SQL Pool and Provisioned SQL Pool lies in their usage and scalability:

  • On-Demand SQL Pool: Allows users to query data stored in Azure Data Lake without requiring a dedicated resource allocation. It is best for ad-hoc queries and does not incur costs when not in use. It scales automatically based on query demand.
  • Provisioned SQL Pool: Provides a dedicated set of resources for running data warehousing workloads. It is optimized for performance and can handle large-scale data operations. Costs are incurred based on the provisioned resources and are suitable for predictable, high-throughput workloads.
 How does Azure Synapse Analytics handle data integration?

Azure Synapse Analytics handles data integration through Synapse Pipelines, which is a data integration service built on Azure Data Factory. It enables users to:

  • Ingest Data: Extract data from various sources, including relational databases, non-relational data stores, and cloud-based services.
  • Transform Data: Use data flows and data wrangling to clean and transform data.
  • Orchestrate Workflows: Schedule and manage data workflows, including ETL (Extract, Transform, Load) processes.
  • Data Integration Runtime: Utilizes Azure Integration Runtime for data movement and transformation tasks.

Can you explain the concept of “Dedicated SQL Pool” in Azure Synapse Analytics?

Dedicated SQL Pool is a provisioned, high-performance relational database.

  • Data Storage: Data must be ingested and stored internally in a proprietary columnar format. It follows a Schema-on-Write approach.
  • Architecture: Uses MPP (Massively Parallel Processing) architecture. Data is sharded into 60 distributions and processed by multiple compute nodes in parallel.
  • Cost: Billed by the hour based on the provisioned DWUs. You can pause it when not in use to save costs.
  • Best For: Stable production reporting, TB/PB scale enterprise data warehousing, and high-concurrency queries needing sub-second response.
What is Serverless SQL Pool

Serverless SQL Pool is a An on-demand, compute-only query engine with no internal storage.

  • Data Location: It does not store data. Data remains in the Data Lake (ADLS Gen2) in open formats like Parquet, CSV, or JSON.
  • Mechanism: Uses the OPENROWSET function to query lake files directly. It follows a Schema-on-Read approach.
  • Cost: Billed per query based on data scanned (approx. $5 USD per TB). Cost is $0 if no queries are run.
  • Best For: Rapid data discovery, building a Logical Data Warehouse, and ad-hoc data validation.
What is a Synapse Spark Pool, and when would you use it?

It is a managed Apache Spark 3 instance easily created and configured within Azure.

  • Managed Cluster: You don’t manage servers; you just select the node size and the number of nodes.
  • Auto-Scale & Auto-Pause: It automatically scales nodes based on workload and pauses after 5 minutes of inactivity to save costs.
  • Language Support: Supports PySpark (Python), Spark SQL, Scala, and .NET.

Can you talk about database locker?

Database locking is the mechanism a database uses to control concurrent access to data so that transactions stay consistent, isolated, and safe.

Locking prevents:

  • Dirty reads
  • Lost updates
  • Write conflicts
  • Race conditions

Types of Locks:

1. Shared Lock (S)

  • Used when reading data
  • Multiple readers allowed
  • No writers allowed

2. Exclusive Lock (X)

  • Used when updating or inserting
  • No one else can read or write the locked item

3. Update Lock (U) (SQL Server specific)

  • Prevents deadlocks when upgrading from Shared → Exclusive
  • Only one Update lock allowed

4. Intention Locks (IS, IX, SIX)

Used at table or page level to signal a lower-level lock is coming.

5. Row / Page / Table Locks

Based on granularity:

  • Row-level: Most common, best concurrency
  • Page-level: Several rows together
  • Table-level: When scanning or modifying large portions

DB engines automatically escalate:

Row → Page → Table
when there are too many small locks.

Can you talk on Deadlock?

A deadlock happens when:

  • Transaction A holds Lock 1 and wants Lock 2
  • Transaction B holds Lock 2 and wants Lock 1

Both wait on each other → neither can move → database detects → kills one transaction (“deadlock victim”).

Deadlocks usually involve one writer + one writer, but can also involve readers depending on isolation level.

How to Troubleshoot Deadlocks?
A: In SQL Server: Enable Deadlock Graph Capture
run:
ALTER DATABASE [YourDB] SET DEADLOCK_PRIORITY NORMAL;

use:
DBCC TRACEON (1222, -1);
DBCC TRACEON (1204, -1);
B: Interpret the Deadlock Graph

You will see:

  • Processes (T1, T2…)
  • Resources (keys, pages, objects)
  • Types of locks (X, S, U, IX, etc.)
  • Which statement caused the deadlock

Look for:

  • Two queries touching the same index/rows in different order
  • A scanning query locking too many rows
  • Missed indexes
  • Query patterns that cause U → X lock upgrades
C. Identify
  • The exact tables/images involved
  • The order of locking
  • The hotspot row or range
  • Rows with heavy update/contention

This will tell you what to fix.

How to Prevent Deadlocks (Practical + Senior-Level)
  • Always update rows in the same order
  • Keep transactions short
  • Use appropriate indexes
  • Use the correct isolation level
  • Avoid long reads before writes

Can you discuss on database normalization and denormalization

Normalization is the process of structuring a relational database to minimize data redundancy (duplicate data) and improve data integrity.

Normal FormRule SummaryProblem Solved
1NF (First)Eliminate repeating groups; ensure all column values are atomic (indivisible).Multi-valued columns, non-unique rows.
2NF (Second)Be in 1NF, AND all non-key attributes must depend on the entire primary key.Partial Dependency (non-key attribute depends on part of a composite key).
3NF (Third)Be in 2NF, AND eliminate transitive dependency (non-key attribute depends on another non-key attribute).Redundancy due to indirect dependencies.
BCNF (Boyce-Codd)A stricter version of 3NF; every determinant (column that determines another column) must be a candidate key.Edge cases involving multiple candidate keys.

Denormalization is the process of intentionally introducing redundancy into a previously normalized database to improve read performance and simplify complex queries.

  • Adding Redundant Columns: Copying a value from one table to another (e.g., copying the CustomerName into the Orders table to avoid joining to the Customer table every time an order is viewed).
  • Creating Aggregate/Summary Tables: Storing pre-calculated totals, averages, or counts to avoid running expensive aggregate functions at query time (e.g., a table that stores the daily sales total).
  • Merging Tables: Combining two tables that are frequently joined into a single, wider table.

Comparison of Microsoft Fabric, Azure Synapse Analytics (ASA), Azure Data Factory (ADF), and Azure Databricks (ADB)

Today, data engineers have a wide array of tools and platforms at their disposal for data engineering projects. Popular choices include Microsoft Fabric, Azure Synapse Analytics (ASA), Azure Data Factory (ADF), and Azure Databricks (ADB). It’s common to wonder which one is the best fit for your specific needs.

Side by Side comparison

Here’s a concise comparison of Microsoft FabricAzure Synapse AnalyticsAzure Data Factory (ADF), and Azure Databricks (ADB) based on their key features, use cases, and differences:

FeatureMicrosoft FabricAzure Synapse AnalyticsAzure Data Factory (ADF)Azure Databricks (ADB)
TypeUnified SaaS analytics platformIntegrated analytics serviceCloud ETL/ELT serviceApache Spark-based analytics platform
Primary Use CaseEnd-to-end analytics (Data Engineering, Warehousing, BI, Real-Time)Large-scale data warehousing & analyticsData integration & orchestrationBig Data processing, ML, AI, advanced analytics
Data IntegrationBuilt-in Data Factory capabilitiesSynapse Pipelines (similar to ADF)Hybrid ETL/ELT pipelinesLimited (relies on Delta Lake, ADF, or custom code)
Data WarehousingOneLake (Delta-Parquet based)Dedicated SQL pools (MPP)Not applicableCan integrate with Synapse/Delta Lake
Big Data ProcessingSpark-based (Fabric Spark)Spark pools (serverless/dedicated)No (orchestration only)Optimized Spark clusters (Delta Lake)
Real-Time AnalyticsYes (Real-Time Hub)Yes (Synapse Real-Time Analytics)NoYes (Structured Streaming)
Business IntelligencePower BI (deeply integrated)Power BI integrationNoLimited (via dashboards or Power BI)
Machine LearningBasic ML integrationML in Spark poolsNoFull ML/DL support (MLflow, AutoML)
Pricing ModelCapacity-based (Fabric SKUs)Pay-as-you-go (serverless) or dedicatedActivity-basedDBU-based (compute + storage)
Open Source SupportLimited (Delta-Parquet)Limited (Spark, SQL)NoFull (Spark, Python, R, ML frameworks)
GovernanceCentralized (OneLake, Purview)Workspace-levelLimitedWorkspace-level (Unity Catalog)

Key Differences

  • Fabric vs Synapse: Fabric is a fully managed SaaS (simpler, less configurable), while Synapse offers more control (dedicated SQL pools, Spark clusters).
  • ADF vs Synapse Pipelines: Synapse Pipelines = ADF inside Synapse (same engine).
  • ADB vs Fabric Spark: ADB has better ML & open-source support, while Fabric Spark is simpler & integrated with Power BI.

When to Use Which

  1. Microsoft Fabric
    • Best for end-to-end analytics in a unified SaaS platform (no infrastructure management).
    • Combines data engineering, warehousing, real-time, and BI in one place.
    • Good for Power BI-centric organizations.
  2. Azure Synapse Analytics
    • Best for large-scale data warehousing with SQL & Spark processing.
    • Hybrid of ETL (Synapse Pipelines), SQL Pools, and Spark analytics.
    • More flexible than Fabric (supports open formats like Parquet, CSV).
  3. Azure Data Factory (ADF)
    • Best for orchestrating ETL/ELT workflows (no compute/storage of its own).
    • Used for data movement, transformations, and scheduling.
    • Often paired with Synapse or Databricks.
  4. Azure Databricks (ADB)
    • Best for advanced analytics, AI/ML, and big data processing with Spark.
    • Optimized for Delta Lake (ACID transactions on data lakes).
    • Preferred for data science teams needing MLflow, AutoML, etc.

Which One Should You Choose?

  • For a fully integrated Microsoft-centric solution → Fabric
  • For large-scale data warehousing + analytics → Synapse
  • For ETL/data movement → ADF (or Synapse Pipelines)
  • For advanced Spark-based analytics & ML → Databricks