Fabric terminology

Learn the definitions of terms used in Microsoft Fabric, including terms specific to Fabric Data Engineering, Data Factory, Fabric Data Science, Fabric Data Warehouse, IQ, Real-Time Intelligence, and Power BI.

General terms

Capacity

Dedicated compute resources for Fabric workloads.

Capacity defines the ability of a resource to perform an activity or to produce output. Different items consume different amounts of capacity at any given time.

Direct Lake

Query Delta tables directly without import.

Experience

An Experience in Fabric refers to a specific workload environment or toolset designed for a particular data task.

Examples of Experiences

  • Microsoft Fabric Data Engineering
  • Microsoft Fabric Data Factory
  • Microsoft Fabric Data Science
  • Microsoft Fabric Data Warehouse
  • Microsoft Fabric Power BI
  • Microsoft Fabric Real-Time Intelligence

Experiment

An Experiment in Fabric is used in data science and machine learning to track multiple model training runs and compare results.

IQ

Ontology (preview) is an item where you can define entity types, relationships, properties, and other constraints to organize data according to your business vocabulary.

Item

An item is an object that you create in Fabric; it is a set of capabilities within a workload. Users can create, edit, and delete items. Each item type provides different capabilities. For example, the Data Engineering workload includes the lakehouse, notebook, and Spark job definition items.

Experience               Item
Data Engineering         Lakehouse
Data Factory             Pipeline
Data Warehouse           Warehouse
Power BI                 Report
Real-Time Intelligence   Eventstream

Lakehouse

Combines data lake storage with data warehouse features.

A lakehouse is a database built over a data lake, containing files, folders, and tables. It is used by the Apache Spark engine and SQL engine for big data processing. Lakehouses support ACID transactions when using the open-source Delta formatted tables. The lakehouse item is hosted within a unique workspace folder in Microsoft OneLake.

Liquid Clustering

Adaptive clustering mechanism for Delta tables.

OneLake

A single, unified data lake for the entire Fabric tenant.

Shortcut

Shortcuts are embedded references within OneLake that point to other file store locations. They provide a way to connect to existing data without having to directly copy it.

V-order

V-Order (Vertical Order) is a column-oriented data layout optimization used in Fabric Lakehouse to improve analytic query performance, especially for Power BI Direct Lake.

A write optimization to the parquet file format that enables fast reads and provides cost efficiency and better performance. All the Fabric engines write v-ordered parquet files by default.

Workload / Experience

In Microsoft Fabric, a Workload (also called an Experience) is a functional area of the platform that provides a specific type of analytics capability.

Each workload has:

  • Its own UI
  • Its own tools
  • Its own compute behavior
  • Shared OneLake storage and security

Workspace

A logical container for Fabric items.

Appendix

Microsoft Fabric terminology

Delta: Schema Evolution

Schema evolution in Delta Lake refers to the ability to automatically adapt to and manage changes in the structure (schema) of a Delta Lake table over time. It allows users to modify the schema of an existing table (e.g., adding or updating columns) without the need for a complete rewrite of the data.

Key Features of Schema Evolution

  1. Automatic Adaptation: Delta Lake can automatically evolve the schema of a table when new columns are added to the incoming data, or when data types change, if certain configurations are enabled.
  2. Backward and Forward Compatibility: Delta Lake ensures that new data can be written to a table without breaking the existing schema. It also ensures that existing queries remain compatible, even if the schema changes.
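
To build intuition for point 2, the null-filling behavior on read can be sketched in plain Python (no Spark required; `align_row` is a hypothetical helper for illustration, not a Delta API):

```python
# Plain-Python sketch of backward compatibility: rows written before a
# column existed are read back with null (None) for the missing column.
def align_row(row, schema):
    """Project a stored row onto the current table schema.

    Columns absent from the stored row surface as None, which is how
    Delta Lake presents old rows after a new column is added.
    """
    return {col: row.get(col) for col in schema}

old_row = {"id": 1, "name": "a"}    # written before "city" existed
schema = ["id", "name", "city"]     # schema after evolution
print(align_row(old_row, schema))   # {'id': 1, 'name': 'a', 'city': None}
```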

Configuration for Schema Evolution

mergeSchema

This option allows you to append new data with a schema that differs from the existing table schema. It merges the new schema into the table.

Usage: Typically used when you are appending data.

Schema Merging: Use mergeSchema only for adding new columns, not for incompatible changes.

When new data has additional columns that aren’t present in the target Delta table, Delta Lake can automatically merge the new schema into the existing table schema.


# Append new data to the Delta table with automatic schema merging

(df_new_data.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("/path/to/delta-table"))
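
The column-union behavior of mergeSchema can be sketched in plain Python (a simplified model for intuition, not Delta's actual implementation): new columns are appended, while a type change on an existing column is rejected.

```python
# Simplified model of mergeSchema: union of columns, rejecting
# incompatible type changes on existing columns.
def merge_schema(existing, incoming):
    """existing/incoming: lists of (column_name, type_name) pairs."""
    merged = dict(existing)
    for name, dtype in incoming:
        if name in merged and merged[name] != dtype:
            raise ValueError(f"incompatible type change for column {name!r}")
        merged.setdefault(name, dtype)
    return list(merged.items())

table = [("id", "long"), ("name", "string")]
new_data = [("id", "long"), ("city", "string")]
print(merge_schema(table, new_data))
# [('id', 'long'), ('name', 'string'), ('city', 'string')]
```

(Real Delta Lake also permits certain safe conversions; this sketch treats any type change as incompatible for simplicity.)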


overwriteSchema

This option completely replaces the schema of the table with the schema of the new data.

If you want to replace the entire schema (including removing existing columns), use the overwriteSchema option.

Usage: Typically used when you are overwriting data.


# Overwrite the existing Delta table schema with new data

(df_new_data.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .save("/path/to/delta-table"))
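
By contrast, overwriteSchema's replace-everything behavior amounts to discarding the old column set entirely (again a plain-Python sketch for intuition, not Delta's implementation):

```python
# Simplified model of overwriteSchema: the incoming schema wins outright,
# so columns present only in the old table simply disappear.
def overwrite_schema(existing, incoming):
    """Return the new schema; `existing` is ignored by design."""
    return list(incoming)

table = [("id", "long"), ("name", "string"), ("legacy_flag", "boolean")]
new_data = [("id", "long"), ("city", "string")]
print(overwrite_schema(table, new_data))  # [('id', 'long'), ('city', 'string')]
```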


Configure spark.databricks.delta.schema.autoMerge

You can configure this setting at the following levels:


  • Session Level (applies to a specific session or job)
  • Cluster Level (applies to all jobs on the cluster)

Session-Level Configuration (Spark session level)

Once this is enabled, merge operations in the session automatically allow schema evolution, without setting the mergeSchema option on each command.


# Enable auto schema merging for the session

spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

Cluster-Level Configuration

This enables automatic schema merging for all operations on the cluster without needing to set it in each job.

  1. Go to your Databricks Workspace.
  2. Navigate to Clusters and select your cluster.
  3. Go to the Configuration tab.
  4. Under Spark Config, add the following configuration:
    spark.databricks.delta.schema.autoMerge.enabled true

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca (remove all spaces from the email address 😊).