Unity Catalog – Table Type Comparison

In Azure Databricks Unity Catalog, you can create different types of tables depending on your storage and management needs. The main table types are Managed Tables, External Tables, Delta Tables, Foreign Tables, Streaming Tables, Delta Live Tables (which replace the deprecated Live Tables), Feature Tables, and Hive Tables (legacy). Each table type is explained in detail below, and a side-by-side comparison is provided for clarity.

Side-by-Side Comparison Table

| Feature | Managed Tables | External Tables | Delta Tables | Foreign Tables | Streaming Tables | Delta Live Tables (DLT) | Feature Tables | Hive Tables (Legacy) |
|---|---|---|---|---|---|---|---|---|
| Storage | Databricks-managed | External storage | Managed/External | External database | Databricks-managed | Databricks-managed | Managed/External | Managed/External |
| Location | Internal Delta Lake | Specified external path | Internal/External Delta Lake | External metastore (Snowflake, BigQuery) | Internal Delta Lake | Internal Delta Lake | Internal/External Delta Lake | Internal/External storage |
| Ownership | Databricks | User | Databricks/User | External provider | Databricks | Databricks | Databricks/User | Databricks (Legacy Hive Metastore) |
| Deletion Impact | Deletes data & metadata | Deletes only metadata | Depends (Managed: deletes data; External: keeps data) | Deletes only metadata reference | Deletes data & metadata | Deletes data & metadata | Deletes metadata (but not feature versions) | Similar to Managed/External |
| Format | Delta Lake | Parquet, CSV, JSON, Delta | Delta Lake | Snowflake, BigQuery, Redshift, etc. | Delta Lake | Delta Lake | Delta Lake | Parquet, ORC, Avro, CSV |
| Use Case | Full lifecycle management | Sharing with external tools | Advanced data versioning & ACID compliance | Querying external DBs | Continuous data updates | ETL pipelines | ML feature storage | Legacy storage (pre-Unity Catalog) |

Table Types in Detail

1. Managed Tables

Managed tables are tables where both the metadata and the data are managed by Unity Catalog. When you create a managed table, the data is stored in the default storage location associated with the catalog or schema.

Data Storage and location:

Unity Catalog manages both the metadata and the underlying data in a Databricks-managed location.

The data is stored in a Unity Catalog-managed storage location, typically internal Delta Lake storage such as DBFS or Azure Data Lake Storage.

Use Case:

Ideal for Databricks-centric workflows where you want Databricks to handle storage and metadata.

Pros & Cons:

Pros: Easy to manage, no need to worry about storage locations.

Cons: Data is tied to Databricks, making it harder to share externally.

Example:

CREATE TABLE managed_table (
    id INT,
    name STRING
);

INSERT INTO managed_table VALUES (1, 'Alice');

SELECT * FROM managed_table;

2. External Tables

External tables store metadata in Unity Catalog but keep data in an external storage location (e.g., Azure Blob Storage, ADLS, S3).

Data storage and Location:

The metadata is managed by Unity Catalog, but the actual data remains in external storage (like Azure Data Lake Storage Gen2 or an S3 bucket).

You must specify an explicit storage location (e.g., Azure Blob Storage, ADLS, or S3).

Use Case:

Ideal for cross-platform data sharing or when data is managed outside Databricks.

Pros and Cons

Pros: Data is decoupled from Databricks, making it easier to share.

Cons: Requires manual management of external storage and permissions.

Preparing to create an external table

Before you can create an external table, you must create a storage credential that allows Unity Catalog to read from and write to the path on your cloud tenant, and an external location that references it.

Requirements
  • In Azure, create a service principal and grant it the Storage Blob Data Contributor role on your storage container.
  • In Azure, create a client secret for your service principal. Make a note of the client secret, the directory ID, and the application ID for the client secret.
Step 1: Create a storage credential

You can create a storage credential using the Catalog Explorer or the Unity Catalog CLI. Follow these steps to create a storage credential using Catalog Explorer.

  1. In a new browser tab, log in to Databricks.
  2. Click Catalog.
  3. Click Storage Credentials.
  4. Click Create Credential.
  5. Enter example_credential for the name of the storage credential.
  6. Set Client Secret, Directory ID, and Application ID to the values for your service principal.
  7. Optionally enter a comment for the storage credential.
  8. Click Save.
    Leave this browser open for the next steps.
Step 2: Create an external location

An external location references a storage credential and also contains a storage path on your cloud tenant. The external location allows reading from and writing to only that path and its child directories. You can create an external location from Catalog Explorer, a SQL command, or the Unity Catalog CLI. Follow these steps to create an external location using Catalog Explorer.

  1. Go to the browser tab where you just created a storage credential.
  2. Click Catalog.
  3. Click External Locations.
  4. Click Create location.
  5. Enter example_location for the name of the external location.
  6. Enter the storage container path that the location allows reading from and writing to.
  7. Set Storage Credential to example_credential, the storage credential you just created.
  8. Optionally enter a comment for the external location.
  9. Click Save.
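If you prefer SQL to Catalog Explorer, the same external location can be defined with a single statement. This is a minimal sketch, assuming the storage credential example_credential already exists and that the container URL is a placeholder you replace with your own path:

CREATE EXTERNAL LOCATION IF NOT EXISTS example_location
URL 'abfss://container@storageaccount.dfs.core.windows.net/path'
WITH (STORAGE CREDENTIAL example_credential)
COMMENT 'Example external location for external tables';

With the credential and external location in place, grant access and create the external table: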
-- Grant access to create tables in the external location
GRANT USE CATALOG
ON example_catalog
TO `all users`;
 
GRANT USE SCHEMA
ON example_catalog.example_schema
TO `all users`;
 
GRANT CREATE EXTERNAL TABLE
ON EXTERNAL LOCATION example_location
TO `all users`;
-- Create an example catalog and schema to contain the new table
CREATE CATALOG IF NOT EXISTS example_catalog;
USE CATALOG example_catalog;
CREATE SCHEMA IF NOT EXISTS example_schema;
USE example_schema;
-- Create a new external Unity Catalog table from an existing table
-- Replace <bucket_path> with the storage location where the table will be created
CREATE TABLE IF NOT EXISTS trips_external
LOCATION 'abfss://<bucket_path>'
AS SELECT * from samples.nyctaxi.trips;
 
-- To use a storage credential directly, add 'WITH (CREDENTIAL <credential_name>)' to the SQL statement.

Some useful Microsoft documentation for reference:

Create an external table in Unity Catalog
Configure a managed identity for Unity Catalog
Create a Unity Catalog metastore
Manage access to external cloud services using service credentials
Create a storage credential for connecting to Azure Data Lake Storage Gen2
External locations

Example



CREATE TABLE external_table (
    id INT,
    name STRING
)
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/path/to/data';

INSERT INTO external_table VALUES (1, 'Bob');

SELECT * FROM external_table;

3. Foreign Tables

Foreign tables reference data stored in external systems (e.g., Snowflake, Redshift) without copying the data into Databricks.

Data Storage and Location

The metadata is stored in Unity Catalog, but the data resides in another metastore (e.g., an external data warehouse like Snowflake or BigQuery).

It does not point to raw files but to an external system.

Use Case:

Best for querying external databases like Snowflake, BigQuery, Redshift without moving data.

Pros and Cons

Pros: No data duplication, seamless integration with external systems.

Cons: Performance depends on the external system’s capabilities.

Example

-- Foreign tables are exposed through Lakehouse Federation: first create a
-- connection to the external system, then a foreign catalog over it.
-- All values below are placeholders.
CREATE CONNECTION snowflake_connection TYPE snowflake
OPTIONS (
    host 'snowflake-account-url',
    port '443',
    sfWarehouse 'warehouse',
    user 'user',
    password 'password'
);

CREATE FOREIGN CATALOG snowflake_catalog
USING CONNECTION snowflake_connection
OPTIONS (database 'database');

-- Tables in the external database can now be queried through the foreign catalog
SELECT * FROM snowflake_catalog.my_schema.my_table;

4. Delta Tables

Delta tables use the Delta Lake format, providing ACID transactions, scalable metadata handling, and data versioning.

Data Storage and Location

A special type of managed or external table that uses Delta Lake format.

Can be in managed storage or external storage.

Use Case:

Ideal for reliable, versioned data pipelines.

Pros and Cons

Pros: ACID compliance, time travel, schema enforcement, efficient upserts/deletes.

Cons: Slightly more complex due to Delta Lake features.

Example

CREATE TABLE delta_table (
    id INT,
    name STRING
)
USING DELTA
LOCATION 'abfss://container@storageaccount.dfs.core.windows.net/path/to/delta';

INSERT INTO delta_table VALUES (1, 'Charlie');

SELECT * FROM delta_table;

-- Time travel example
SELECT * FROM delta_table VERSION AS OF 1;
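
Before time traveling, you can list the table's versions with DESCRIBE HISTORY, and you can roll the table back with RESTORE. A short sketch using the same delta_table from the example above:

-- List the commit history to find version numbers for time travel
DESCRIBE HISTORY delta_table;

-- Roll the table back to an earlier version if needed
RESTORE TABLE delta_table TO VERSION AS OF 1;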

5. Feature Tables

Feature tables are used in machine learning workflows to store and manage feature data for training and inference.

Data Storage and Location

Used for machine learning (ML) feature storage with Databricks Feature Store.

Can be managed or external.

Use Case:

Ideal for managing and sharing features across ML models and teams.

Pros and Cons:

Pros: Centralized feature management, versioning, and lineage tracking.


Example:

from databricks.feature_store import FeatureStoreClient
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType

fs = FeatureStoreClient()

# The Feature Store client expects the table schema as a Spark StructType
schema = StructType([
    StructField("id", IntegerType(), False),
    StructField("feature1", FloatType(), True),
    StructField("feature2", FloatType(), True),
])

fs.create_table(
    name="feature_table",
    primary_keys=["id"],
    schema=schema,
    description="Example feature table"
)

# df is a Spark DataFrame whose columns match the schema above
fs.write_table(name="feature_table", df=df, mode="overwrite")
features = fs.read_table(name="feature_table")

6. Streaming Tables

Streaming tables are designed for real-time data ingestion and processing using Structured Streaming.

Data Location:

Can be stored in managed or external storage.

Use Case:

Ideal for real-time data pipelines and streaming analytics.

Pros and Cons

Pros: Supports real-time data processing, integrates with Delta Lake for reliability.

Cons: Requires understanding of streaming concepts and infrastructure.

Example:

CREATE TABLE streaming_table (
    id INT,
    name STRING
)
USING DELTA;

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("StreamingExample").getOrCreate()

# Stream from a Delta source and append to the table; a checkpoint location is required
streaming_df = spark.readStream.format("delta").load("/path/to/delta")
streaming_df.writeStream.format("delta").outputMode("append") \
    .option("checkpointLocation", "/path/to/checkpoints").toTable("streaming_table")
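
In Unity Catalog, a streaming table can also be declared directly in SQL. The sketch below is hedged: it assumes a JSON landing folder read with the read_files function, and the path and format are placeholders you would replace:

-- A SQL-defined streaming table; the source path and format are placeholders
CREATE OR REFRESH STREAMING TABLE streaming_table_sql
AS SELECT *
FROM STREAM read_files(
    'abfss://container@storageaccount.dfs.core.windows.net/landing/',
    format => 'json'
);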

7. Delta Live Tables (DLT)

Delta Live Tables (DLT) is the modern replacement for Live Tables. It is a framework for building reliable, maintainable, and scalable ETL pipelines using Delta Lake. DLT automatically handles dependencies, orchestration, and error recovery.

Data storage and Location:

Data is stored in Delta Lake format, either in managed or external storage.

Use Case:

Building production-grade ETL pipelines for batch and streaming data.

  • DLT pipelines are defined using Python or SQL.
  • Tables are automatically materialized and can be queried like any other Delta table.

Pros and Cons

Pros:
  • Declarative pipeline definition.
  • Automatic dependency management.
  • Built-in data quality checks and error handling.
  • Supports both batch and streaming workloads.

Cons: Requires understanding of Delta Lake and ETL concepts.

Example

import dlt

@dlt.table
def live_table():
    return spark.read.format("delta").load("/path/to/source_table")
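
The same table can be declared in SQL inside a DLT pipeline. A minimal sketch, assuming the source path is a placeholder for your own Delta location:

-- SQL flavor of the same DLT table definition
CREATE OR REFRESH LIVE TABLE live_table
AS SELECT * FROM delta.`/path/to/source_table`;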

8. Hive Tables (Legacy)

Hive tables are legacy tables that use the Apache Hive format. They are supported for backward compatibility.

Data storage Location:

Can be stored in managed or external storage.

Use Case:

Legacy systems or migration projects.

Pros and Cons

  • Pros: Backward compatibility with older systems.
  • Cons: Lacks modern features like ACID transactions and time travel.

Example

CREATE TABLE hive_table (
    id INT,
    name STRING
)
STORED AS PARQUET;

INSERT INTO hive_table VALUES (1, 'Dave');
SELECT * FROM hive_table;

Final Thoughts

Use Delta Live Tables for automated ETL pipelines.

Use Feature Tables for machine learning models.

Use Foreign Tables for querying external databases.

Avoid Hive Tables unless working with legacy systems.

Summary

  • Managed Tables: Fully managed by Databricks, ideal for internal workflows.
  • External Tables: Metadata managed by Databricks, data stored externally, ideal for cross-platform sharing.
  • Delta Tables: Advanced features like ACID transactions and time travel, ideal for reliable pipelines.
  • Foreign Tables: Query external systems without data duplication.
  • Streaming Tables: Designed for real-time data processing.
  • Feature Tables: Specialized for machine learning feature management.
  • Hive Tables: Legacy format, not recommended for new projects.

Each table type has its own creation syntax and usage patterns, and the choice depends on your specific use case, data storage requirements, and workflow complexity.


Comparison of Unity Catalog, External Data Source, External Table, Mounting Data and Metastore

Comparison including the Databricks Catalog (Unity Catalog) alongside Mounting Data, External Data Source, External Table and Metastore:

| Feature | Databricks Catalog (Unity Catalog) | Mounting Data | External Data Source | External Table | Metastore (Hive Metastore) |
|---|---|---|---|---|---|
| Purpose | Centralized governance and access control for data across multiple workspaces and environments | Map cloud storage to DBFS | Query external databases directly | Query external data in cloud storage via SQL | Store metadata (schemas, table locations) for databases and tables in Databricks and Spark |
| Data Access | SQL-based access to tables, views, and databases with unified governance | File-level access (Parquet, CSV, etc.) | Database-level access (via JDBC/ODBC) | Table-level access with metadata in Databricks | Provides table and schema information to Spark SQL, Hive, and Databricks |
| Setup | Define catalog, databases, tables, views, and enforce permissions centrally | Mount external storage in DBFS | Configure connector (JDBC/ODBC) | Create an external table with storage location | Automatically manages metadata for tables and databases; can be customized or integrated with external metastores |
| Governance | Centralized governance, RBAC, column-level security, and audit logs | Managed by storage provider | Managed by the external database | Managed by external storage permissions | Basic governance, mainly for schema management; limited fine-grained access control |
| Pros | Centralized access control, auditing, lineage, and security across multiple environments | Easy access to files | No need to copy data, works with SQL queries | Allows SQL queries on external data | Simplifies metadata management for large datasets and integrates seamlessly with Spark and Databricks |
| Cons | Requires Unity Catalog setup, and governance policies must be defined for all data assets | No built-in governance | Latency issues with external databases | Metadata management requires setup | Lacks advanced governance features like RBAC, auditing, and data lineage |
| When to Use | When you need centralized governance, access control, auditing, and security for data assets across multiple workspaces or cloud environments | When you need direct access to files stored externally, without ingestion | When you want to query external databases without moving the data | When you want SQL-based access to external files without copying them into Databricks | When you need basic schema and metadata management for tables and databases used by Databricks or Spark |


A Few Important Databricks Terms

Azure Databricks is a managed platform for running Apache Spark jobs. In this post, I’ll go through some key Databricks terms to give you an overview of the different pieces you’ll work with when running Databricks jobs (listed alphabetically):

Catalog (Unity Catalog)

Unity Catalog is a feature that provides centralized governance for data, allowing you to manage access to data across different Databricks workspaces and cloud environments. It helps define permissions, organize tables, and manage metadata, supporting multi-cloud and multi-workspace environments. Key benefits include:

  • Support for multi-cloud data governance.
  • Centralized access control and auditing.
  • Data lineage tracking.
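
In practice, Unity Catalog addresses every table through a three-level namespace (catalog.schema.table), and access is granted with standard SQL. A small sketch with placeholder names:

-- Three-level namespace: catalog.schema.table (names are placeholders)
SELECT * FROM main.sales.orders;

-- Centralized access control with a SQL GRANT (the group name is a placeholder)
GRANT SELECT ON TABLE main.sales.orders TO `analysts`;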

Delta table

A Delta table is a data management solution provided by Delta Lake, an open-source storage layer that brings ACID transactions to big data workloads. A Delta table stores data as a directory of files on cloud object storage and registers table metadata to the metastore within a catalog and schema. By default, all tables created in Databricks are Delta tables.

External tables

External tables are tables whose data lifecycle, file layout, and storage location are not managed by Unity Catalog. Multiple data formats are supported for external tables.


CREATE EXTERNAL TABLE my_external_table (
  id INT,
  name STRING,
  age INT
)
LOCATION 'wasbs://[container]@[account].blob.core.windows.net/data/';

External Data Source

A connection to a data store that isn’t natively part of Databricks but can be queried through a connector.

External Data Sources are typically external databases or data services (e.g., Azure SQL Database, Azure Synapse Analytics, Amazon RDS, or other relational or NoSQL databases). These sources are accessed via connectors (JDBC, ODBC, etc.) within Databricks.


jdbcUrl = "jdbc:sqlserver://[server].database.windows.net:1433;database=[database]"
connectionProperties = {
  "user" : "username",
  "password" : "password",
  "driver" : "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}
df = spark.read.jdbc(jdbcUrl, "[table]", connectionProperties)

Hive Metastore

The Hive Metastore is the metadata repository for the data in Databricks, storing information about tables and databases. It’s used by the Spark SQL engine to manage metadata for the tables and to store information like schemas, table locations, and partitions. In Azure Databricks:

  • Schemas: Column names, types, and table structure.
  • Table locations: The path to where the actual data resides (in HDFS, Azure Data Lake, S3, etc.).
  • Partitions: If the table is partitioned, the metadata helps optimize query performance.

By default, each Databricks workspace has its own managed Hive metastore.

You can also connect to an external Hive metastore that is shared across multiple Databricks workspaces or use Azure-managed services like Azure SQL Database for Hive metadata storage.
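
As a rough illustration, pointing a cluster at an external Hive metastore is done through the cluster’s Spark configuration. The sketch below is an assumption-heavy example: all connection values are placeholders, and the metastore version and JAR settings must match your actual metastore:

spark.sql.hive.metastore.version 2.3.9
spark.sql.hive.metastore.jars builtin
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:sqlserver://<server>.database.windows.net:1433;database=<metastore-db>
spark.hadoop.javax.jdo.option.ConnectionDriverName com.microsoft.sqlserver.jdbc.SQLServerDriver
spark.hadoop.javax.jdo.option.ConnectionUserName <username>
spark.hadoop.javax.jdo.option.ConnectionPassword <password>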

Managed tables

Managed tables are the preferred way to create tables in Unity Catalog. Unity Catalog fully manages their lifecycle, file layout, and storage. Unity Catalog also optimizes their performance automatically. Managed tables always use the Delta table format.

Managed tables reside in a managed storage location that you reserve for Unity Catalog. Because of this storage requirement, you must use CLONE or CREATE TABLE AS SELECT (CTAS) if you want to copy existing Hive tables to Unity Catalog as managed tables.
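
A minimal sketch of both approaches, assuming a legacy table hive_metastore.default.my_table and a target Unity Catalog schema main.default (both placeholders):

-- Copy a legacy Hive table into Unity Catalog with CTAS
CREATE TABLE main.default.my_table
AS SELECT * FROM hive_metastore.default.my_table;

-- Or copy it with DEEP CLONE, which copies the data files and table metadata
CREATE TABLE main.default.my_table_clone
DEEP CLONE hive_metastore.default.my_table;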

Mounting Data

Mounting makes external storage available in Databricks as if it were part of the Databricks File System (DBFS).


dbutils.fs.mount(
    source="wasbs://[container]@[account].blob.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs={"fs.azure.account.key.[account].blob.core.windows.net": "[account_key]"}
)

Workflows

In Databricks, Workflows are a way to orchestrate data pipelines, machine learning tasks, and other computational processes. Workflows allow you to automate the execution of notebooks, Python scripts, JAR files, or any other job task within Databricks and run them on a schedule, trigger, or as part of a complex pipeline.

Key Components of Workflows in Databricks:

Jobs: Workflows in Databricks are typically managed through jobs. A job is a scheduled or triggered run of a notebook, script, or other tasks in Databricks. Jobs can consist of a single task or multiple tasks linked together.

Task: Each task in a job represents an individual unit of work. You can have multiple tasks in a job, which can be executed sequentially or in parallel.

Triggers: Workflows can be triggered manually, based on a schedule (e.g., hourly, daily), or triggered by an external event (such as a webhook).

Cluster: When running a job in a workflow, you need to specify a Databricks cluster (either an existing cluster or one that is started just for the job). Workflows can also specify job clusters, which are clusters that are spun up and terminated automatically for the specific job.

Types of Workflows
  1. Single-task Jobs: These jobs consist of just one task, like running a Databricks notebook or a Python/Scala/SQL script. You can schedule these jobs to run at specific intervals or trigger them manually.
  2. Multi-task Workflows: These workflows are more complex and allow for creating pipelines of interdependent tasks that can be run in sequence or in parallel. Each task can have dependencies on the completion of previous tasks, allowing you to build complex pipelines that branch based on results. Example: A data pipeline might consist of three tasks:
    • Task 1: Ingest data from a data lake into a Delta table.
    • Task 2: Perform transformations on the ingested data.
    • Task 3: Run a machine learning model on the transformed data.
  3. Parameterized Workflows: You can pass parameters to a job when scheduling it, allowing for more dynamic behavior. This is useful when you want to run the same task with different inputs (e.g., processing data for different dates).
Creating Workflows in Databricks

Workflows can be created through the Jobs UI in Databricks or programmatically using the Databricks REST API.
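
As an illustration of the REST route, the sketch below creates a two-task job with the Jobs API 2.1. It is a hedged example: the workspace URL, token, cluster ID, and notebook paths are placeholders, and the payload shows only a small subset of the available fields.

import requests

# Placeholders: fill in your workspace URL, a personal access token,
# an existing cluster ID, and real notebook paths.
workspace_url = "https://<your-workspace>.azuredatabricks.net"
token = "<personal-access-token>"

job_spec = {
    "name": "example_pipeline",
    "tasks": [
        {
            "task_key": "ingest",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {"notebook_path": "/Repos/pipeline/ingest"},
        },
        {
            "task_key": "transform",
            "depends_on": [{"task_key": "ingest"}],  # runs only after 'ingest' succeeds
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {"notebook_path": "/Repos/pipeline/transform"},
        },
    ],
    # Optional daily schedule at 9 AM using a cron expression
    "schedule": {"quartz_cron_expression": "0 0 9 * * ?", "timezone_id": "UTC"},
}

response = requests.post(
    f"{workspace_url}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
print(response.json())  # returns the new job_id on success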

Example of Creating a Simple Workflow:
  1. Navigate to the Jobs Tab:
    • In Databricks, go to the Jobs tab in the workspace.
  2. Create a New Job:
    • Click Create Job.
    • Specify the name of the job.
  3. Define a Task:
    • Choose a task type (Notebook, JAR, Python script, etc.).
    • Select the cluster to run the job on (or specify a job cluster).
    • Add parameters or libraries if required.
  4. Schedule or Trigger the Job:
    • Set a schedule (e.g., run every day at 9 AM) or choose manual triggering.
    • You can also configure alerts or notifications (e.g., send an email if the job fails).
Multi-task Workflow Example:
  1. Add Multiple Tasks:
    • After creating a job, you can add more tasks by clicking Add Task.
    • For each task, you can specify dependencies (e.g., Task 2 should run only after Task 1 succeeds).
  2. Manage Dependencies:
    • You can configure tasks to run in sequence or in parallel.
    • Define whether a task should run on success, failure, or based on a custom condition.
Key Features of Databricks Workflows:
  1. Orchestration: Allows for complex job orchestration, including dependencies between tasks, retries, and conditional logic.
  2. Job Scheduling: You can schedule jobs to run at regular intervals (e.g., daily, weekly) using cron expressions or Databricks’ simple scheduler.
  3. Parameterized Runs: Pass parameters to notebooks, scripts, or other tasks in the workflow, allowing dynamic control of jobs.
  4. Cluster Management: Workflows automatically handle cluster management, starting clusters when needed and shutting them down after the job completes.
  5. Notifications: Workflows allow setting up notifications on job completion, failure, or other conditions. These notifications can be sent via email, Slack, or other integrations.
  6. Retries: If a job or task fails, you can configure it to automatically retry a specified number of times before being marked as failed.
  7. Versioning: Workflows can be versioned, so you can track changes and run jobs based on different versions of a notebook or script.
Common Use Cases for Databricks Workflows:
  • ETL Pipelines: Automating the extraction, transformation, and loading (ETL) of data from source systems to a data lake or data warehouse.
  • Machine Learning Pipelines: Orchestrating the steps involved in data preprocessing, model training, evaluation, and deployment.
  • Batch Processing: Scheduling large-scale data processing tasks to run on a regular basis.
  • Data Ingestion: Automating the ingestion of raw data into Delta Lake or other storage solutions.
  • Alerts and Monitoring: Running scheduled jobs that trigger alerts based on conditions in the data (e.g., anomaly detection).


Create External Data Sources in Synapse Serverless SQL

An external data source in Synapse serverless SQL is typically used to reference data stored outside of the SQL pool, such as in Azure Data Lake Storage (ADLS) or Blob Storage. This allows you to query data directly from these external sources using T-SQL.

There are different ways to create an external data source, such as the Synapse Studio UI or code. The easiest way is to use the Synapse Studio UI, but it is also worth knowing how to create one with code, since in some cases that is the only option.

Here’s how to create an external data source in Synapse serverless SQL.

Using Synapse Studio UI to create External Data Source

Create Lake Database

Open Synapse Studio

On the left side, select the Data hub, then Workspace.

Fill in the properties:

Create external table from data lake

Double-click the Lake Database you just created.

In the Lake Database tab, click “+ Table”.

Fill in the detailed information:

Continue to configure the table properties.

Adjust Table properties

Adjust other column properties or add more columns: data type, description, nullability, primary key, partitioning, relationships, and so on.

Repeat the above steps to create more tables to meet your business logic needs, and create relationships if needed.

Script to create an External Data Source

Step 1:

1. Connect to Serverless SQL Pool:

Open Synapse Studio, go to the “Data” hub, and connect to your serverless SQL pool.

2. Create the External Data Source:

Use the following T-SQL script to create an external data source that points to your Azure Data Lake Storage (ADLS) or Blob Storage:

CREATE EXTERNAL DATA SOURCE MyExternalDataSource
WITH (
    LOCATION = 'https://<your-storage-account-name>.dfs.core.windows.net/<your-filesystem-name>',
    CREDENTIAL = <your-credential-name>
);

Replace <your-storage-account-name>, <your-filesystem-name>, and <your-credential-name> with the appropriate values:

  • LOCATION: The URL of your Azure Data Lake Storage (ADLS) or Blob Storage.
  • CREDENTIAL: The name of the database credential used to access the storage. (You may need to create this credential if it doesn’t already exist.)

Step 2:

If you don’t have a credential yet, create one as follows:

1. Create a Credential:

CREATE DATABASE SCOPED CREDENTIAL MyStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = '<your-SAS-token>';

Replace <your-SAS-token> with your Azure Storage Shared Access Signature (SAS) token.
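
Note that a database scoped credential requires a database master key. If your serverless SQL database does not have one yet, a minimal sketch (the password is a placeholder):

-- A master key must exist before creating a database scoped credential
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';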

2. Create an External Table or Query the External Data

After setting up the external data source, you can create external tables or directly query data:

Create an External Table:

You can create an external table that maps to the data in your external storage:

CREATE EXTERNAL TABLE MyExternalTable (
    Column1 INT,
    Column2 NVARCHAR(50),
    Column3 DATETIME
)
WITH (
    LOCATION = '/path/to/data.csv',
    DATA_SOURCE = MyExternalDataSource,
    FILE_FORMAT = MyFileFormat  -- you need to define a file format (see below)
);

Query the External Data

You can also directly query the data without creating an external table:

SELECT *
FROM OPENROWSET(
    BULK '/path/to/data.csv',
    DATA_SOURCE = 'MyExternalDataSource',
    FORMAT = 'CSV',
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
) AS MyData;

Create and Use a File Format (Optional)

If you are querying structured files (like CSV, Parquet), you might need to define a file format:

CREATE EXTERNAL FILE FORMAT MyFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"')
);

Summary

By following these steps, you should be able to connect to and query your external data sources using the serverless SQL pool in Synapse. Let me know if you need further assistance!

  • Create an external data source in Synapse serverless SQL to point to your external storage.
  • Create a database scoped credential if necessary to access your storage.
  • Create an external table or directly query data using OPENROWSET.
  • Define a file format if working with structured data like CSV or Parquet.

Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca

(remove all space from the email account 😊)