An external data source in Synapse serverless SQL is typically used to reference data stored outside of the SQL pool, such as in Azure Data Lake Storage (ADLS) or Blob Storage. This allows you to query data directly from these external sources using T-SQL.
There are different ways to create external data source. Using Synapse Studio UI, coding etc. the easiest way is to leverage Synapse Studio UI. But we had better know how to use code to create it since in some cases we have to use this way.
Here’s how to create an external data source in Synapse serverless SQL
Using Synapse Studio UI to create External Data Source
Create Lake Database
Open Synapse Studio
On the left side, select Data portal > workspace
Fill in the properties:
Create external table from data lake
Double clicks the Lake Database you just created.
in the Lake Database tag, click “+ Table”
fill in the detail information:
Continue to configure the table properyies
Adjust Table properties
Adjust column other properties, or add even more columns, such as data type, description, Nullability, Primary Key, set up partition create relationship …… etc.
Repeat the above steps to create even more tables to meet your business logic need, or create relationship if need.
Script to create an External Data Source
Step 1:
1. Connect to Serverless SQL Pool:
Open Synapse Studio, go to the “Data” hub, and connect to your serverless SQL pool.
2. Create the External Data Source:
Use the following T-SQL script to create an external data source that points to your Azure Data Lake Storage (ADLS) or Blob Storage:
CREATE EXTERNAL DATA SOURCE MyExternalDataSource
WITH (
LOCATION = ‘https://<your-storage-account-name>.dfs.core.windows.net/<your-filesystem-name>‘,
CREDENTIAL = <your-credential-name>
);
Replace <your-storage-account-name>
, <your-filesystem-name>
, and <your-credential-name>
with the appropriate values:
LOCATION
: The URL of your Azure Data Lake Storage (ADLS) or Blob Storage.CREDENTIAL
: The name of the database credential used to access the storage. (You may need to create this credential if it doesn’t already exist.)
Step 2:
If you don’t have a credential yet, create one as follows:
1. Create a Credential:
CREATE DATABASE SCOPED CREDENTIAL MyStorageCredential
WITH IDENTITY = ‘SHARED ACCESS SIGNATURE’,
SECRET = ”;
Replace <your-SAS-token>
with your Azure Storage Shared Access Signature (SAS) token.
2. Create an External Table or Query the External Data
After setting up the external data source, you can create external tables or directly query data:
Create an External Table:
You can create an external table that maps to the data in your external storage:
CREATE EXTERNAL TABLE MyExternalTable (
Column1 INT,
Column2 NVARCHAR(50),
Column3 DATETIME
)
WITH (
LOCATION = ‘/path/to/data.csv’,
DATA_SOURCE = MyExternalDataSource,
FILE_FORMAT = MyFileFormat — You need to define a file format
);
Query the External Data
You can also directly query the data without creating an external table:
SELECT *
FROM OPENROWSET(
BULK ‘/path/to/data.csv’,
DATA_SOURCE = ‘MyExternalDataSource’,
FORMAT = ‘CSV’,
FIELDTERMINATOR = ‘,’,
ROWTERMINATOR = ‘\n’
) AS MyData;
Create and Use a File Format (Optional)
If you are querying structured files (like CSV, Parquet), you might need to define a file format:
CREATE EXTERNAL FILE FORMAT MyFileFormat
WITH (
FORMAT_TYPE = DELIMITEDTEXT,
FORMAT_OPTIONS (FIELD_TERMINATOR = ‘,’, STRING_DELIMITER = ‘”‘)
);
Summary
By following these steps, you should be able to connect to and query your external data sources using the serverless SQL pool in Synapse. Let me know if you need further assistance!
- Create an external data source in Synapse serverless SQL to point to your external storage.
- Create a database scoped credential if necessary to access your storage.
- Create an external table or directly query data using
OPENROWSET
. - Define a file format if working with structured data like CSV or Parquet.
Please do not hesitate to contact me if you have any questions at William . chen @ mainri.ca
(remove all space from the email account 😊)