Comparison of the Databricks Catalog (Unity Catalog) with Mounting Data, External Data Source, External Table, and the Hive Metastore:

Feature | Databricks Catalog (Unity Catalog) | Mounting Data | External Data Source | External Table | Metastore (Hive Metastore) |
---|---|---|---|---|---|
Purpose | Centralized governance and access control for data across multiple workspaces and environments. | Map cloud storage to DBFS | Query external databases directly | Query external data in cloud storage via SQL | Store metadata (schemas, table locations) for databases and tables in Databricks and Spark. |
Data Access | SQL-based access to catalogs, schemas, tables, and views under unified governance. | File-level access (Parquet, CSV, etc.) | Database-level access (via JDBC/ODBC) | Table-level access with metadata in Databricks | Provides table and schema information to Spark SQL, Hive, and Databricks. |
Setup | Define catalogs, schemas, tables, and views, and enforce permissions centrally. | Mount external storage in DBFS | Configure a connector (JDBC/ODBC) | Create an external table with a storage location | Manages metadata for tables and databases automatically; can be customized or integrated with an external metastore. |
Governance | Centralized governance, RBAC, column-level security, and audit logs. | Managed by storage provider | Managed by the external database | Managed by external storage permissions | Basic governance, mainly for schema management; limited fine-grained access control. |
Pros | Centralized access control, auditing, lineage, and security across multiple environments. | Easy access to files | No need to copy data; works with SQL queries | Allows SQL queries on external data | Simplifies metadata management for large datasets and integrates seamlessly with Spark and Databricks. |
Cons | Requires Unity Catalog setup; governance policies must be defined for all data assets. | No built-in governance | Query latency depends on the external database | Metadata management requires setup | Lacks advanced governance features such as fine-grained RBAC, auditing, and data lineage. |
When to Use | When you need centralized governance, access control, auditing, and security for data assets across multiple workspaces or cloud environments. | When you need direct access to files stored externally, without ingestion. | When you want to query external databases without moving the data. | When you want SQL-based access to external files without copying them into Databricks. | When you need basic schema and metadata management for tables and databases used by Databricks or Spark. |
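
The "When to Use" row can be read as a small decision rule. The sketch below encodes it in plain Python so the choice is explicit. The `Requirements` flags, the `recommend` function, and the priority order of the checks are all assumptions made for illustration; they are not part of any Databricks API, and real projects often combine several of these options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    """Hypothetical flags describing what a workload needs."""
    central_governance: bool = False  # access control and auditing across workspaces
    source_is_database: bool = False  # data lives in an external database
    needs_sql_on_files: bool = False  # SQL over files in cloud storage, without copying
    needs_file_access: bool = False   # direct file-level access, without ingestion


def recommend(req: Requirements) -> str:
    """Return the table column that matches the first satisfied requirement.

    The order of the checks is an assumption of this sketch: governance
    needs take priority, then the location of the data, then the access style.
    """
    if req.central_governance:
        return "Unity Catalog"
    if req.source_is_database:
        return "External Data Source"
    if req.needs_sql_on_files:
        return "External Table"
    if req.needs_file_access:
        return "Mounting Data"
    # Fallback: basic schema and metadata management only.
    return "Hive Metastore"


if __name__ == "__main__":
    print(recommend(Requirements(needs_sql_on_files=True)))
```

For example, `recommend(Requirements(source_is_database=True))` selects the External Data Source column, while a workload with no special requirements falls back to the Hive Metastore column.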