What Is Unity Catalog? Complete Guide to Databricks Data Governance

2026-03-21
Updated: 2026-03-27
NicheeLab Editorial Team

Unity Catalog is the account-level unified data governance layer provided by Databricks. It delivers centralized access control, audit logging, and data lineage across multiple workspaces for every data asset, including tables, views, functions, ML models, and files (Volumes). It unifies metadata management that used to be fragmented per workspace in the legacy Hive Metastore by introducing a three-level namespace of catalog.schema.table, and enables SQL-standard GRANT/REVOKE-based permission management.

This article covers Unity Catalog end-to-end: hierarchy, permission model, External Locations, Delta Sharing integration, and how it differs from Hive Metastore. It also highlights the points tested on the Data Engineer Associate (DEA) exam, focusing on the Data Governance domain (17% of the exam).

Unity Catalog Hierarchy

Unity Catalog uses a four-level object model topped by the metastore. Every data asset sits in this tree, and permissions are inherited from parents down to children.

Metastore (one per account, per region)
 ├── Catalog: production
 │    ├── Schema: sales
 │    │    ├── Table: orders
 │    │    ├── Table: customers
 │    │    ├── View: daily_summary
 │    │    └── Function: calc_tax()
 │    ├── Schema: marketing
 │    │    ├── Table: campaigns
 │    │    └── Volume: raw_files
 │    └── Schema: information_schema (auto-generated)
 ├── Catalog: development
 │    └── Schema: sandbox
 │         └── Table: test_orders
 ├── External Location: s3://data-lake/external/
 └── Storage Credential: aws_s3_role

When you reference a data asset, always use the three-level namespace catalog.schema.object. This makes it possible to uniquely identify which environment, which domain, and which table is being referenced just from the name.

-- Access via the three-level namespace
SELECT * FROM production.sales.orders;
--            ^^^^^^^^^^  ^^^^^  ^^^^^^
--            Catalog     Schema Table

-- Set a default Catalog/Schema
USE CATALOG production;
USE SCHEMA sales;

-- After the settings above, you can use short names
SELECT * FROM orders;
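
The name resolution behind USE CATALOG and USE SCHEMA can be pictured with a short Python sketch. It is only a model of the rule described above, not how Databricks implements it: the function name `qualify` is hypothetical, and quoting with backticks is deliberately ignored.

```python
def qualify(name: str, current_catalog: str, current_schema: str) -> str:
    """Expand a one-, two-, or three-part name to catalog.schema.object."""
    parts = name.split(".")
    if len(parts) == 1:
        # Short name: both the current catalog and schema are applied.
        return f"{current_catalog}.{current_schema}.{parts[0]}"
    if len(parts) == 2:
        # schema.object: only the current catalog is applied.
        return f"{current_catalog}.{parts[0]}.{parts[1]}"
    if len(parts) == 3:
        # Already fully qualified.
        return name
    raise ValueError(f"expected at most three name parts, got {len(parts)}")


print(qualify("orders", "production", "sales"))
print(qualify("marketing.campaigns", "production", "sales"))
```

With the defaults set to production and sales, the short name orders resolves to the same table as production.sales.orders.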

Role of Each Level and Design Patterns

Level | Description | Example Design Pattern
Metastore | The top-level container for Unity Catalog. Create one per region and attach multiple workspaces to it | One created for ap-northeast-1 (Tokyo)
Catalog | Top-level logical grouping of data assets. Separated by environment, department, or project | production / development / staging, finance / marketing
Schema | Grouping for tables and views. Organized by domain or data layer | sales / hr / logs, bronze / silver / gold
Table / View / Function / Volume | The actual data assets. Tables are either Managed or External | orders (Managed), external_logs (External), calc_tax() (UDF)

Creating Catalogs and Schemas

To store data assets in Unity Catalog, you first create a Catalog and a Schema. Catalogs can be created by the Metastore Admin (or any user holding the CREATE CATALOG privilege), and Schemas can be created by the Catalog owner (or any user holding the CREATE SCHEMA privilege).

-- Create a Catalog
CREATE CATALOG IF NOT EXISTS production
COMMENT 'Production environment catalog';

-- Create a Schema
CREATE SCHEMA IF NOT EXISTS production.sales
COMMENT 'Sales domain tables';

-- Create a Managed Table (no LOCATION -> stored in Unity Catalog-managed storage)
CREATE TABLE production.sales.orders (
  order_id     BIGINT       GENERATED ALWAYS AS IDENTITY,
  customer_id  BIGINT       NOT NULL,
  amount       DECIMAL(12,2) NOT NULL,
  order_date   DATE         NOT NULL,
  status       STRING       DEFAULT 'pending'
)
COMMENT 'Customer order records'
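TBLPROPERTIES (
  'quality' = 'gold',
  -- Delta expects this table feature before it accepts a column DEFAULT
  'delta.feature.allowColumnDefaults' = 'supported'
);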

-- Create an External Table (with LOCATION -> references data in external storage)
CREATE TABLE production.sales.external_logs (
  log_id    BIGINT,
  message   STRING,
  timestamp TIMESTAMP
)
LOCATION 's3://data-lake/external/sales/logs/';

Permission Management (GRANT / REVOKE)

Unity Catalog enforces access control with SQL-standard GRANT/REVOKE syntax. Privileges are granted on Securable Objects (Catalog, Schema, Table, View, Function, Volume, External Location, Storage Credential) and attached to users or groups (principals).

Main Privilege Types

Privilege | Applies To | Effect
USAGE (named USE CATALOG and USE SCHEMA in the current Unity Catalog privilege model) | Catalog, Schema | Required to list contents of and access objects nested inside
SELECT | Table, View | Read data (run SELECT statements)
MODIFY | Table | Run INSERT / UPDATE / DELETE / MERGE
CREATE TABLE | Schema | Create new tables in the Schema
CREATE SCHEMA | Catalog | Create new Schemas in the Catalog
CREATE CATALOG | Metastore | Create new Catalogs in the metastore
ALL PRIVILEGES | All | Grants every privilege on the target object at once
CREATE EXTERNAL LOCATION | Storage Credential | Create an External Location using a Storage Credential

SQL Examples for Granting Privileges

-- Step 1: Open access to the Catalog (open the hallway)
GRANT USAGE ON CATALOG production TO analysts;

-- Step 2: Open USAGE on the Schema
GRANT USAGE ON SCHEMA production.sales TO analysts;

-- Step 3: Grant SELECT on tables (let them into the room)
GRANT SELECT ON SCHEMA production.sales TO analysts;
-- ^ SELECT applies to every table under the Schema

-- Restrict to a specific table
GRANT SELECT ON TABLE production.sales.orders TO analysts;

-- Grant data modification privilege
GRANT MODIFY ON TABLE production.sales.orders TO etl_service;

-- Privilege to create tables in a Schema
GRANT CREATE TABLE ON SCHEMA production.sales TO data_engineers;

-- Inspect privileges: every grant on the Schema, then only those held by analysts
SHOW GRANTS ON SCHEMA production.sales;
SHOW GRANTS `analysts` ON SCHEMA production.sales;

-- Revoke a privilege
REVOKE SELECT ON SCHEMA production.sales FROM analysts;
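
When the same three read-access grants are issued for many schemas, generating them from a small helper keeps them consistent. The following Python sketch only builds the SQL strings shown above. The function name `read_access_grants` and the identifier check are assumptions made for illustration, and the check is intentionally stricter than what Databricks accepts.

```python
import re

# Plain identifiers only; anything needing backticks is rejected in this sketch.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    if not _IDENTIFIER.match(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def read_access_grants(catalog: str, schema: str, principal: str) -> list:
    """Return the three statements that give a principal read access to a schema."""
    catalog, schema, principal = _check(catalog), _check(schema), _check(principal)
    return [
        f"GRANT USAGE ON CATALOG {catalog} TO {principal};",
        f"GRANT USAGE ON SCHEMA {catalog}.{schema} TO {principal};",
        f"GRANT SELECT ON SCHEMA {catalog}.{schema} TO {principal};",
    ]


for statement in read_access_grants("production", "sales", "analysts"):
    print(statement)
```

In a notebook, each returned string could be passed to spark.sql(...), assuming the caller is allowed to grant on those objects.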

Privilege Inheritance Model

Unity Catalog privileges flow from parent objects down to children. For example, granting SELECT on a Catalog applies SELECT to every Schema and table under that Catalog. However, SELECT never implies USAGE. Unless USAGE has also been granted on both the parent Catalog and the Schema, holding SELECT on the underlying table is not enough to access the data.

Privilege inheritance:

GRANT SELECT ON CATALOG production TO analysts;
 -> SELECT inherited by every Schema and Table under production

But USAGE is still required:
 Catalog: production  -> USAGE required ✓
   Schema: sales      -> USAGE required ✓
     Table: orders    -> readable with SELECT ✓

When USAGE is missing:
 Catalog: production  -> no USAGE ✗
   Schema: sales      -> USAGE granted
     Table: orders    -> SELECT granted -> but inaccessible ✗

Managed Tables vs. External Tables

Tables registered in Unity Catalog are categorized as Managed Tables or External Tables based on where the data is stored. The DROP behavior is fundamentally different, so the choice matters at design time.

Comparison | Managed Table | External Table
Data location | Unity Catalog managed storage (the storage root of the metastore/catalog/schema) | An external path you specify (S3, ADLS, GCS)
How to create | CREATE TABLE ... (no LOCATION) | CREATE TABLE ... LOCATION 's3://...'
Data on DROP | Both metadata and data files are deleted | Only metadata is deleted; data files remain in external storage
Lifecycle management | Fully managed by Unity Catalog | You manage the lifecycle of the data files
Recommended use cases | Data that lives entirely inside Databricks; greenfield projects | Integration with existing data lakes; sharing data with other platforms

Exam favorite: "What happens to the data files after DROP TABLE?" -> Managed Table deletes the data too; External Table only deletes the metadata.
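
As a memory aid, the DROP rule can be written as a tiny Python model. It is a sketch of the comparison above rather than of Databricks internals; the `Table` class and `removed_by_drop` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    full_name: str
    location: Optional[str] = None  # None models a CREATE TABLE without LOCATION

    @property
    def kind(self) -> str:
        return "EXTERNAL" if self.location else "MANAGED"


def removed_by_drop(table: Table) -> set:
    """Return what DROP TABLE removes for the given table."""
    removed = {"metadata"}
    if table.kind == "MANAGED":
        removed.add("data files")
    return removed


orders = Table("production.sales.orders")
logs = Table("production.sales.external_logs", "s3://data-lake/external/sales/logs/")
print(orders.kind, sorted(removed_by_drop(orders)))
print(logs.kind, sorted(removed_by_drop(logs)))
```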

External Locations and Storage Credentials

To create an External Table, you must register the credentials Unity Catalog uses to access external storage. The mechanism has two layers: Storage Credential and External Location.

[Storage Credential]          Register cloud credentials
        |                     (IAM role / Service Principal / Service Account)
        v
[External Location]           Bind a credential to an allowed path
  url: s3://bucket/path/      "With these credentials, access to this path is OK"
        |
        v
[External Table]              Create a table with LOCATION pointing at the external path
  production.sales.ext_logs
  LOCATION 's3://bucket/path/logs/'
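
The top-to-bottom order of the diagram is a dependency order: each object can only be created once the one above it exists. A minimal Python sketch using the standard library's graphlib (Python 3.9 or later) derives that order from the dependencies. The dictionary below simply restates the diagram and the function name is hypothetical.

```python
from graphlib import TopologicalSorter

# Each object maps to the objects that must exist before it can be created.
DEPENDS_ON = {
    "Storage Credential": set(),
    "External Location": {"Storage Credential"},
    "External Table": {"External Location"},
}


def creation_order(depends_on: dict) -> list:
    """Return the objects in an order that satisfies every dependency."""
    return list(TopologicalSorter(depends_on).static_order())


print(creation_order(DEPENDS_ON))
```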

Setup Flow (SQL)

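-- 1. Create a Storage Credential (requires Metastore Admin)
-- Storage credentials are commonly created in Catalog Explorer, the CLI, the REST API, or Terraform;
-- verify that this SQL form is supported in your workspace before relying on it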
CREATE STORAGE CREDENTIAL aws_s3_credential
WITH (
  AWS_IAM_ROLE = 'arn:aws:iam::123456789012:role/unity-catalog-role'
);

-- 2. Create an External Location
CREATE EXTERNAL LOCATION s3_data_lake
URL 's3://my-data-lake/production/'
WITH (STORAGE CREDENTIAL aws_s3_credential)
COMMENT 'Production data lake on S3';

-- 3. Grant privileges on the External Location
GRANT CREATE EXTERNAL TABLE ON EXTERNAL LOCATION s3_data_lake
TO data_engineers;

-- 4. Create the External Table
CREATE TABLE production.sales.external_events (
  event_id   BIGINT,
  event_type STRING,
  payload    STRING,
  created_at TIMESTAMP
)
LOCATION 's3://my-data-lake/production/events/';

A Storage Credential can be reused across multiple External Locations. For example, a single IAM role can back two External Locations, s3://bucket/sales/ and s3://bucket/marketing/, so that different teams get distinct, well-bounded access ranges.
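
One way to picture how an External Location bounds access is a check that a table's LOCATION sits underneath the location's URL. The Python sketch below compares whole path segments instead of raw string prefixes, so s3://bucket/sales-archive/ is not mistaken for a child of s3://bucket/sales/. The function names and the location names `sales_loc` and `marketing_loc` are hypothetical, and the real validation performed by Unity Catalog may differ in detail.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse


def _normalize(url: str) -> Tuple[str, str, List[str]]:
    """Split a storage URL into scheme, bucket, and non-empty path segments."""
    parsed = urlparse(url)
    segments = [segment for segment in parsed.path.split("/") if segment]
    return parsed.scheme, parsed.netloc, segments


def is_covered(path: str, location_url: str) -> bool:
    """Return True when path lies at or below location_url."""
    p_scheme, p_bucket, p_segments = _normalize(path)
    l_scheme, l_bucket, l_segments = _normalize(location_url)
    same_root = (p_scheme, p_bucket) == (l_scheme, l_bucket)
    return same_root and p_segments[: len(l_segments)] == l_segments


def covering_location(path: str, locations: Dict[str, str]) -> Optional[str]:
    """Return the name of the first location that covers path, if any."""
    for name, url in locations.items():
        if is_covered(path, url):
            return name
    return None


locations = {
    "sales_loc": "s3://bucket/sales/",
    "marketing_loc": "s3://bucket/marketing/",
}
print(covering_location("s3://bucket/sales/orders/2026/", locations))
print(covering_location("s3://bucket/sales-archive/orders/", locations))
```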

Comparison with Hive Metastore

Unity Catalog was designed to solve the governance pain points of the legacy Hive Metastore. If you are planning a migration, you need to understand the following differences.

Comparison | Hive Metastore | Unity Catalog
Management scope | Per-workspace (each workspace has its own isolated metastore) | Per-account (shared across multiple workspaces)
Namespace | Two-level (schema.table / database.table) | Three-level (catalog.schema.table)
Permission model | Table ACLs (access control lists per table/view) | SQL-standard GRANT/REVOKE with hierarchical inheritance
Audit logs | Relies on cluster logs (hard to audit uniformly) | Unified audit logs via System Tables
Data lineage | Manual (no built-in feature) | Automatic table- and column-level lineage
Table formats | Delta / Parquet / CSV / JSON / ORC / Avro | Managed Tables are Delta only; External Tables support multiple formats
File management | DBFS (no governance) | Volumes (with permission management and auditing)
Cross-workspace sharing | Not possible (metadata fragmented across workspaces) | Automatically shared across workspaces attached to the same metastore

Unity Catalog and Delta Sharing

Delta Sharing is an open protocol for securely sharing data across organizational boundaries. Unity Catalog acts as a Delta Sharing provider and can share data with other organizations' Databricks environments as well as non-Databricks environments (Spark, Pandas, Power BI, Tableau, and more).

-- Create a Share
CREATE SHARE customer_analytics;

-- Add a table to the Share
ALTER SHARE customer_analytics
ADD TABLE production.sales.orders;

-- Create a Recipient
CREATE RECIPIENT partner_company
USING ID 'partner-sharing-identifier';

-- Grant the Recipient access to the Share
GRANT SELECT ON SHARE customer_analytics TO RECIPIENT partner_company;

Key characteristics of Delta Sharing:

  • No data copy is created (the recipient reads directly from the provider's storage)
  • Sharing scope can be controlled at the table or partition level
  • Recipients do not need to be Databricks users (open protocol)
  • Access logs for shared data are recorded in the provider's audit logs
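
On the recipient side, the open-source delta-sharing Python client addresses a shared table as <profile file>#<share>.<schema>.<table>. The sketch below only builds that address, so it runs without the client installed. The profile file name `config.share` is a placeholder, and the example assumes the table is exposed in the share under its source schema name sales.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the address of a shared table for the delta-sharing client."""
    return f"{profile_path}#{share}.{schema}.{table}"


url = table_url("config.share", "customer_analytics", "sales", "orders")
print(url)

# With the `delta-sharing` package installed and a real profile file from the provider:
#   import delta_sharing
#   df = delta_sharing.load_as_pandas(url)
```

Because the recipient only needs the profile file and this address, no Databricks account is involved on that side.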

Volumes (File Governance)

Volumes let you govern non-table files (CSV, JSON, images, model artifacts, configuration files, and so on) under the Unity Catalog permission model. The legacy DBFS (Databricks File System) had no governance and could not track who accessed which file. Because Volumes are created under a Catalog/Schema, you can control privileges with GRANT/REVOKE and access is recorded in the audit logs.

-- Create a Managed Volume (stored in Unity Catalog-managed storage)
CREATE VOLUME production.raw.landing_files;

-- Create an External Volume (references an external path)
CREATE EXTERNAL VOLUME production.raw.s3_landing
LOCATION 's3://my-bucket/landing/';

-- List files
LIST '/Volumes/production/raw/landing_files/';

-- Read files from SQL
SELECT * FROM csv.`/Volumes/production/raw/landing_files/2026-03/data.csv`;

-- Read files in a Volume from Python
-- df = spark.read.csv("/Volumes/production/raw/landing_files/2026-03/data.csv")
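
Every path in the examples above follows the same pattern: /Volumes/<catalog>/<schema>/<volume>/<path inside the volume>. A small Python helper, sketched here under the hypothetical name `volume_path`, makes that pattern explicit and avoids hand-assembled strings.

```python
import posixpath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a /Volumes/... path from the three-level name and a relative path."""
    for name in (catalog, schema, volume):
        if not name or "/" in name:
            raise ValueError(f"invalid name component: {name!r}")
    # Strip slashes so that a leading "/" in a part cannot reset the join.
    cleaned = [part.strip("/") for part in parts if part.strip("/")]
    return posixpath.join("/Volumes", catalog, schema, volume, *cleaned)


print(volume_path("production", "raw", "landing_files", "2026-03", "data.csv"))
```

In a notebook the result can be handed directly to spark.read.csv(...), as in the commented line above.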

Data Lineage and Auditing

Unity Catalog automatically parses the Spark queries run in notebooks and jobs, and records the data flow (lineage) between tables and between columns. No manual setup or activation is required; it works automatically wherever Unity Catalog is enabled.

  • Table-level lineage: Tracks that data from table A flowed into table B
  • Column-level lineage: Tracks which source columns a given column was derived from
  • Impact analysis: Check up-front which downstream tables will be affected if you change a table's schema
  • Compliance: Track how PII data propagated across tables to support regulations such as GDPR

Audit logs are written to the system.access.audit table (System Tables). You can query who did what to which object and when directly in SQL.

-- Audit events in the last 24 hours
SELECT
  event_time,
  user_identity.email AS user_email,
  action_name,
  request_params.full_name_arg AS object_name
FROM system.access.audit
WHERE event_time > current_timestamp() - INTERVAL 24 HOURS
  AND action_name IN ('getTable', 'createTable', 'updatePermissions')
ORDER BY event_time DESC;
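
If the same audit query is run with different time windows or action filters, it can be generated from a small Python helper. This is a sketch that only builds the SQL text. The function name `audit_query` is hypothetical, and the action names passed in are assumed to match the values that appear in your own audit table.

```python
from typing import Sequence


def audit_query(actions: Sequence[str], hours: int = 24) -> str:
    """Build an audit query for the given action names and look-back window."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    if not actions:
        raise ValueError("at least one action name is required")
    for action in actions:
        # Action names are plain alphanumeric tokens; reject anything else.
        if not action.isalnum():
            raise ValueError(f"unexpected action name: {action!r}")

    action_list = ", ".join(f"'{action}'" for action in actions)
    return (
        "SELECT event_time, user_identity.email AS user_email, action_name\n"
        "FROM system.access.audit\n"
        f"WHERE event_time > current_timestamp() - INTERVAL {int(hours)} HOURS\n"
        f"  AND action_name IN ({action_list})\n"
        "ORDER BY event_time DESC"
    )


print(audit_query(["getTable", "createTable"]))
```

In a notebook, the returned string could be passed to spark.sql(...), provided the caller has access to the system.access schema.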

Key Points Tested on the Exam (DEA 17%)

On the Data Engineer Associate (DEA) exam, the Data Governance domain accounts for about 17% of the questions. Unity Catalog is the core topic in that domain. Make sure you nail the items below.

Topic | What to Remember
Three-level namespace | catalog.schema.table structure. The metastore is not part of the namespace
USAGE privilege | SELECT does not work without USAGE on both the Catalog and the Schema. Parent is the hallway, child is the room
Managed vs External | DROP behavior: Managed -> data is deleted, External -> only metadata is deleted. Distinguished by the presence of a LOCATION clause
Storage Credential → External Location | Two-layer authentication. Credential = cloud credentials, Location = allowed path range
Volumes | Successor to DBFS. GRANT/REVOKE and audit logs apply to files too
Data lineage | No activation needed (recorded automatically). Two flavors: table-level and column-level
Delta Sharing | Open protocol. No data copy required. Recipients do not have to be Databricks users
Privilege inheritance | Privileges granted on a Catalog are inherited by every Schema and Table below it (SELECT does not imply USAGE, which must be granted separately)

Sample Questions

Data Governance / Unity Catalog

Question 1

The analyst team (group name: analysts) tried to SELECT from production.sales.orders but got a 'PERMISSION DENIED' error. The administrator has already run GRANT SELECT ON SCHEMA production.sales TO analysts. Which additional SQL statement is required to resolve the issue?

  A. GRANT USAGE ON CATALOG production TO analysts
  B. GRANT ALL PRIVILEGES ON TABLE production.sales.orders TO analysts
  C. GRANT READ ON SCHEMA production.sales TO analysts
  D. GRANT BROWSE ON CATALOG production TO analysts

Correct answer: A

In the Unity Catalog permission model, you need USAGE on both the Catalog and the Schema to access objects under them. Granting SELECT on the Schema is not enough: without USAGE on the parent Catalog, you cannot "walk down the hallway" to the data. Of the options listed, only GRANT USAGE ON CATALOG production TO analysts supplies a missing traversal privilege. If USAGE on the production.sales schema has not been granted yet, it must be added as well, because SELECT ON SCHEMA does not imply it. Option B grants more than is needed, which violates least privilege, and it still leaves USAGE missing. Option C (READ) is not a valid Unity Catalog privilege name. Option D (BROWSE) only lets a principal see object metadata; it does not allow reading data.

Data Governance / External Location

Question 2

A data engineer wants to create an External Table that references existing data on S3. What is the correct order of steps in Unity Catalog?

  A. Create External Table -> Create External Location -> Create Storage Credential
  B. Create Storage Credential -> Create External Table -> Create External Location
  C. Create Storage Credential -> Create External Location -> Create External Table
  D. Create External Location -> Create Storage Credential -> Create External Table

Correct answer: C

Creating an External Table requires a Storage Credential and an External Location to exist first. The correct order is (1) Storage Credential (register cloud credentials), (2) External Location (define the allowed path range using that credential), (3) External Table (use a LOCATION clause to create the table). The External Location references the Storage Credential, so the Credential must exist first. The External Table's LOCATION must fall inside the External Location's URL range, so the Location must exist before the Table.

Data Governance / Table Types

Question 3

What happens to the data files when you run DROP TABLE production.sales.orders on a Unity Catalog Managed Table?

  A. Both metadata and data files are deleted
  B. Only the metadata is deleted; the data files are retained for 30 days and then auto-deleted
  C. Only the metadata is deleted; the data files remain in external storage
  D. Both metadata and data files are deleted, but you can restore them with UNDROP TABLE

Correct answer: A

Because Unity Catalog fully manages the data lifecycle of a Managed Table, DROP TABLE deletes both the metadata and the data files. For an External Table, by contrast, only the metadata is deleted and the data files remain in external storage. Option B's metadata-only deletion with a "30-day retention" does not describe how dropping a Managed Table works. Option C describes the External Table behavior. Option D mentions UNDROP TABLE, which does exist in Unity Catalog, but it does not mean that the data files are kept: Unity Catalog only allows restoration for a limited window. On the exam, make sure you can instantly answer the Managed vs. External difference in terms of whether the data files get deleted.

Frequently Asked Questions

What is the difference between Unity Catalog and Hive Metastore?

Hive Metastore manages metadata per workspace and cannot enforce permissions or auditing across workspaces. Unity Catalog operates at the account level and provides unified access control, audit logs, and data lineage across multiple workspaces. The permission model also differs: Hive Metastore only offers table/view-level ACLs, while Unity Catalog uses SQL-standard GRANT/REVOKE for consistent control across the entire Catalog/Schema/Table/View/Function/Volume hierarchy.

Is Unity Catalog free to use?

Unity Catalog is included with the Databricks platform rather than sold as a separate product, and it covers the core features described here (three-level namespace, GRANT/REVOKE permissions, Managed/External Tables, Volumes). It is not available on every plan, however: Databricks documents the Premium plan or above as a requirement for Unity Catalog. Individual capabilities such as Attribute-Based Access Control (ABAC) and Lakehouse Monitoring integration may carry requirements of their own. Check the official Databricks documentation for the latest edition-by-edition feature matrix.

What is the difference between USAGE and SELECT in Unity Catalog?

USAGE is the right to traverse into an object, granted on Catalogs and Schemas. SELECT is the right to read data from a table or view. For example, to SELECT from production.sales.orders, you need SELECT on the table plus USAGE on the production catalog and USAGE on the sales schema. Think of USAGE as the key to the hallway and SELECT as the key to the room. You cannot access the data with just one of them.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

