As Delta Lake tables grow, controlling how much data each query scans becomes the dominant performance factor. Traditionally, teams used partitioning or Z-ORDER to shape data layout, but changing partition keys required rebuilding the table, and Z-ORDER re-sorted the entire table every time it ran.
Liquid Clustering is Delta Lake's next-generation data layout optimization, introduced in DBR 13.3 LTS (GA in late 2023). With three defining features — no partitions, dynamic clustering key changes, and incremental re-clustering — it fundamentally removes the constraints of the older approaches.
Liquid Clustering fundamentally differs from older approaches in these three ways.
Liquid Clustering is configured with the CLUSTER BY clause. You can specify it at table creation time or add it to an existing table afterwards.
-- 新規テーブルをLiquid Clusteringで作成
CREATE TABLE catalog.schema.sales (
sale_id BIGINT,
customer_id BIGINT,
region STRING,
sale_date DATE,
amount DECIMAL(18,2)
)
CLUSTER BY (region, sale_date);
-- CTASでLiquid Clusteringテーブルを作成
CREATE TABLE catalog.schema.sales_optimized
CLUSTER BY (customer_id, sale_date)
AS SELECT * FROM catalog.schema.raw_sales;-- クラスタリングキーを変更(メタデータ更新のみ、即時完了)
ALTER TABLE catalog.schema.sales
CLUSTER BY (customer_id, region);
-- クラスタリングを無効化
ALTER TABLE catalog.schema.sales
CLUSTER BY NONE;CLUSTER BY NONE disables clustering. The existing data layout is preserved, but subsequent OPTIMIZE runs will no longer perform clustering.
Here is a side-by-side comparison of the three data layout optimization techniques. Liquid Clustering combines the strengths of both predecessors while eliminating their weaknesses.
| Aspect | Partitioning | Z-ORDER | Liquid Clustering |
|---|---|---|---|
| Cardinality | Low cardinality recommended (tens to hundreds). High cardinality triggers the small files problem. | No constraints | No constraints |
| Key changes | Requires rebuilding the table (directory structure change) | Can change at OPTIMIZE time, but re-sorts the whole table | Instant change with a single ALTER TABLE (metadata-only update) |
| Incremental optimization | Possible at partition granularity | Not supported (sorts the whole table every time) | Processes only unclustered data |
| Need for OPTIMIZE | Only for file compaction | Required (run OPTIMIZE ZORDER BY) | Required, but fast because it is incremental |
| Data skipping | Partition pruning (directory granularity) | File-level min/max statistics | High-precision file-level skipping via Hilbert Curve |
| Recommended use case | Legacy tables; low-cardinality columns such as date or region | Retrofitted optimization on existing tables (environments without Liquid Clustering support) | All new tables; tables where keys may change; high-cardinality filters |
Databricks officially recommends Liquid Clustering for all new tables. Partitioning and Z-ORDER remain available primarily for backward compatibility with existing tables.
Internally, Liquid Clustering projects multi-dimensional data onto one dimension using a Hilbert Curve, a space-filling curve. Compared with the Z-curve (Morton curve) used by Z-ORDER, the Hilbert Curve keeps spatially adjacent data points closer together along the 1D projection, which improves data skipping precision.
┌──────────────────────────────────────────────────────┐
│ Z-Curve (Morton曲線) Hilbert Curve │
│ │
│ 0 ─ 1 4 ─ 5 0 ─ 1 14─ 15 │
│ │ │ │ │ │ │ │ │ │
│ 3 ─ 2 7 ─ 6 3 ─ 2 13─ 12 │
│ │ │ │
│ 12─ 13 8 ─ 9 4 ─ 5 10─ 11 │
│ │ │ │ │ │ │ │ │ │
│ 15─ 14 11─ 10 7 ─ 6 9 ─ 8 │
│ │
│ → 空間的に近いデータが → 空間的に近いデータが │
│ 1次元で離れるケースあり 常に1次元でも近接 │
└──────────────────────────────────────────────────────┘
Hilbert Curveの利点:
- 空間的局所性がZ-Curveより高い
- WHERE region = 'APAC' AND sale_date = '2026-03-01'
のような多次元フィルタでスキッピング精度が向上
- 統計情報(min/max)の範囲が狭くなり、
不要ファイルの除外率が高まるLiquid Clustering attaches metadata to each data file indicating whether it is "clustered" or "unclustered". When OPTIMIZE runs, only the unclustered files (those added by recent INSERT, UPDATE, or MERGE) are eligible for clustering work.
-- 増分再クラスタリングの流れ
-- Step 1: テーブル作成
CREATE TABLE events CLUSTER BY (user_id, event_date) ...;
-- Step 2: データ追加(未クラスタリング状態でファイルが追加)
INSERT INTO events SELECT * FROM raw_events;
-- → 新規ファイルは is_clustered = false
-- Step 3: OPTIMIZE実行
OPTIMIZE events;
-- → is_clustered = false のファイルだけを読み込み
-- → Hilbert Curveで再配置し、新しいファイルとして書き出し
-- → 新ファイルは is_clustered = true
-- → 古いファイルはトランザクションログで無効化
-- Step 4: 追加データのみ再クラスタリング
INSERT INTO events SELECT * FROM new_events;
OPTIMIZE events;
-- → Step 4のINSERTで追加されたファイルだけが対象
-- → Step 3でクラスタリング済みのファイルは処理しないOn tables with Liquid Clustering configured, the OPTIMIZE command behaves differently from a regular Delta table.
| Behavior | Regular Delta table | Liquid Clustering table |
|---|---|---|
| OPTIMIZE | Small file compaction only | Compaction plus Hilbert Curve re-clustering |
| OPTIMIZE ... ZORDER BY | Reorganizes data with Z-ORDER | Error (cannot be combined with CLUSTER BY) |
| Processed scope | Entire table (with Z-ORDER) | Only unclustered files (incremental) |
-- Liquid Clusteringテーブルでの正しいOPTIMIZE
OPTIMIZE catalog.schema.sales;
-- → 未クラスタリングデータだけを増分処理
-- エラーになる例
OPTIMIZE catalog.schema.sales ZORDER BY (region);
-- → Error: CLUSTER BYが設定されたテーブルにZORDER BYは使用できませんWhen Databricks Predictive Optimization is enabled, OPTIMIZE is automatically scheduled for Unity Catalog managed tables. On Liquid Clustering tables, this auto-OPTIMIZE also runs incremental re-clustering at the same time. You no longer need to run OPTIMIZE by hand, dramatically reducing operational overhead.
-- Predictive Optimizationの有効化(Unity Catalogスキーマ単位)
ALTER SCHEMA catalog.schema
SET DBPROPERTIES ('delta.enablePredictiveOptimization' = 'true');
-- テーブル単位での設定
ALTER TABLE catalog.schema.sales
SET TBLPROPERTIES ('delta.enablePredictiveOptimization' = 'true');
-- 有効な場合のLiquid Clusteringテーブルの運用:
-- 1. OPTIMIZE が自動スケジュールされる
-- 2. 増分再クラスタリングが自動実行される
-- 3. VACUUMも自動スケジュールされる
-- → 手動でのメンテナンスコマンド実行が不要にThe migration steps from partitioned tables or Z-ORDER tables to Liquid Clustering differ depending on the table type.
-- Step 1: Liquid Clusteringを有効化
ALTER TABLE catalog.schema.events
CLUSTER BY (user_id, event_date);
-- Step 2: 増分再クラスタリング実行
OPTIMIZE catalog.schema.events;
-- これ以降、OPTIMIZE ZORDER BY は不要(使用するとエラー)
-- 以降のOPTIMIZEで自動的にHilbert Curveベースのクラスタリングが実行You cannot enable Liquid Clustering directly on a partitioned table. You need to create a new table using CTAS (CREATE TABLE AS SELECT) or DEEP CLONE.
-- 方法A: CTAS で新テーブル作成
CREATE TABLE catalog.schema.events_v2
CLUSTER BY (region, event_date)
AS SELECT * FROM catalog.schema.events_partitioned;
-- 方法B: DEEP CLONE + ALTER TABLE
CREATE TABLE catalog.schema.events_v2
DEEP CLONE catalog.schema.events_partitioned;
ALTER TABLE catalog.schema.events_v2
CLUSTER BY (region, event_date);
OPTIMIZE catalog.schema.events_v2;
-- 移行後の検証
SELECT COUNT(*) FROM catalog.schema.events_v2;
DESCRIBE DETAIL catalog.schema.events_v2;
-- clusteringColumns に [region, event_date] が表示されることを確認-- 分析パターンの変化に応じてキーを変更
-- Before: region と sale_date でフィルタが多かった
-- After: customer_id と product_category でフィルタが増えた
ALTER TABLE catalog.schema.sales
CLUSTER BY (customer_id, product_category);
-- 次回のOPTIMIZEで新キーによる増分再クラスタリングが開始
OPTIMIZE catalog.schema.sales;Liquid Clustering is an important topic on both the Data Engineer Associate (DEA) and Spark Developer exams. The differences versus Z-ORDER come up especially often, and you are typically asked to decide which one to use.
Data Engineer Associate
問題 1
A team runs OPTIMIZE ... ZORDER BY (user_id) on the Delta Lake table events every day. The table has grown past 10 TB and OPTIMIZE now takes more than 8 hours. Most queries filter on user_id and event_date. Which is the best way to shorten OPTIMIZE while maintaining or improving query data-skipping?
正解: B
Migrating to Liquid Clustering (CLUSTER BY) makes OPTIMIZE incrementally process only the unclustered data, dramatically shortening runtime compared with Z-ORDER, which re-sorts the entire 10 TB every time. Hilbert Curve-based layout matches or exceeds Z-ORDER's skipping precision on the user_id and event_date multi-dimensional filters. Option A reduces the column count but does not fix Z-ORDER's fundamental whole-table re-sort. Option C triggers the small files problem when event_date has high cardinality, and the rebuild cost is substantial. Option D only tweaks file size — it does not reduce the work itself, so it is not a real fix.
Can Liquid Clustering and Z-ORDER be used together on the same table?
No. Running OPTIMIZE ... ZORDER BY on a table that has Liquid Clustering (CLUSTER BY) configured raises an error. Liquid Clustering is designed as a superset of Z-ORDER, combining Hilbert Curve-based multi-dimensional data layout with file-level data skipping. If your existing tables use Z-ORDER, migrate them with ALTER TABLE ... CLUSTER BY (col1, col2). From that point on, plain OPTIMIZE (without ZORDER BY) performs incremental re-clustering.
What happens to existing data when I change the Liquid Clustering keys?
When you change the clustering keys with ALTER TABLE ... CLUSTER BY (new_col), existing data is not immediately re-clustered. The next OPTIMIZE incrementally re-clusters any unclustered data (data that is not aligned with the new keys). The key change itself is only a metadata update, with no table rewrite. This is the biggest difference from partitioning, where changing keys requires modifying the physical directory structure and rewriting the entire table.
Is Photon Engine required to use Liquid Clustering?
Not required, but strongly recommended. Liquid Clustering's OPTIMIZE is designed to run most efficiently on Photon, and for large tables you can see 2-5x performance differences versus running without Photon. DBR 13.3 LTS or later is required, and we recommend running on Serverless SQL Warehouses or Photon-enabled clusters. Liquid Clustering still works in environments without Photon, but OPTIMIZE may take significantly longer.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Databricks Certifications: All 7 Exams, Difficulty & Study Plan (2026)
Complete guide to all 7 Databricks certifications — Data Eng...
Databricks Exam Difficulty Ranking: All 7 Certs Compared (2026)
Every Databricks certification ranked by difficulty, with st...
Databricks Study Guide: Fastest Pass Route & Time Estimates (2026)
How to pass Databricks certifications efficiently. Official ...
Databricks Data Engineer Associate: Complete Guide (2026)
Domain-by-domain breakdown of the Databricks Certified Data ...
Databricks Data Engineer Professional: Complete Guide (2026)
Tactics for the Databricks Certified Data Engineer Professio...