Clustering Keys are a performance-tuning feature in Snowflake that maximize the accuracy of partition pruning on large tables. By optimizing the physical data layout (the sort order inside micro-partitions) based on the columns you specify, they dramatically reduce the amount of data scanned per query. The feature is available on Enterprise Edition and above.
Snowflake automatically splits every table into compressed micro-partitions of 50-500 MB. Each micro-partition is stored in a columnar format, and metadata such as the min, max, and NULL count of every column is tracked in the Cloud Services Layer.
When a query has a WHERE clause, Snowflake consults this metadata and excludes any micro-partition that cannot possibly match. That mechanism is partition pruning. The more physically sorted the data is, the more precise the pruning becomes and the less data is scanned.
If data is naturally sorted in the order it was inserted — for example, when rows are appended chronologically by a timestamp column — the table is already naturally clustered. In that case, you do not need to define Clustering Keys.
Clustering Depth is a metric that indicates how well-organized the physical data layout of a table is. The smaller the value, the better organized the data and the more efficient the pruning.
-- Check the clustering depth for specific columns
SELECT SYSTEM$CLUSTERING_DEPTH(
'analytics.orders',
'(order_date, region)'
);
-- Retrieve detailed clustering information
SELECT SYSTEM$CLUSTERING_INFORMATION(
'analytics.orders',
'(order_date, region)'
);| Field | Description |
|---|---|
| cluster_by_keys | The Clustering Keys being evaluated |
| total_partition_count | Total number of micro-partitions in the table |
| total_constant_partition_count | Number of partitions where the specified column has only one value (the most prunable state) |
| average_overlaps | Average value-range overlap between partitions (smaller is better) |
| average_depth | Average clustering depth (smaller is better) |
-- Specify Clustering Keys when creating a table
CREATE TABLE analytics.orders (
order_id INT,
order_date DATE,
region VARCHAR(50),
customer_id INT,
amount NUMBER(12,2)
)
CLUSTER BY (order_date, region);
-- Add Clustering Keys to an existing table
ALTER TABLE analytics.orders
CLUSTER BY (order_date, region);
-- Change (overwrite) the Clustering Keys
ALTER TABLE analytics.orders
CLUSTER BY (order_date, customer_id);
-- Drop the Clustering Keys (also stops Automatic Clustering)
ALTER TABLE analytics.orders DROP CLUSTERING KEY;| Criterion | Recommended | Avoid |
|---|---|---|
| Filter frequency | Columns used frequently in WHERE clauses | Columns that are rarely filtered |
| Cardinality | Medium (dates, categories, regions) | Extremely high (UUID) or extremely low (BOOLEAN) |
| Number of columns | 2 to 4 columns | 5 or more (maintenance cost grows quickly) |
| Column order | Place lower-cardinality columns first | Placing higher-cardinality columns first |
Once Clustering Keys are defined, Snowflake's Automatic Clustering service reclusters the data automatically on the Cloud Services Layer. It does not consume your Warehouses and is billed as Serverless Credits instead.
-- Suspend Automatic Clustering
ALTER TABLE analytics.orders SUSPEND RECLUSTER;
-- Resume Automatic Clustering
ALTER TABLE analytics.orders RESUME RECLUSTER;-- Automatic Clustering credit consumption (last 7 days)
SELECT *
FROM TABLE(INFORMATION_SCHEMA.AUTOMATIC_CLUSTERING_HISTORY(
DATE_RANGE_START => DATEADD(DAY, -7, CURRENT_DATE()),
DATE_RANGE_END => CURRENT_DATE(),
TABLE_NAME => 'ANALYTICS.ORDERS'
));
-- Long-term cost trend via the Account Usage view
SELECT
table_name,
SUM(credits_used) AS total_credits,
SUM(num_bytes_reclustered) AS total_bytes_reclustered
FROM SNOWFLAKE.ACCOUNT_USAGE.AUTOMATIC_CLUSTERING_HISTORY
WHERE start_time >= DATEADD(MONTH, -1, CURRENT_TIMESTAMP())
GROUP BY table_name
ORDER BY total_credits DESC;Is the table billions of rows or larger?
├─ NO → Clustering Keys are unnecessary (natural pruning is enough)
└─ YES
↓
Check partition pruning efficiency in the Query Profile
↓
Is the number of scanned partitions large relative to the total?
├─ NO → Natural clustering already works → Clustering Keys unnecessary
└─ YES
↓
Pick 2-4 columns that are filtered frequently in WHERE clauses
↓
Define them with ALTER TABLE ... CLUSTER BY
↓
Monitor cost via AUTOMATIC_CLUSTERING_HISTORYPerformance Optimization
問題 1
On Enterprise Edition, a query against a 5-billion-row transaction table is slow. The Query Profile shows that partition pruning is barely working for the order_date range filter in the WHERE clause. Which is the most appropriate action?
正解: B
Pruning is failing because order_date values are scattered across micro-partitions. Setting CLUSTER BY (order_date) tells Automatic Clustering to physically reorganize the data, which dramatically improves pruning accuracy for date-range filters. Scaling the Warehouse increases raw scan speed but doesn't fix the root cause. Time Travel settings are a data-protection feature and have no effect on query performance.
How much does Automatic Clustering cost once I set Clustering Keys?
Automatic Clustering is billed as Serverless Credits and scales with the frequency and volume of DML operations (INSERT/UPDATE/DELETE/MERGE) on the table. Tables that are updated frequently trigger more reclustering and therefore cost more. Use the AUTOMATIC_CLUSTERING_HISTORY table function to inspect credit consumption; if costs run higher than expected, reduce the number of columns in the Clustering Keys or drop them entirely (ALTER TABLE t DROP CLUSTERING KEY).
What is the difference between Clustering Keys and the Search Optimization Service?
Clustering Keys control the physical data layout to improve partition pruning, and are effective for range filters (BETWEEN, >=, <=) and equality filters. The Search Optimization Service, in contrast, builds a complementary access structure that accelerates equality lookups (= / IN), VARIANT path searches, and GEOGRAPHY function searches. The two features can be used together, and you choose between them based on the workload.
Why shouldn't I pick a very high-cardinality column for Clustering Keys?
If you pick an extremely high-cardinality column such as a UUID or random ID as a Clustering Key, the min/max range within each micro-partition becomes very narrow and partitions barely overlap. That looks appealing on paper, but filters on those columns already prune well naturally because they are so pinpoint, so the improvement does not justify the maintenance cost of Clustering Keys. Medium-cardinality columns such as dates or categories are the most effective choice.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Snowflake Certifications: All 11 Exams Explained (2026)
Every SnowPro certification — Associate, Core, Specialty, Ad...
Snowflake Exam Difficulty Ranking: All 11 Certs Compared (2026)
All 11 SnowPro exams ranked by difficulty with study-time es...
Snowflake Study Guide: Fastest Pass Route by Exam (2026)
How to pass SnowPro certifications efficiently — official ma...
SnowPro Core (COF-C03): Complete Exam Guide (2026)
Pass the SnowPro Core exam — six domains, scope, sample ques...
SnowPro Associate Platform (SOL-C01): Complete Guide (2026)
The entry-level SnowPro Associate exam — scope, weighting, s...