The unique test is the smallest unit of quality gate for instantly detecting duplicates on primary key candidates and business keys.
This article walks through definition, execution, failure remediation, CI integration, warehouse-specific differences, and the exam-relevant points, all aligned with the official documentation.
The dbt generic test unique verifies that each value in the specified column is unique within the table. Internally, it runs GROUP BY on the target column and fails when any value appears 2 or more times. Because NULL is also aggregated as a single value, 2 or more NULL rows will cause unique to fail. So when you need primary-key-level strictness, combine unique with not_null.
Tests are typically declared in a model's YAML property file. When you run dbt test, dbt generates inspection SQL for each model and applies a simple rule: any returned rows mean failure, zero rows mean success.
| Test | Purpose | Common failure example | Recommended pairing |
|---|---|---|---|
| unique | Duplicate detection (same value appears 2+ times) | Fan-out from JOIN multiplication | not_null |
| not_null | Eliminating NULL values | Mistakenly applied to nullable columns | unique |
| relationships | Referential integrity (foreign-key-like) | Reference target not yet loaded; late arrival | unique + not_null (on the referring key side) |
Conceptual diagram of ETL/ELT and where tests sit
Example of combining unique and not_null in YAML
version: 2
models:
- name: fct_orders
description: Order fact table (order_id is the primary-key equivalent)
columns:
- name: order_id
tests:
- unique
- not_null
- name: customer_id
tests:
- relationships:
to: ref('dim_customers')
field: customer_id
Properties live in YML files under the models directory (for example, models/schema.yml). dbt test supports selectors for narrowing the scope. When you want to inspect only the change-affected area, the standard approach is to combine stateful selection such as state:modified with CI.
To enable instant remediation when unique fails, it helps to keep naming and comments tidy so that model and column names show up clearly in error messages.
| Command | Scope | Use case |
|---|---|---|
| dbt test -s fct_orders | Per model | Inspect only the target model |
| dbt test -s fct_orders.order_id | Per column | Spot-check a problem column |
| dbt test -s state:modified+ | Changes plus downstream | Diff-based checks in PRs |
Conceptual view of narrowing tests with selectors
CLI example (same for local and CI)
# Single model
$ dbt test -s fct_orders
# Specify a column directly
$ dbt test -s fct_orders.order_id
# Changes and everything downstream
$ dbt test -s state:modified+
Most unique failures stem from JOIN fan-out (multiplication), ingesting duplicate rows, or re-loading late-arriving data. Start by inspecting the failing test's generated SQL — not via dbt debug, but from the compiled output (target/compiled) or logs — and observe row counts per GROUP BY value.
The classic deduplication move is to keep only the latest record per key via a window function. For incremental models, combine the unique_key config with designs that prevent duplicates from being created in the first place (source-side normalization and clear key definitions).
| Symptom | Observation point | Remediation |
|---|---|---|
| Row count multiplies after JOIN | One-to-many on the join key | Pre-aggregate, DISTINCT, or pick a representative row |
| The same record re-appears on ingestion | Same key with different update timestamps | Adopt only the latest record via row_number |
| Frequent NULL values | 2 or more NULL rows | Add not_null and fix the upstream |
Deduplication flow (example)
Deduplication SQL example (keep the most recent row)
with src as (
select * from {{ ref('stg_orders') }}
), ranked as (
select
*,
row_number() over(
partition by order_id
order by updated_at desc
) as rn
from src
)
select
-- Required columns
order_id, customer_id, amount, updated_at
from ranked
where rn = 1
It is common to operate unique as a gate that controls whether changes are promoted to production. Use severity: error for columns where you want the pipeline to halt on failure, and severity: warn for situations where failures should be tolerated (such as during a migration). The default severity is error.
Inspecting only the change diff lets you keep CI runtime down while maintaining quality. Rich test names and descriptions also speed up decision-making when failures occur.
| severity | Pipeline behavior | Main use cases |
|---|---|---|
| error (default) | Halts on failure | Primary keys, foreign keys, billing-impacting aggregates |
| warn | Logs a warning and continues | Interim quality during migrations and staged remediation |
Conceptual run order in CI
Setting severity per column in YAML
version: 2
models:
- name: fct_orders
columns:
- name: order_id
tests:
- unique:
config:
severity: error
- not_null
- name: legacy_order_code
description: May contain duplicates during the migration window
tests:
- unique:
config:
severity: warn
dbt's unique relies on GROUP BY/HAVING in the generated SQL. The behavior of failing on 2 or more NULLs and passing with only 1 NULL is consistent across the major DWHs. However, what counts as a duplicate can vary depending on string case sensitivity and collation.
When you need case-insensitive uniqueness, apply unique to a derived column normalized with lower or upper, or implement a custom generic test.
| DWH | Default case sensitivity | Example workaround |
|---|---|---|
| Snowflake | BINARY (case-sensitive in general) | Inspect a lower() column |
| BigQuery | Case-sensitive (default) | Normalized column or explicit collation (function-dependent) |
| Databricks SQL | Follows Spark SQL (case-sensitive by default) | Inspect a normalized column |
Flow for inspecting via a normalized column
Custom generic test (case-insensitive uniqueness)
{% macro test_unique_lower(model, column_name) %}
with base as (
select lower({{ column_name }}) as val from {{ model }}
), agg as (
select val, count(*) as n from base group by 1 having count(*) > 1
)
select * from agg
{% endmacro %}
# Invocation (YAML)
# tests:
# - unique_lower:
# column_name: user_email
Common exam topics include the meaning of the unique test, its relationship with NULL, the unique + not_null combination, and selector-based test execution. Also remember to distinguish between unique_key for incremental models (a runtime merge key) and the unique test (a quality check).
Choosing between relationships tests and accepted_values tests, and the operational call between warn and error, are also frequently tested.
| Topic | Importance | Phrase to remember |
|---|---|---|
| Relationship between unique and not_null | High | Primary key = no duplicates and no NULLs |
| Selector (state:modified+) | High | Test the diff and the downstream |
| Choosing the right severity | Medium | Gate with error, observe with warn |
A mental map of the topics
Cheat sheet (YAML excerpt)
version: 2
models:
- name: some_model
columns:
- name: business_key
tests:
- unique
- not_null
- relationships:
to: ref('dim_ref')
field: business_key
Analytics Engineer
問題 1
Which statement about the behavior of the dbt generic test unique is correct, including the recommendation for primary-key-level strictness?
正解: A
unique runs GROUP BY/HAVING on the target column and fails on 2 or more identical values (NULL is aggregated as a single value, so 2 or more NULLs counts). For primary-key-level strictness, you must combine unique with not_null.
Which should I add first: unique or not_null?
Order does not matter, but you should always define both on primary-key-like columns. unique alone does not fail when a single NULL exists, so pair it with not_null to forbid NULL values as well.
How do I test composite uniqueness (a combination of multiple columns)?
The built-in unique test targets a single column. For composite uniqueness, implement a custom generic test or use a community package such as dbt_utils.unique_combination_of_columns.
What is the difference between an incremental model's unique_key config and the unique test?
unique_key is the merge key (upsert key) used at runtime during incremental builds to prevent duplicates from being created. The unique test is a quality check that validates the uniqueness of the produced table. In practice, you typically use both together.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...