dbt unique Test: Detect Duplicate Keys (2026)

The unique test is the smallest unit of quality gate for instantly detecting duplicates on primary key candidates and business keys.

This article walks through definition, execution, failure remediation, CI integration, warehouse-specific differences, and the exam-relevant points, all aligned with the official documentation.

Fundamentals of the unique Test and Failure Conditions

The dbt generic test unique verifies that each value in the specified column is unique within the table. Internally, it runs GROUP BY on the target column and fails when any value appears 2 or more times. Because NULL is also aggregated as a single value, 2 or more NULL rows will cause unique to fail. So when you need primary-key-level strictness, combine unique with not_null.

Tests are typically declared in a model's YAML property file. When you run dbt test, dbt generates inspection SQL for each model and applies a simple rule: any returned rows mean failure, zero rows mean success.

Target: uniqueness of a single column
Failure condition: 2 or more identical values, or 2 or more NULL rows
Primary-key equivalent: unique + not_null must be used together
Evaluation logic: any row returned by the inspection query means failure

Test	Purpose	Common failure example	Recommended pairing
unique	Duplicate detection (same value appears 2+ times)	Fan-out from JOIN multiplication	not_null
not_null	Eliminating NULL values	Mistakenly applied to nullable columns	unique
relationships	Referential integrity (foreign-key-like)	Reference target not yet loaded; late arrival	unique + not_null (on the referring key side)

Conceptual diagram of ETL/ELT and where tests sit

Example of combining unique and not_null in YAML

version: 2
models:
  - name: fct_orders
    description: Order fact table (order_id is the primary-key equivalent)
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id

Setup and Execution

Properties live in YML files under the models directory (for example, models/schema.yml). dbt test supports selectors for narrowing the scope. When you want to inspect only the change-affected area, the standard approach is to combine stateful selection such as state:modified with CI.

To enable instant remediation when unique fails, it helps to keep naming and comments tidy so that model and column names show up clearly in error messages.

Basic: dbt test -s <model or selector>
Column-level: dbt test -s model_name.column_name
Changed only: dbt test -s state:modified+ (commonly used in CI)
Everything: dbt test (all declared tests on every model and column)

Command	Scope	Use case
dbt test -s fct_orders	Per model	Inspect only the target model
dbt test -s fct_orders.order_id	Per column	Spot-check a problem column
dbt test -s state:modified+	Changes plus downstream	Diff-based checks in PRs

Conceptual view of narrowing tests with selectors

CLI example (same for local and CI)

# Single model
$ dbt test -s fct_orders

# Specify a column directly
$ dbt test -s fct_orders.order_id

# Changes and everything downstream
$ dbt test -s state:modified+

Root-Cause Analysis on Failure and Deduplication Patterns

Most unique failures stem from JOIN fan-out (multiplication), ingesting duplicate rows, or re-loading late-arriving data. Start by inspecting the failing test's generated SQL — not via dbt debug, but from the compiled output (target/compiled) or logs — and observe row counts per GROUP BY value.

The classic deduplication move is to keep only the latest record per key via a window function. For incremental models, combine the unique_key config with designs that prevent duplicates from being created in the first place (source-side normalization and clear key definitions).

Identify root cause by reading the inspection SQL (count and group by)
Suppress JOIN fan-out with pre-aggregation or DISTINCT
Pick the latest record with row_number() over(partition by ... order by updated_at desc)
For incremental models, combine unique_key with upsert semantics to prevent duplicates

Symptom	Observation point	Remediation
Row count multiplies after JOIN	One-to-many on the join key	Pre-aggregate, DISTINCT, or pick a representative row
The same record re-appears on ingestion	Same key with different update timestamps	Adopt only the latest record via row_number
Frequent NULL values	2 or more NULL rows	Add not_null and fix the upstream

Deduplication flow (example)

Deduplication SQL example (keep the most recent row)

with src as (
  select * from {{ ref('stg_orders') }}
), ranked as (
  select
    *,
    row_number() over(
      partition by order_id
      order by updated_at desc
    ) as rn
  from src
)
select
  -- Required columns
  order_id, customer_id, amount, updated_at
from ranked
where rn = 1

CI/CD and Treating Tests as Quality Gates

It is common to operate unique as a gate that controls whether changes are promoted to production. Use severity: error for columns where you want the pipeline to halt on failure, and severity: warn for situations where failures should be tolerated (such as during a migration). The default severity is error.

Inspecting only the change diff lets you keep CI runtime down while maintaining quality. Rich test names and descriptions also speed up decision-making when failures occur.

Use severity: error for strict key columns
Use severity: warn for migrations or staged rollouts
Speed up runs via state selection
On PRs, run dbt test after dbt run and treat it as a gate

severity	Pipeline behavior	Main use cases
error (default)	Halts on failure	Primary keys, foreign keys, billing-impacting aggregates
warn	Logs a warning and continues	Interim quality during migrations and staged remediation

Conceptual run order in CI

Setting severity per column in YAML

version: 2
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                severity: error
          - not_null
      - name: legacy_order_code
        description: May contain duplicates during the migration window
        tests:
          - unique:
              config:
                severity: warn

Per-Warehouse Caveats (NULL, Case Sensitivity, Collation)

dbt's unique relies on GROUP BY/HAVING in the generated SQL. The behavior of failing on 2 or more NULLs and passing with only 1 NULL is consistent across the major DWHs. However, what counts as a duplicate can vary depending on string case sensitivity and collation.

When you need case-insensitive uniqueness, apply unique to a derived column normalized with lower or upper, or implement a custom generic test.

2 or more NULLs cause unique to fail (consistent across major DWHs)
Case sensitivity and collation depend on the DWH configuration
Inspect a normalized column when needed
For composite uniqueness, use custom tests or packages

DWH	Default case sensitivity	Example workaround
Snowflake	BINARY (case-sensitive in general)	Inspect a lower() column
BigQuery	Case-sensitive (default)	Normalized column or explicit collation (function-dependent)
Databricks SQL	Follows Spark SQL (case-sensitive by default)	Inspect a normalized column

Flow for inspecting via a normalized column

Custom generic test (case-insensitive uniqueness)

{% macro test_unique_lower(model, column_name) %}
with base as (
  select lower({{ column_name }}) as val from {{ model }}
), agg as (
  select val, count(*) as n from base group by 1 having count(*) > 1
)
select * from agg
{% endmacro %}

# Invocation (YAML)
# tests:
#   - unique_lower:
#       column_name: user_email

Exam Prep Checkpoints (Analytics Engineer)

Common exam topics include the meaning of the unique test, its relationship with NULL, the unique + not_null combination, and selector-based test execution. Also remember to distinguish between unique_key for incremental models (a runtime merge key) and the unique test (a quality check).

Choosing between relationships tests and accepted_values tests, and the operational call between warn and error, are also frequently tested.

Primary-key equivalent = unique + not_null
1 NULL is OK, 2 or more NULLs is not (a property of unique)
Understand what dbt test -s state:modified+ means
unique_key is for incremental upserts; it is not a test

Topic	Importance	Phrase to remember
Relationship between unique and not_null	High	Primary key = no duplicates and no NULLs
Selector (state:modified+)	High	Test the diff and the downstream
Choosing the right severity	Medium	Gate with error, observe with warn

A mental map of the topics

Cheat sheet (YAML excerpt)

version: 2
models:
  - name: some_model
    columns:
      - name: business_key
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('dim_ref')
              field: business_key

Check Your Understanding

Analytics Engineer

問題 1

Which statement about the behavior of the dbt generic test unique is correct, including the recommendation for primary-key-level strictness?

It fails on 2 or more identical values and also fails on 2 or more NULLs. For a primary-key equivalent, combine it with not_null.
It fails on 2 or more identical values, but NULLs are ignored regardless of how many there are. unique alone is enough for a primary-key equivalent.
It only fails when the same value appears 3 or more times, and NULL is never allowed.
unique is syntactic sugar for relationships and only validates referential integrity.

正解: A

unique runs GROUP BY/HAVING on the target column and fails on 2 or more identical values (NULL is aggregated as a single value, so 2 or more NULLs counts). For primary-key-level strictness, you must combine unique with not_null.

Frequently Asked Questions

Which should I add first: unique or not_null?

Order does not matter, but you should always define both on primary-key-like columns. unique alone does not fail when a single NULL exists, so pair it with not_null to forbid NULL values as well.

How do I test composite uniqueness (a combination of multiple columns)?

The built-in unique test targets a single column. For composite uniqueness, implement a custom generic test or use a community package such as dbt_utils.unique_combination_of_columns.

What is the difference between an incremental model's unique_key config and the unique test?

unique_key is the merge key (upsert key) used at runtime during incremental builds to prevent duplicates from being created. The unique test is a quality check that validates the uniqueness of the produced table. In practice, you typically use both together.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

dbt unique Test: The Fundamentals of Guaranteeing Uniqueness