dbt

dbt unique Test: The Fundamentals of Guaranteeing Uniqueness

2026-04-19
NicheeLab Editorial Team

The unique test is the smallest unit of quality gate for instantly detecting duplicates on primary key candidates and business keys.

This article walks through definition, execution, failure remediation, CI integration, warehouse-specific differences, and the exam-relevant points, all aligned with the official documentation.

Fundamentals of the unique Test and Failure Conditions

The dbt generic test unique verifies that each value in the specified column is unique within the table. Internally, it runs GROUP BY on the target column and fails when any value appears 2 or more times. Because NULL is also aggregated as a single value, 2 or more NULL rows will cause unique to fail. So when you need primary-key-level strictness, combine unique with not_null.

Tests are typically declared in a model's YAML property file. When you run dbt test, dbt generates inspection SQL for each model and applies a simple rule: any returned rows mean failure, zero rows mean success.

  • Target: uniqueness of a single column
  • Failure condition: 2 or more identical values, or 2 or more NULL rows
  • Primary-key equivalent: unique + not_null must be used together
  • Evaluation logic: any row returned by the inspection query means failure
TestPurposeCommon failure exampleRecommended pairing
uniqueDuplicate detection (same value appears 2+ times)Fan-out from JOIN multiplicationnot_null
not_nullEliminating NULL valuesMistakenly applied to nullable columnsunique
relationshipsReferential integrity (foreign-key-like)Reference target not yet loaded; late arrivalunique + not_null (on the referring key side)

Conceptual diagram of ETL/ELT and where tests sit

sourcesstagingmarts/fcttestsunique / not_null / relationships

Example of combining unique and not_null in YAML

version: 2
models:
  - name: fct_orders
    description: Order fact table (order_id is the primary-key equivalent)
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id

Setup and Execution

Properties live in YML files under the models directory (for example, models/schema.yml). dbt test supports selectors for narrowing the scope. When you want to inspect only the change-affected area, the standard approach is to combine stateful selection such as state:modified with CI.

To enable instant remediation when unique fails, it helps to keep naming and comments tidy so that model and column names show up clearly in error messages.

  • Basic: dbt test -s <model or selector>
  • Column-level: dbt test -s model_name.column_name
  • Changed only: dbt test -s state:modified+ (commonly used in CI)
  • Everything: dbt test (all declared tests on every model and column)
CommandScopeUse case
dbt test -s fct_ordersPer modelInspect only the target model
dbt test -s fct_orders.order_idPer columnSpot-check a problem column
dbt test -s state:modified+Changes plus downstreamDiff-based checks in PRs

Conceptual view of narrowing tests with selectors

all modelsstate:modifiedchangedstate:modified+changes and their downstream dependencies

CLI example (same for local and CI)

# Single model
$ dbt test -s fct_orders

# Specify a column directly
$ dbt test -s fct_orders.order_id

# Changes and everything downstream
$ dbt test -s state:modified+

Root-Cause Analysis on Failure and Deduplication Patterns

Most unique failures stem from JOIN fan-out (multiplication), ingesting duplicate rows, or re-loading late-arriving data. Start by inspecting the failing test's generated SQL — not via dbt debug, but from the compiled output (target/compiled) or logs — and observe row counts per GROUP BY value.

The classic deduplication move is to keep only the latest record per key via a window function. For incremental models, combine the unique_key config with designs that prevent duplicates from being created in the first place (source-side normalization and clear key definitions).

  • Identify root cause by reading the inspection SQL (count and group by)
  • Suppress JOIN fan-out with pre-aggregation or DISTINCT
  • Pick the latest record with row_number() over(partition by ... order by updated_at desc)
  • For incremental models, combine unique_key with upsert semantics to prevent duplicates
SymptomObservation pointRemediation
Row count multiplies after JOINOne-to-many on the join keyPre-aggregate, DISTINCT, or pick a representative row
The same record re-appears on ingestionSame key with different update timestampsAdopt only the latest record via row_number
Frequent NULL values2 or more NULL rowsAdd not_null and fix the upstream

Deduplication flow (example)

rawstagingdedupedPick a representative row with row_numbermarts (unique OK)

Deduplication SQL example (keep the most recent row)

with src as (
  select * from {{ ref('stg_orders') }}
), ranked as (
  select
    *,
    row_number() over(
      partition by order_id
      order by updated_at desc
    ) as rn
  from src
)
select
  -- Required columns
  order_id, customer_id, amount, updated_at
from ranked
where rn = 1

CI/CD and Treating Tests as Quality Gates

It is common to operate unique as a gate that controls whether changes are promoted to production. Use severity: error for columns where you want the pipeline to halt on failure, and severity: warn for situations where failures should be tolerated (such as during a migration). The default severity is error.

Inspecting only the change diff lets you keep CI runtime down while maintaining quality. Rich test names and descriptions also speed up decision-making when failures occur.

  • Use severity: error for strict key columns
  • Use severity: warn for migrations or staged rollouts
  • Speed up runs via state selection
  • On PRs, run dbt test after dbt run and treat it as a gate
severityPipeline behaviorMain use cases
error (default)Halts on failurePrimary keys, foreign keys, billing-impacting aggregates
warnLogs a warning and continuesInterim quality during migrations and staged remediation

Conceptual run order in CI

checkoutdepsbuilddbt runtestsdbt testgatestop on unique failure

Setting severity per column in YAML

version: 2
models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - unique:
              config:
                severity: error
          - not_null
      - name: legacy_order_code
        description: May contain duplicates during the migration window
        tests:
          - unique:
              config:
                severity: warn

Per-Warehouse Caveats (NULL, Case Sensitivity, Collation)

dbt's unique relies on GROUP BY/HAVING in the generated SQL. The behavior of failing on 2 or more NULLs and passing with only 1 NULL is consistent across the major DWHs. However, what counts as a duplicate can vary depending on string case sensitivity and collation.

When you need case-insensitive uniqueness, apply unique to a derived column normalized with lower or upper, or implement a custom generic test.

  • 2 or more NULLs cause unique to fail (consistent across major DWHs)
  • Case sensitivity and collation depend on the DWH configuration
  • Inspect a normalized column when needed
  • For composite uniqueness, use custom tests or packages
DWHDefault case sensitivityExample workaround
SnowflakeBINARY (case-sensitive in general)Inspect a lower() column
BigQueryCase-sensitive (default)Normalized column or explicit collation (function-dependent)
Databricks SQLFollows Spark SQL (case-sensitive by default)Inspect a normalized column

Flow for inspecting via a normalized column

raw_valuenormalized_valuelower()OK/NGunique test

Custom generic test (case-insensitive uniqueness)

{% macro test_unique_lower(model, column_name) %}
with base as (
  select lower({{ column_name }}) as val from {{ model }}
), agg as (
  select val, count(*) as n from base group by 1 having count(*) > 1
)
select * from agg
{% endmacro %}

# Invocation (YAML)
# tests:
#   - unique_lower:
#       column_name: user_email

Exam Prep Checkpoints (Analytics Engineer)

Common exam topics include the meaning of the unique test, its relationship with NULL, the unique + not_null combination, and selector-based test execution. Also remember to distinguish between unique_key for incremental models (a runtime merge key) and the unique test (a quality check).

Choosing between relationships tests and accepted_values tests, and the operational call between warn and error, are also frequently tested.

  • Primary-key equivalent = unique + not_null
  • 1 NULL is OK, 2 or more NULLs is not (a property of unique)
  • Understand what dbt test -s state:modified+ means
  • unique_key is for incremental upserts; it is not a test
TopicImportancePhrase to remember
Relationship between unique and not_nullHighPrimary key = no duplicates and no NULLs
Selector (state:modified+)HighTest the diff and the downstream
Choosing the right severityMediumGate with error, observe with warn

A mental map of the topics

Schema Testsuniquenot_nullrelationshipsaccepted_values

Cheat sheet (YAML excerpt)

version: 2
models:
  - name: some_model
    columns:
      - name: business_key
        tests:
          - unique
          - not_null
          - relationships:
              to: ref('dim_ref')
              field: business_key

Check Your Understanding

Analytics Engineer

問題 1

Which statement about the behavior of the dbt generic test unique is correct, including the recommendation for primary-key-level strictness?

  1. It fails on 2 or more identical values and also fails on 2 or more NULLs. For a primary-key equivalent, combine it with not_null.
  2. It fails on 2 or more identical values, but NULLs are ignored regardless of how many there are. unique alone is enough for a primary-key equivalent.
  3. It only fails when the same value appears 3 or more times, and NULL is never allowed.
  4. unique is syntactic sugar for relationships and only validates referential integrity.

正解: A

unique runs GROUP BY/HAVING on the target column and fails on 2 or more identical values (NULL is aggregated as a single value, so 2 or more NULLs counts). For primary-key-level strictness, you must combine unique with not_null.

Frequently Asked Questions

Which should I add first: unique or not_null?

Order does not matter, but you should always define both on primary-key-like columns. unique alone does not fail when a single NULL exists, so pair it with not_null to forbid NULL values as well.

How do I test composite uniqueness (a combination of multiple columns)?

The built-in unique test targets a single column. For composite uniqueness, implement a custom generic test or use a community package such as dbt_utils.unique_combination_of_columns.

What is the difference between an incremental model's unique_key config and the unique test?

unique_key is the merge key (upsert key) used at runtime during incremental builds to prevent duplicates from being created. The unique test is a quality check that validates the uniqueness of the produced table. In practice, you typically use both together.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.