dbt

dbt Generic Tests Basics: Using unique / not_null / accepted_values / relationships Correctly

2026-04-19
NicheeLab Editorial Team

dbt's generic tests are the first step toward declaratively guaranteeing data quality from YAML. This article focuses on the four most common tests — unique / not_null / accepted_values / relationships — and explains their behavior and when to use each one from a practical standpoint.

From an exam perspective, expect questions on each test's parameters, NULL handling, referential integrity definitions, where filters, and severity settings. We stick to the stable, documented behavior so the patterns are reproducible.

Overview of Generic Tests and the Basic YAML Form

dbt's generic tests are validation queries generated by SQL macros against models (tables/views) or their columns. You declare the tests in schema.yml (any YAML file works) and run dbt test — if the query returns failing rows the test fails, and if it returns none it passes.

The four canonical tests are not_null (no missing values), unique (uniqueness), accepted_values (whitelist of allowed values), and relationships (referential integrity). You can scope target rows with where and control failure severity (error/warn) via severity.

  • Tests run against an actual relation (table/view) — ephemeral models cannot be tested directly
  • Most tests support a where parameter for row-level filtering
  • Use dbt test -s to select tests; -m selects models (newer versions favor -s)
  • severity: error (default) / warn. With warn, the job continues even on failure
  • Use store_failures to persist failed rows when needed (subject to adapter/version constraints)
TestPrimary Use CaseWhat a Failure IndicatesPractical Notes
not_nullForbid missing values on required keys/attributesA NULL existsDesign what NULL means. For derived columns, consider whether aggregation logic should compensate
uniqueUniqueness of natural keys / surrogate keysDuplicate values, or includes NULLsdbt's unique counts NULLs as failures. Pair with explicit not_null to make root-cause analysis easier
accepted_valuesWhitelist of status codes / category valuesA value outside the list, or NULL (when not allowed)Use quote: false for numerics and booleans. To allow NULL, include null in values
relationshipsForeign-key integrity (fact -> dimension)A referenced key does not exist in the targetWatch join key types/trimming/case. For large fact tables, scope with where to manage cost

Conceptual flow of a test definition

YAML (schema.yml)generic test macroSQL (fail rows)rendersdbt testsuccess (no rows) / failure (>=1 row)From YAML, a macro renders SQL; dbt test passes on 0 rows and fails on 1+ rows

Minimal schema.yml example (all four tests)

version: 2
models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']
              quote: true

  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
          - not_null

not_null Behavior and Design Considerations

not_null verifies that the target column contains no NULLs. The basic rule is to apply it to primary keys, foreign keys, and operationally required attributes.

NULLs in derived columns (e.g., normalization flags or aggregation outputs) sometimes mean "undetermined", and blindly applying not_null can break operations. Sometimes it's safer to allow NULL and have the dashboard layer fill defaults.

  • Execution cost on large tables stays low (a simple is null scan)
  • Using where to limit to active periods or is_active = true helps avoid false positives
  • Even on DWHs without NOT NULL constraints on the physical table, you can soft-validate from dbt
  • For required keys, pairing not_null with unique is the standard for both exam and production work

Example: applying where to not_null

version: 2
models:
  - name: dim_products
    columns:
      - name: product_sku
        tests:
          - not_null:
              where: "is_active = true"

unique Also Catches NULLs — Watch Out

dbt's unique treats both duplicate values and NULLs as failures (under the hood it combines a NULL-row extraction with a duplicate extraction). Technically, unique on its own already subsumes not_null.

That said, to make failure causes easier to pinpoint, the best practice is to declare both not_null and unique on primary key columns. For composite uniqueness, generate a surrogate key or consider a custom test (or a macro from a proven package).

  • Primary keys: the two-step not_null + unique pattern makes diagnosis easier
  • For composite uniqueness, generating a hash key (e.g., sha1(concat_ws)) is conservative and stable
  • When duplicates abound, store_failures helps preserve offending rows for analysis
  • Scope with where (e.g., keys only within an active period)

Pairing unique with not_null (primary-key scenario)

version: 2
models:
  - name: dim_dates
    columns:
      - name: date_key
        tests:
          - not_null
          - unique

accepted_values Tips (Allowing NULL and Quoting)

accepted_values verifies that a column's values belong to a predefined set. It's useful for business codes, statuses, and flag columns. The clearest convention is quote: false for numerics and booleans, quote: true for strings.

To allow NULL, include null in the YAML values array (unquoted null). Case-handling depends on the database's collation, so for stability, normalize the column on the model side before testing if needed.

  • values is a whitelist. Any missing entry causes immediate failure
  • Specify quote: false for numerics and booleans
  • To allow NULL, write values: [..., null] (null without quotes)
  • You can add conditions via where, e.g., excluding deprecated codes

accepted_values examples (NULL allowed, numeric type)

version: 2
models:
  - name: dim_status
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive', null]
              quote: true
      - name: importance_level
        tests:
          - accepted_values:
              values: [1, 2, 3]
              quote: false
              where: "is_deprecated = false"

Enforce Referential Integrity with relationships

relationships verifies that the fact table's foreign key exists as the dimension table's primary key. Use to to specify the referenced model and field to specify the referenced column. When defined inside a columns section, column_name implicitly refers to the current column.

On huge fact tables this test can get expensive, so it's pragmatic to scope it with where (e.g., the last N periods). Key-type mismatches, leading/trailing whitespace, and case differences are classic failure causes.

  • Combine the foreign-key column with not_null to reduce missed gaps
  • Also test the referenced side's uniqueness (unique) separately
  • Standardize types, trimming, and normalization at the modeling layer
  • Staged validation via where works well around bulk loads

Reference relationship between fact and dimension tables

relationships: to=dim_customers, field=customer_iddim_customerscustomer_id (PK)fct_orderscustomer_id (FK)relationships test confirms fct_orders.customer_id references dim_customers.customer_id

relationships YAML example (inside a column definition)

version: 2
models:
  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
              where: "order_date >= dateadd('day', -30, current_date)"

Operational Best Practices and Key Exam Points

In production, the small-start approach (begin with key primary keys and major dimensions) stabilizes things, and codifying a failure-response runbook (contacts, rollback options, tolerance levels) helps even more. In CI, run dbt test per PR and gradually promote severity from warn to error.

On the exam, expect questions on each test's required parameters, NULL handling, the meaning of where / severity / quote, and the reference specification for relationships (to and field). The fact that unique also detects NULL as a failure is a frequent target.

  • Targeted runs via dbt test -s test_type:unique or path-based selectors
  • severity: warn makes failures visible without breaking the job (good for staged rollout)
  • store_failures: true persists failure rows in a separate schema for easier investigation (adapter-dependent)
  • Error messages differ per test type, so splitting definitions makes root-cause identification easier

Selective execution and severity tuning (sample commands)

# Run only tests under models changed recently
# (Adjust selectors based on your version and adapter)
dbt test -s state:modified+ type:generic

# Run only unique tests
dbt test -s test_type:unique

# Demote selected tests to warnings (in YAML)
version: 2
models:
  - name: fct_orders
    tests:
      - dbt_utils.expression_is_true:
          expression: "total_amount >= 0"
          config:
            severity: warn

Check Your Understanding

Analytics Engineer

問題 1

You want fct_orders.status to allow 'pending' or 'completed' as well as NULL. Which accepted_values definition is most appropriate?

  1. tests: [accepted_values: {values: ['pending', 'completed', null], quote: true}]
  2. tests: [accepted_values: {values: ['pending', 'completed', 'null'], quote: true}]
  3. tests: [accepted_values: {values: [pending, completed], quote: false}]
  4. tests: [accepted_values: {values: ['pending', 'completed'], allow_nulls: true}]

正解: A

To allow NULL in dbt's accepted_values, include YAML null (unquoted) in values. B uses the string 'null', which is not NULL. C leaves pending/completed unquoted, so they don't match as strings. D's allow_nulls is not a built-in parameter.

Frequently Asked Questions

Do I need both unique and not_null?

dbt's unique test treats NULLs as failures too, but for clearer error messages and easier root-cause analysis, both practical experience and the exam recommend pairing not_null with unique on primary key columns.

Can relationships validate composite keys (2+ columns)?

The built-in relationships test assumes a single column. For composite keys, the robust options are to generate a surrogate key inside the model and apply relationships to it, or to author a custom test.

How does accepted_values handle case differences?

Comparison behavior depends on the database's collation. For deterministic results, normalize on the model side using lower() or similar, and align accepted_values to lowercase as well.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.