Generic Tests: Schema-Driven Data Quality (2026)

dbt's generic tests are the first step toward declaratively guaranteeing data quality from YAML. This article focuses on the four most common tests — unique / not_null / accepted_values / relationships — and explains their behavior and when to use each one from a practical standpoint.

From an exam perspective, expect questions on each test's parameters, NULL handling, referential integrity definitions, where filters, and severity settings. We stick to the stable, documented behavior so the patterns are reproducible.

Overview of Generic Tests and the Basic YAML Form

dbt's generic tests are validation queries generated by SQL macros against models (tables/views) or their columns. You declare the tests in schema.yml (any YAML file works) and run dbt test — if the query returns failing rows the test fails, and if it returns none it passes.

The four canonical tests are not_null (no missing values), unique (uniqueness), accepted_values (whitelist of allowed values), and relationships (referential integrity). You can scope target rows with where and control failure severity (error/warn) via severity.

Tests run against an actual relation (table/view) — ephemeral models cannot be tested directly
Most tests support a where parameter for row-level filtering
Use dbt test -s to select tests; -m selects models (newer versions favor -s)
severity: error (default) / warn. With warn, the job continues even on failure
Use store_failures to persist failed rows when needed (subject to adapter/version constraints)

Test	Primary Use Case	What a Failure Indicates	Practical Notes
not_null	Forbid missing values on required keys/attributes	A NULL exists	Design what NULL means. For derived columns, consider whether aggregation logic should compensate
unique	Uniqueness of natural keys / surrogate keys	Duplicate values, or includes NULLs	dbt's unique counts NULLs as failures. Pair with explicit not_null to make root-cause analysis easier
accepted_values	Whitelist of status codes / category values	A value outside the list, or NULL (when not allowed)	Use quote: false for numerics and booleans. To allow NULL, include null in values
relationships	Foreign-key integrity (fact -> dimension)	A referenced key does not exist in the target	Watch join key types/trimming/case. For large fact tables, scope with where to manage cost

Conceptual flow of a test definition

Minimal schema.yml example (all four tests)

version: 2
models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive']
              quote: true

  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id
          - not_null

not_null Behavior and Design Considerations

not_null verifies that the target column contains no NULLs. The basic rule is to apply it to primary keys, foreign keys, and operationally required attributes.

NULLs in derived columns (e.g., normalization flags or aggregation outputs) sometimes mean "undetermined", and blindly applying not_null can break operations. Sometimes it's safer to allow NULL and have the dashboard layer fill defaults.

Execution cost on large tables stays low (a simple is null scan)
Using where to limit to active periods or is_active = true helps avoid false positives
Even on DWHs without NOT NULL constraints on the physical table, you can soft-validate from dbt
For required keys, pairing not_null with unique is the standard for both exam and production work

Example: applying where to not_null

version: 2
models:
  - name: dim_products
    columns:
      - name: product_sku
        tests:
          - not_null:
              where: "is_active = true"

unique Also Catches NULLs — Watch Out

dbt's unique treats both duplicate values and NULLs as failures (under the hood it combines a NULL-row extraction with a duplicate extraction). Technically, unique on its own already subsumes not_null.

That said, to make failure causes easier to pinpoint, the best practice is to declare both not_null and unique on primary key columns. For composite uniqueness, generate a surrogate key or consider a custom test (or a macro from a proven package).

Primary keys: the two-step not_null + unique pattern makes diagnosis easier
For composite uniqueness, generating a hash key (e.g., sha1(concat_ws)) is conservative and stable
When duplicates abound, store_failures helps preserve offending rows for analysis
Scope with where (e.g., keys only within an active period)

Pairing unique with not_null (primary-key scenario)

version: 2
models:
  - name: dim_dates
    columns:
      - name: date_key
        tests:
          - not_null
          - unique

accepted_values Tips (Allowing NULL and Quoting)

accepted_values verifies that a column's values belong to a predefined set. It's useful for business codes, statuses, and flag columns. The clearest convention is quote: false for numerics and booleans, quote: true for strings.

To allow NULL, include null in the YAML values array (unquoted null). Case-handling depends on the database's collation, so for stability, normalize the column on the model side before testing if needed.

values is a whitelist. Any missing entry causes immediate failure
Specify quote: false for numerics and booleans
To allow NULL, write values: [..., null] (null without quotes)
You can add conditions via where, e.g., excluding deprecated codes

accepted_values examples (NULL allowed, numeric type)

version: 2
models:
  - name: dim_status
    columns:
      - name: status
        tests:
          - accepted_values:
              values: ['active', 'inactive', null]
              quote: true
      - name: importance_level
        tests:
          - accepted_values:
              values: [1, 2, 3]
              quote: false
              where: "is_deprecated = false"

Enforce Referential Integrity with relationships

relationships verifies that the fact table's foreign key exists as the dimension table's primary key. Use to to specify the referenced model and field to specify the referenced column. When defined inside a columns section, column_name implicitly refers to the current column.

On huge fact tables this test can get expensive, so it's pragmatic to scope it with where (e.g., the last N periods). Key-type mismatches, leading/trailing whitespace, and case differences are classic failure causes.

Combine the foreign-key column with not_null to reduce missed gaps
Also test the referenced side's uniqueness (unique) separately
Standardize types, trimming, and normalization at the modeling layer
Staged validation via where works well around bulk loads

Reference relationship between fact and dimension tables

relationships YAML example (inside a column definition)

version: 2
models:
  - name: fct_orders
    columns:
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
              where: "order_date >= dateadd('day', -30, current_date)"

Operational Best Practices and Key Exam Points

In production, the small-start approach (begin with key primary keys and major dimensions) stabilizes things, and codifying a failure-response runbook (contacts, rollback options, tolerance levels) helps even more. In CI, run dbt test per PR and gradually promote severity from warn to error.

On the exam, expect questions on each test's required parameters, NULL handling, the meaning of where / severity / quote, and the reference specification for relationships (to and field). The fact that unique also detects NULL as a failure is a frequent target.

Targeted runs via dbt test -s test_type:unique or path-based selectors
severity: warn makes failures visible without breaking the job (good for staged rollout)
store_failures: true persists failure rows in a separate schema for easier investigation (adapter-dependent)
Error messages differ per test type, so splitting definitions makes root-cause identification easier

Selective execution and severity tuning (sample commands)

# Run only tests under models changed recently
# (Adjust selectors based on your version and adapter)
dbt test -s state:modified+ type:generic

# Run only unique tests
dbt test -s test_type:unique

# Demote selected tests to warnings (in YAML)
version: 2
models:
  - name: fct_orders
    tests:
      - dbt_utils.expression_is_true:
          expression: "total_amount >= 0"
          config:
            severity: warn

Check Your Understanding

Analytics Engineer

問題 1

You want fct_orders.status to allow 'pending' or 'completed' as well as NULL. Which accepted_values definition is most appropriate?

tests: [accepted_values: {values: ['pending', 'completed', null], quote: true}]
tests: [accepted_values: {values: ['pending', 'completed', 'null'], quote: true}]
tests: [accepted_values: {values: [pending, completed], quote: false}]
tests: [accepted_values: {values: ['pending', 'completed'], allow_nulls: true}]

正解: A

To allow NULL in dbt's accepted_values, include YAML null (unquoted) in values. B uses the string 'null', which is not NULL. C leaves pending/completed unquoted, so they don't match as strings. D's allow_nulls is not a built-in parameter.

Frequently Asked Questions

Do I need both unique and not_null?

dbt's unique test treats NULLs as failures too, but for clearer error messages and easier root-cause analysis, both practical experience and the exam recommend pairing not_null with unique on primary key columns.

Can relationships validate composite keys (2+ columns)?

The built-in relationships test assumes a single column. For composite keys, the robust options are to generate a surrogate key inside the model and apply relationships to it, or to author a custom test.

How does accepted_values handle case differences?

Comparison behavior depends on the database's collation. For deterministic results, normalize on the model side using lower() or similar, and align accepted_values to lowercase as well.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

dbt Generic Tests Basics: Using unique / not_null / accepted_values / relationships Correctly