dbt

Reliably Validate Categorical Values with the dbt accepted_values Test

2026-04-19
NicheeLab Editorial Team

Invalid values in categorical columns cause broken dashboard metrics and downstream ETL exceptions. The dbt accepted_values test is a generic test that declares the set of allowed values up front and flags anything else.

This article walks through stable usage based on the official docs, common pitfalls, how to choose among alternatives, and CI integration end-to-end. It also covers the comparison points often asked about on the exam.

accepted_values Basics and Use Cases

The accepted_values test verifies that a column's values are members of a specified list. Typical examples are categorical columns with a finite set of values such as order status, customer tier, country code, or language code. Because deviations are caught before the meaning of the column shifts, it is effective for detecting schema drift early.

It works simply by declaring the allowed set in a schema YAML and is lightweight and fast. Even on DWHs where database constraints are not enforced in practice (e.g., non-enforced foreign keys), you can guarantee the invariant on the application side as a dbt test.

  • Targets: dimension-like columns with a finite set of values (status, plan, channel, etc.)
  • Effects: early detection of out-of-range values and gaps in staging normalization
  • Operations: changes go through PR updates to the list, so diffs are visible in review

Data preparation and where accepted_values fits in

sourcestagingmarts/modeldbt tests (CI)accepted_valuesDetect deviation and fail

Minimal accepted_values test example (schema.yml)

version: 2
models:
  - name: dim_customer
    description: Customer dimension
    columns:
      - name: status
        description: Current customer status
        tests:
          - accepted_values:
              values: ['active', 'inactive', 'prospect']
              quote: true

Definition Patterns and Options in schema.yml

When accepted_values is placed under a column, values outside the list are flagged as failures for that column. For numeric keys, set quote to false and treat them as SQL literals. To ignore missing values, narrow the rows with where.

Test severity can be adjusted in config. Setting severity to warn makes the test issue a warning instead of failing. Severity and some fine-grained threshold settings differ between dbt versions. Check the official dbt docs as needed.

  • values: array of allowed values (required)
  • quote: whether to wrap values in quotes. Use false for numbers and true for strings as a safe default
  • where: condition for target rows (e.g., exclude deleted rows or NULL)
  • config.severity: warn or error (typical usage)

Example with integer IDs, NULL exclusion, and warning-level operation

version: 2
models:
  - name: fct_orders
    columns:
      - name: status_id
        tests:
          - accepted_values:
              values: [1, 2, 3, 4]
              quote: false
              where: "status_id is not null"
              config:
                severity: warn

Common Pitfalls and How to Avoid Them

A common failure is when upstream lacks proper cleanup (case, whitespace, aliases) so synonyms are treated as different values. Applying accepted_values to a column normalized in staging (trim, lower) keeps it stable.

Whether to allow NULL is a design decision. If NULL is acceptable, exclude it with where, or combine with a separate not_null test to make the policy explicit.

  • String normalization: apply lower/trim in staging before testing
  • NULL handling: exclude via where if allowed; combine with not_null if forbidden
  • Adding values: add to values via PR so the intent of the change is reviewable
  • Large sets: for long lists, use a seed or dimension table and switch to relationships

Combining staging normalization with the test

-- models/stg_orders.sql
with src as (
  select
    order_id,
    lower(trim(status)) as status_norm,
    *
  from {{ ref('raw_orders') }}
)
select * from src;

# models/stg_orders.yml
version: 2
models:
  - name: stg_orders
    columns:
      - name: status_norm
        tests:
          - accepted_values:
              values: ['pending','shipped','canceled']
              quote: true

Choosing Between accepted_values and Alternatives

For a static finite set, accepted_values is the shortest path. When the set is managed by a business master, however, enforcing consistency against the master table with the relationships test is more maintainable. Native DWH constraints (CHECK / FOREIGN KEY) may be non-enforced or ignored by the optimizer depending on the product, so backing them up with dbt tests is realistic.

On the exam, you are often asked about when to use accepted_values vs relationships, and how changes are operated (updating the list via PR vs updating the master table).

  • Static, small category sets: accepted_values
  • Business master driven, high change frequency: referential integrity via relationships
  • DWH constraints: evaluate enforcement in your environment and maintenance cost
MethodWhen to useStrengths and caveats
accepted_valuesStatic finite sets (status, tier, channel)Complete in YAML and fast. Changes are visible via PR. Management cost grows as the set gets larger.
relationships testWhen a master table is the source of truthManage the set centrally as table data. New values are added by inserting into the master. Missing references are detected immediately.
DB CHECK / foreign key constraintsWhen the product enforces it and you can enable it in operationsPermanently guaranteed by the engine. However, many DWHs do not enforce it or have high cost, so it is often disabled in practice.

Example of the relationships test reconciling with a master table

version: 2
models:
  - name: fct_orders
    columns:
      - name: status
        tests:
          - relationships:
              to: {{ ref('dim_status') }}
              field: status
              to_field: status_code

Operations: Storing Failures, Severity, and CI Integration

To inspect failing rows later, enable store_failures. A failing-rows table is created in a dedicated schema, making root cause analysis easier. Tune severity per environment — warn in dev, error in prod — as a practical operating model.

In CI, narrowing the test target to only models related to the change is fast. Combine tags and state selectors to enable diff-based testing on PRs.

  • store_failures: persist failing rows with true
  • Switch severity per environment (warn in dev / error in prod)
  • Selectors: -s for models and tags, state:modified for diff-based testing

Defaults in dbt_project.yml and CLI examples

# dbt_project.yml (excerpt)
tests:
  +store_failures: true
  +schema: dbt_test__audit
  +severity: error

# CLI example (for CI)
# Test only changed models and their descendants
$ dbt test -m state:modified+
# Only high-priority tests
$ dbt test -s tag:critical

Exam Focus Points and Mini Exercise

On the Analytics Engineer exam, the appropriate use of accepted_values vs relationships, when to apply where and quote, and how to handle NULL are common themes. The key is not to miss hints in the question text (whether a master table exists, whether the column is an integer, whether NULL is allowed).

Be careful about the YAML location. When written under a column, column_name is unnecessary. The baseline is quote: false for numeric lists and quote: true for strings.

  • Master exists → relationships; no master → accepted_values
  • NULL allowed → exclude with where; not allowed → combine with not_null
  • Integer IDs → quote: false; strings → quote: true

Reference answer for the mini exercise

version: 2
models:
  - name: dim_subscription
    columns:
      - name: plan
        tests:
          - accepted_values:
              values: ['free','standard','pro']
              quote: true
              where: "plan is not null"

Check with a Question

Analytics Engineer

問題 1

For the status column of the orders table, you want to allow only 'pending', 'shipped', and 'canceled'. New business statuses may be added in the future, but you first want to detect deviations early. Which is the most appropriate action?

  1. Specify ['pending','shipped','canceled'] in the values of accepted_values in schema.yml and update the list via PR when changes occur
  2. Remove the accepted_values test and filter invalid values on the dashboard side
  3. Set quote to false and evaluate status as a numeric literal
  4. Replace it with the dbt relationships test but do not prepare a referenced master table

正解: A

accepted_values is appropriate for quickly enforcing a static finite set. Future changes are made reviewable by updating values via PR. Replacing with relationships without a master table is inappropriate, and relying on dashboard-side filtering does not enforce quality. The baseline is quote: true for strings.

Frequently Asked Questions

How should I write accepted_values when I want to allow NULL?

Use the where clause on accepted_values to filter target rows and exclude NULL. Example: where: "status is not null". Conversely, if you want to forbid NULL, combine with the not_null test to make the intent explicit.

There are many accepted values and writing them directly in YAML is hard to manage. What should I do?

Prepare a seed or dimension table as the business master and use the relationships test to enforce referential integrity. If the set changes frequently, relationships is a better fit than accepted_values.

Where do I configure severity (warn/error) and persistence of failing rows?

For individual tests, set severity and store_failures inside the tests config. Project-wide defaults are specified in the tests section of dbt_project.yml. Details can vary by version, so verify against the official docs for your dbt version.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.