Invalid values in categorical columns cause broken dashboard metrics and downstream ETL exceptions. The dbt accepted_values test is a generic test that declares the set of allowed values up front and flags anything else.
This article walks through stable usage based on the official docs, common pitfalls, how to choose among alternatives, and CI integration end-to-end. It also covers the comparison points often asked about on the exam.
The accepted_values test verifies that a column's values are members of a specified list. Typical examples are categorical columns with a finite set of values such as order status, customer tier, country code, or language code. Because deviations are caught before the meaning of the column shifts, it is effective for detecting schema drift early.
It works simply by declaring the allowed set in a schema YAML and is lightweight and fast. Even on DWHs where database constraints are not enforced in practice (e.g., non-enforced foreign keys), you can guarantee the invariant on the application side as a dbt test.
Data preparation and where accepted_values fits in
Minimal accepted_values test example (schema.yml)
version: 2
models:
- name: dim_customer
description: Customer dimension
columns:
- name: status
description: Current customer status
tests:
- accepted_values:
values: ['active', 'inactive', 'prospect']
quote: true
When accepted_values is placed under a column, values outside the list are flagged as failures for that column. For numeric keys, set quote to false and treat them as SQL literals. To ignore missing values, narrow the rows with where.
Test severity can be adjusted in config. Setting severity to warn makes the test issue a warning instead of failing. Severity and some fine-grained threshold settings differ between dbt versions. Check the official dbt docs as needed.
Example with integer IDs, NULL exclusion, and warning-level operation
version: 2
models:
- name: fct_orders
columns:
- name: status_id
tests:
- accepted_values:
values: [1, 2, 3, 4]
quote: false
where: "status_id is not null"
config:
severity: warn
A common failure is when upstream lacks proper cleanup (case, whitespace, aliases) so synonyms are treated as different values. Applying accepted_values to a column normalized in staging (trim, lower) keeps it stable.
Whether to allow NULL is a design decision. If NULL is acceptable, exclude it with where, or combine with a separate not_null test to make the policy explicit.
Combining staging normalization with the test
-- models/stg_orders.sql
with src as (
select
order_id,
lower(trim(status)) as status_norm,
*
from {{ ref('raw_orders') }}
)
select * from src;
# models/stg_orders.yml
version: 2
models:
- name: stg_orders
columns:
- name: status_norm
tests:
- accepted_values:
values: ['pending','shipped','canceled']
quote: true
For a static finite set, accepted_values is the shortest path. When the set is managed by a business master, however, enforcing consistency against the master table with the relationships test is more maintainable. Native DWH constraints (CHECK / FOREIGN KEY) may be non-enforced or ignored by the optimizer depending on the product, so backing them up with dbt tests is realistic.
On the exam, you are often asked about when to use accepted_values vs relationships, and how changes are operated (updating the list via PR vs updating the master table).
| Method | When to use | Strengths and caveats |
|---|---|---|
| accepted_values | Static finite sets (status, tier, channel) | Complete in YAML and fast. Changes are visible via PR. Management cost grows as the set gets larger. |
| relationships test | When a master table is the source of truth | Manage the set centrally as table data. New values are added by inserting into the master. Missing references are detected immediately. |
| DB CHECK / foreign key constraints | When the product enforces it and you can enable it in operations | Permanently guaranteed by the engine. However, many DWHs do not enforce it or have high cost, so it is often disabled in practice. |
Example of the relationships test reconciling with a master table
version: 2
models:
- name: fct_orders
columns:
- name: status
tests:
- relationships:
to: {{ ref('dim_status') }}
field: status
to_field: status_code
To inspect failing rows later, enable store_failures. A failing-rows table is created in a dedicated schema, making root cause analysis easier. Tune severity per environment — warn in dev, error in prod — as a practical operating model.
In CI, narrowing the test target to only models related to the change is fast. Combine tags and state selectors to enable diff-based testing on PRs.
Defaults in dbt_project.yml and CLI examples
# dbt_project.yml (excerpt)
tests:
+store_failures: true
+schema: dbt_test__audit
+severity: error
# CLI example (for CI)
# Test only changed models and their descendants
$ dbt test -m state:modified+
# Only high-priority tests
$ dbt test -s tag:critical
On the Analytics Engineer exam, the appropriate use of accepted_values vs relationships, when to apply where and quote, and how to handle NULL are common themes. The key is not to miss hints in the question text (whether a master table exists, whether the column is an integer, whether NULL is allowed).
Be careful about the YAML location. When written under a column, column_name is unnecessary. The baseline is quote: false for numeric lists and quote: true for strings.
Reference answer for the mini exercise
version: 2
models:
- name: dim_subscription
columns:
- name: plan
tests:
- accepted_values:
values: ['free','standard','pro']
quote: true
where: "plan is not null"
Analytics Engineer
問題 1
For the status column of the orders table, you want to allow only 'pending', 'shipped', and 'canceled'. New business statuses may be added in the future, but you first want to detect deviations early. Which is the most appropriate action?
正解: A
accepted_values is appropriate for quickly enforcing a static finite set. Future changes are made reviewable by updating values via PR. Replacing with relationships without a master table is inappropriate, and relying on dashboard-side filtering does not enforce quality. The baseline is quote: true for strings.
How should I write accepted_values when I want to allow NULL?
Use the where clause on accepted_values to filter target rows and exclude NULL. Example: where: "status is not null". Conversely, if you want to forbid NULL, combine with the not_null test to make the intent explicit.
There are many accepted values and writing them directly in YAML is hard to manage. What should I do?
Prepare a seed or dimension table as the business master and use the relationships test to enforce referential integrity. If the set changes frequently, relationships is a better fit than accepted_values.
Where do I configure severity (warn/error) and persistence of failing rows?
For individual tests, set severity and store_failures inside the tests config. Project-wide defaults are specified in the tests section of dbt_project.yml. Details can vary by version, so verify against the official docs for your dbt version.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...