not_null is dbt's built-in generic test that guarantees a column contains zero NULL values.
Applying it at both the source and model layers to catch missing values early is important both in practice and on the exam.
not_null is a generic test that verifies a target column has zero NULLs. dbt runs every test as a SELECT that returns failing rows: 0 rows means pass, 1+ rows means fail. It is a stable feature of the official documentation and applies to models, sources, and seeds.
On the Analytics Engineer exam, expect questions on where to place not_null (both sources and staging/intermediate/fact layers), test configuration (severity, store_failures, where), and how to choose between related tests (unique, relationships, accepted_values).
Where not_null sits in the data quality chain
The minimal setup is to list not_null under a column's tests section. Use severity=error for high-importance columns and warn while you are still in observation mode, rolling CI out gradually. The where clause can exclude logically deleted rows. Enabling store_failures persists failing rows to an audit table, making root-cause analysis easier (the storage schema and naming depend on the adapter).
Define not_null at the source layer too so that upstream NULLs are caught early. From staging onward, the standard practice is to lock down primary keys, foreign keys, and business keys with not_null.
YAML example (applying not_null on models and sources)
version: 2
models:
- name: fct_orders
description: Final fact table for orders
columns:
- name: order_id
description: Primary key
tests:
- unique
- not_null: # critical columns use error
severity: error
store_failures: true
- name: customer_id
description: Foreign key to dim_customers
tests:
- not_null:
severity: error
where: "is_deleted = false" # exclude logically deleted rows
- relationships:
to: ref('dim_customers')
field: customer_id
sources:
- name: app
schema: raw_app
tables:
- name: orders_raw
columns:
- name: order_id
tests:
- not_null # catch upstream NULLs early
- name: created_at
tests:
- not_null: { severity: warn } # monitor while rolling out gradually
dbt test can be scoped to specific nodes. Specifying a model name and running only the tests attached to it is the most reliable workflow. In CI, the easiest policy is to block on critical columns with severity=error and just log warn-level issues during observation.
store_failures keeps failing rows for investigation. To avoid long-term accumulation, use a scheduled cleanup policy or a dedicated schema.
Empty strings are not NULLs: in most warehouses (Snowflake, BigQuery, Redshift, etc.) '' and NULL are different. not_null does not catch empty strings, so when upstream may send them, normalize first in staging with NULLIF(TRIM(col), '') and only then apply not_null.
JOINs can introduce NULLs: a LEFT JOIN produces NULL on the right side for unmatched rows. When a column that must satisfy not_null ends up NULL after a join, reconsider the join strategy (switch to INNER JOIN, pre-filter, or apply COALESCE).
Casts and derived columns: failed CASTs or conditional branches can quietly produce NULLs. Either default the value with COALESCE immediately after derivation, or reinforce quality by pairing with accepted_values to constrain the expected range.
not_null at the source layer is for early detection, while not_null in staging and core layers enforces the data contract. In particular, monitor primary keys, foreign keys, and business keys continuously by combining not_null with unique and relationships.
In CI/CD, make a not_null failure on a critical column a deploy-blocking condition. Persist failing rows to an audit schema via store_failures, and document the SLA from detection to recovery.
not_null guarantees existence, but full integrity comes from combining it with other tests. Knowing how it compares to the main alternatives makes the exam's multiple-choice questions much easier.
| Test | Purpose | Typical failure example | Operational tip |
|---|---|---|---|
| not_null | Eliminate NULLs (guarantee existence) | Join misses turn rows into NULL; empty strings not normalized to NULL | Normalize NULLs in staging, then apply not_null |
| unique | Eliminate duplicates (uniqueness) | The same ID appears on multiple rows | Pair with not_null to guarantee a primary key |
| relationships | Referential integrity (foreign key) | An ID that does not exist in the parent table | Apply not_null on both parent and child to detect orphans |
| accepted_values | Adherence to a code taxonomy | Unexpected category values | If you want to treat NULL separately, mind the order in which tests are combined |
Analytics Engineer
問題 1
Which configuration best satisfies all of the following requirements at once? Requirements: 1) fct_orders.order_id is required, so a CI failure must block the deploy; 2) rows where is_deleted=true must be excluded from validation; 3) failing rows must be persisted for investigation.
正解: A
Blocking in CI requires severity=error, the exclusion condition is is_deleted=false, and persisting failing rows requires store_failures=true. not_null is the best fit for these requirements.
Does not_null detect empty strings ('')?
In most data warehouses, '' is not NULL. not_null only detects NULLs. To guard against empty strings, normalize them in staging with NULLIF(TRIM(col), '').
not_null runs slowly on large tables. Can I speed it up?
Practical options include narrowing scope with where, filtering on partition columns, and running only critical columns first (selected via tags). Run full-table validation on a periodic batch schedule for safety.
Should I define not_null on sources or on models?
Both. Use it on sources to catch upstream NULLs early, and re-verify required columns (primary keys, foreign keys, business keys) on models to enforce the contract twice.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...