The dbt relationships test verifies that every value in a foreign-key column on the fact side also exists in the primary-key column on the dimension side. It is a key mechanism for preserving logical referential integrity even in designs that do not enforce physical foreign key constraints in the warehouse.
This article walks through how to define the test, when to apply it in practice, performance and operational tips, and the points most often tested on the Analytics Engineer exam, all grounded in the official documentation.
relationships verifies that values in a child table's column exist in the parent table's column. NULLs on the child side are ignored by default, and only cases where the child has a value that is missing from the parent are flagged as violations.
dbt generates a test query that returns only the diff, so tests are read-only and do not create any physical constraints in the warehouse.
| Test type | What it checks | Typical failure example |
|---|---|---|
| relationships | Referential integrity | fact.customer_id=999 does not exist in dim |
| not_null | Missing values | fact.customer_id is NULL |
| unique | Uniqueness | dim.customer_id has duplicates |
Basic relationship at a glance
Minimal schema.yml example
version: 2
models:
- name: fact_orders
columns:
- name: customer_id
tests:
- relationships:
to: ref('dim_customers')
field: id
When you want to scope tests by date range or status, the where setting on generic tests is the right tool. Switching the predicate per environment lets you apply production-grade integrity checks to small samples in dev as well.
Composite-key referential integrity is not directly supported by relationships alone. Practical approaches are to build a concatenated key on both parent and child, or to write a single singular SQL test using NOT EXISTS.
| Setting | Meaning | Watch out for |
|---|---|---|
| to | Specifies the referenced model | Normally use ref('parent_model') |
| field | Column name on the referenced model | Must match the key column on the parent |
| where | Filter condition on the child side | Shrink heavy tables with a date-range filter |
Flow for narrowing the child set with where
relationships combined with where
version: 2
models:
- name: fact_orders
columns:
- name: customer_id
tests:
- relationships:
to: ref('dim_customers')
field: id
where: "order_date >= dateadd(day, -90, current_date) AND order_status <> 'CANCELLED'"
Internally, relationships compiles to a query equivalent to NOT EXISTS or LEFT ANTI JOIN. To speed it up on large data, focus on the physical design of the join key columns and on filtering.
Combine warehouse-specific optimizations such as clustering keys on Snowflake and Z-Ordering on Databricks to cut test runtime.
| Platform | Recommended tuning | Notes |
|---|---|---|
| Snowflake | Clustering keys on parent.id / child.customer_id | Mind the micro-partition distribution |
| Databricks | Z-ORDER on the join key columns | Improves data-skipping efficiency on Delta tables |
| Common | Date-range filter + reference only the necessary columns | Minimize selected columns to reduce I/O |
Conceptual logical plan during the test (abstracted)
Sample SQL for validating composite keys with a singular test
-- tests/relationships_orders_customers.sql
-- Child: fact_orders(order_id, customer_id)
-- Parent: dim_customers(customer_id, tenant_id)
select f.order_id, f.customer_id, f.tenant_id
from {{ ref('fact_orders') }} f
where f.customer_id is not null
and not exists (
select 1
from {{ ref('dim_customers') }} d
where d.customer_id = f.customer_id
and d.tenant_id = f.tenant_id
)
store_failures is handy for investigating failing rows. It persists violating records to a dedicated schema when tests run, making reproduction and sharing easy.
Severity is controlled with the severity setting. A realistic pattern is warn in development and error as a gate in production pipelines.
| Item | Example | Behavior |
|---|---|---|
| severity | warn / error | warn continues with a warning; error halts on failure |
| store_failures | true | Saves violating records to a table |
| Selective run | dbt test -s test_type:generic | Runs only generic tests |
Operational flow for failing records
Example of switching severity and failure storage per environment
# Example tests defaults in dbt_project.yml
tests:
+severity: "{{ target.name == 'prod' and 'error' or 'warn' }}"
+store_failures: "{{ target.name == 'prod' }}"
On the Analytics Engineer exam, common topics include the purpose of relationships, NULL handling, how it splits responsibilities with unique and not_null, and when to choose a singular test instead.
In practice, a stable order is to guarantee parent key health first with unique + not_null, then run relationships on top of that.
| Angle | Frequently tested point | Summary |
|---|---|---|
| Purpose | What relationships guarantees | Child values also exist in the parent |
| NULL | Are child-side NULLs failures? | Ignored by default (not failed) |
| Alternatives | What to do with composite keys | Singular tests or a concatenated surrogate key |
How model layers and tests are laid out
Example of collapsing relationships into a single column via a surrogate key
-- Build a surrogate key with identical logic on both parent and child
-- parent_surrogate_key = md5(coalesce(customer_id,'') || '|' || coalesce(tenant_id,''))
-- schema.yml excerpt
models:
- name: fact_orders
columns:
- name: customer_sk
tests:
- relationships:
to: ref('dim_customers')
field: customer_sk
Analytics Engineer
問題 1
You want to use a dbt generic test to confirm that every value in fact_orders.customer_id exists in dim_customers.id. Which test is the most appropriate?
正解: C
relationships verifies that values in the child column exist in the specified column of the referenced model. not_null checks for missing values, unique checks uniqueness, and accepted_values checks an allowed-value set — none of those directly guarantee referential integrity.
Does the relationships test create a foreign key constraint in the warehouse?
No. dbt tests run a query that surfaces violating rows; they do not create any physical constraints in the warehouse.
Does relationships fail when the child column is NULL?
By default, NULLs on the child side are not treated as violations. If you also need to forbid NULLs, combine it with not_null.
How should I validate referential integrity for composite keys?
relationships assumes a single column. For composite keys, either build a surrogate key with identical logic on both parent and child and apply relationships to it, or write a singular test using NOT EXISTS.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...