dbt relationships Test: Referential Integrity (2026)

The dbt relationships test verifies that every value in a foreign-key column on the fact side also exists in the primary-key column on the dimension side. It is a key mechanism for preserving logical referential integrity even in designs that do not enforce physical foreign key constraints in the warehouse.

This article walks through how to define the test, when to apply it in practice, performance and operational tips, and the points most often tested on the Analytics Engineer exam, all grounded in the official documentation.

relationships Test Fundamentals and Failure Patterns

relationships verifies that values in a child table's column exist in the parent table's column. NULLs on the child side are ignored by default, and only cases where the child has a value that is missing from the parent are flagged as violations.

dbt generates a test query that returns only the diff, so tests are read-only and do not create any physical constraints in the warehouse.

Purpose: enforce foreign-key-equivalent referential integrity via a logical test
Targets: a specific column on the child table → the primary (or candidate) key on the parent table
NULL handling: NULLs on the child side are excluded from violations (the generally intended behavior)

Test type	What it checks	Typical failure example
relationships	Referential integrity	fact.customer_id=999 does not exist in dim
not_null	Missing values	fact.customer_id is NULL
unique	Uniqueness	dim.customer_id has duplicates

Basic relationship at a glance

Minimal schema.yml example

version: 2

models:
  - name: fact_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: id

Definition Patterns and Practical Filtering Tips

When you want to scope tests by date range or status, the where setting on generic tests is the right tool. Switching the predicate per environment lets you apply production-grade integrity checks to small samples in dev as well.

Composite-key referential integrity is not directly supported by relationships alone. Practical approaches are to build a concatenated key on both parent and child, or to write a single singular SQL test using NOT EXISTS.

Limit test scope with where (e.g., last 90 days, exclude cancellations)
Collapse composite keys into a surrogate key, or handle them with a singular test
When renaming columns, watch out for forgetting to update the field side

Setting	Meaning	Watch out for
to	Specifies the referenced model	Normally use ref('parent_model')
field	Column name on the referenced model	Must match the key column on the parent
where	Filter condition on the child side	Shrink heavy tables with a date-range filter

Flow for narrowing the child set with where

relationships combined with where

version: 2

models:
  - name: fact_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: id
              where: "order_date >= dateadd(day, -90, current_date) AND order_status <> 'CANCELLED'"

Performance Optimization and Warehouse-Specific Tips

Internally, relationships compiles to a query equivalent to NOT EXISTS or LEFT ANTI JOIN. To speed it up on large data, focus on the physical design of the join key columns and on filtering.

Combine warehouse-specific optimizations such as clustering keys on Snowflake and Z-Ordering on Databricks to cut test runtime.

Add statistics and clustering on the join key columns of both parent and child
Reduce volume with where via date ranges or logical filters
Catch duplicate primary keys on the parent first with the unique test to avoid false positives

Platform	Recommended tuning	Notes
Snowflake	Clustering keys on parent.id / child.customer_id	Mind the micro-partition distribution
Databricks	Z-ORDER on the join key columns	Improves data-skipping efficiency on Delta tables
Common	Date-range filter + reference only the necessary columns	Minimize selected columns to reduce I/O

Conceptual logical plan during the test (abstracted)

Sample SQL for validating composite keys with a singular test

-- tests/relationships_orders_customers.sql
-- Child: fact_orders(order_id, customer_id)
-- Parent: dim_customers(customer_id, tenant_id)

select f.order_id, f.customer_id, f.tenant_id
from {{ ref('fact_orders') }} f
where f.customer_id is not null
  and not exists (
    select 1
    from {{ ref('dim_customers') }} d
    where d.customer_id = f.customer_id
      and d.tenant_id = f.tenant_id
  )

Handling Failures, Storing Them, and Operational Flow

store_failures is handy for investigating failing rows. It persists violating records to a dedicated schema when tests run, making reproduction and sharing easy.

Severity is controlled with the severity setting. A realistic pattern is warn in development and error as a gate in production pipelines.

Persist failing rows with store_failures: true
Switch severity per environment to build a staged gating strategy
Run models and tests together with dbt build and wire it into CI

Item	Example	Behavior
severity	warn / error	warn continues with a warning; error halts on failure
store_failures	true	Saves violating records to a table
Selective run	dbt test -s test_type:generic	Runs only generic tests

Operational flow for failing records

Example of switching severity and failure storage per environment

# Example tests defaults in dbt_project.yml
tests:
  +severity: "{{ target.name == 'prod' and 'error' or 'warn' }}"
  +store_failures: "{{ target.name == 'prod' }}"

Exam Prep and Practical Checklist

On the Analytics Engineer exam, common topics include the purpose of relationships, NULL handling, how it splits responsibilities with unique and not_null, and when to choose a singular test instead.

In practice, a stable order is to guarantee parent key health first with unique + not_null, then run relationships on top of that.

Test order: parent gets unique + not_null, child gets relationships
For composite keys, use a surrogate key or a singular SQL test
Make it clear that tests are queries, not constraints (no foreign keys are created in the warehouse)

Angle	Frequently tested point	Summary
Purpose	What relationships guarantees	Child values also exist in the parent
NULL	Are child-side NULLs failures?	Ignored by default (not failed)
Alternatives	What to do with composite keys	Singular tests or a concatenated surrogate key

How model layers and tests are laid out

Example of collapsing relationships into a single column via a surrogate key

-- Build a surrogate key with identical logic on both parent and child
-- parent_surrogate_key = md5(coalesce(customer_id,'') || '|' || coalesce(tenant_id,''))

-- schema.yml excerpt
models:
  - name: fact_orders
    columns:
      - name: customer_sk
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_sk

Check Your Understanding

Analytics Engineer

問題 1

You want to use a dbt generic test to confirm that every value in fact_orders.customer_id exists in dim_customers.id. Which test is the most appropriate?

not_null
unique
relationships
accepted_values

正解: C

relationships verifies that values in the child column exist in the specified column of the referenced model. not_null checks for missing values, unique checks uniqueness, and accepted_values checks an allowed-value set — none of those directly guarantee referential integrity.

Frequently Asked Questions

Does the relationships test create a foreign key constraint in the warehouse?

No. dbt tests run a query that surfaces violating rows; they do not create any physical constraints in the warehouse.

Does relationships fail when the child column is NULL?

By default, NULLs on the child side are not treated as violations. If you also need to forbid NULLs, combine it with not_null.

How should I validate referential integrity for composite keys?

relationships assumes a single column. For composite keys, either build a surrogate key with identical logic on both parent and child and apply relationships to it, or write a singular test using NOT EXISTS.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Validating Referential Integrity with the dbt relationships Test