dbt

Validating Referential Integrity with the dbt relationships Test

2026-04-19
NicheeLab Editorial Team

The dbt relationships test verifies that every value in a foreign-key column on the fact side also exists in the primary-key column on the dimension side. It is a key mechanism for preserving logical referential integrity even in designs that do not enforce physical foreign key constraints in the warehouse.

This article walks through how to define the test, when to apply it in practice, performance and operational tips, and the points most often tested on the Analytics Engineer exam, all grounded in the official documentation.

relationships Test Fundamentals and Failure Patterns

relationships verifies that values in a child table's column exist in the parent table's column. NULLs on the child side are ignored by default, and only cases where the child has a value that is missing from the parent are flagged as violations.

dbt generates a test query that returns only the diff, so tests are read-only and do not create any physical constraints in the warehouse.

  • Purpose: enforce foreign-key-equivalent referential integrity via a logical test
  • Targets: a specific column on the child table → the primary (or candidate) key on the parent table
  • NULL handling: NULLs on the child side are excluded from violations (the generally intended behavior)
Test typeWhat it checksTypical failure example
relationshipsReferential integrityfact.customer_id=999 does not exist in dim
not_nullMissing valuesfact.customer_id is NULL
uniqueUniquenessdim.customer_id has duplicates

Basic relationship at a glance

fact_orderscustomer_id / ...dim_customersid (PK-equivalent) / ...referenceBasic relationship at a glance

Minimal schema.yml example

version: 2

models:
  - name: fact_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: id

Definition Patterns and Practical Filtering Tips

When you want to scope tests by date range or status, the where setting on generic tests is the right tool. Switching the predicate per environment lets you apply production-grade integrity checks to small samples in dev as well.

Composite-key referential integrity is not directly supported by relationships alone. Practical approaches are to build a concatenated key on both parent and child, or to write a single singular SQL test using NOT EXISTS.

  • Limit test scope with where (e.g., last 90 days, exclude cancellations)
  • Collapse composite keys into a surrogate key, or handle them with a singular test
  • When renaming columns, watch out for forgetting to update the field side
SettingMeaningWatch out for
toSpecifies the referenced modelNormally use ref('parent_model')
fieldColumn name on the referenced modelMust match the key column on the parent
whereFilter condition on the child sideShrink heavy tables with a date-range filter

Flow for narrowing the child set with where

fact_orderswhere(order_date >= current_date - 90)relationships testdim_customersFlow for narrowing the child set with where

relationships combined with where

version: 2

models:
  - name: fact_orders
    columns:
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: id
              where: "order_date >= dateadd(day, -90, current_date) AND order_status <> 'CANCELLED'"

Performance Optimization and Warehouse-Specific Tips

Internally, relationships compiles to a query equivalent to NOT EXISTS or LEFT ANTI JOIN. To speed it up on large data, focus on the physical design of the join key columns and on filtering.

Combine warehouse-specific optimizations such as clustering keys on Snowflake and Z-Ordering on Databricks to cut test runtime.

  • Add statistics and clustering on the join key columns of both parent and child
  • Reduce volume with where via date ranges or logical filters
  • Catch duplicate primary keys on the parent first with the unique test to avoid false positives
PlatformRecommended tuningNotes
SnowflakeClustering keys on parent.id / child.customer_idMind the micro-partition distribution
DatabricksZ-ORDER on the join key columnsImproves data-skipping efficiency on Delta tables
CommonDate-range filter + reference only the necessary columnsMinimize selected columns to reduce I/O

Conceptual logical plan during the test (abstracted)

Child (fact)key, ...Parent (dim)key, ...Failing recordsExtract only violationsDetect keys missing from the parent via NOT EXISTS / ANTI JOIN

Sample SQL for validating composite keys with a singular test

-- tests/relationships_orders_customers.sql
-- Child: fact_orders(order_id, customer_id)
-- Parent: dim_customers(customer_id, tenant_id)

select f.order_id, f.customer_id, f.tenant_id
from {{ ref('fact_orders') }} f
where f.customer_id is not null
  and not exists (
    select 1
    from {{ ref('dim_customers') }} d
    where d.customer_id = f.customer_id
      and d.tenant_id = f.tenant_id
  )

Handling Failures, Storing Them, and Operational Flow

store_failures is handy for investigating failing rows. It persists violating records to a dedicated schema when tests run, making reproduction and sharing easy.

Severity is controlled with the severity setting. A realistic pattern is warn in development and error as a gate in production pipelines.

  • Persist failing rows with store_failures: true
  • Switch severity per environment to build a staged gating strategy
  • Run models and tests together with dbt build and wire it into CI
ItemExampleBehavior
severitywarn / errorwarn continues with a warning; error halts on failure
store_failurestrueSaves violating records to a table
Selective rundbt test -s test_type:genericRuns only generic tests

Operational flow for failing records

Modelsrelationshipsdbt testFailing records tablestoreData remediationinvestigate & fixOperational flow for failing records

Example of switching severity and failure storage per environment

# Example tests defaults in dbt_project.yml
tests:
  +severity: "{{ target.name == 'prod' and 'error' or 'warn' }}"
  +store_failures: "{{ target.name == 'prod' }}"

Exam Prep and Practical Checklist

On the Analytics Engineer exam, common topics include the purpose of relationships, NULL handling, how it splits responsibilities with unique and not_null, and when to choose a singular test instead.

In practice, a stable order is to guarantee parent key health first with unique + not_null, then run relationships on top of that.

  • Test order: parent gets unique + not_null, child gets relationships
  • For composite keys, use a surrogate key or a singular SQL test
  • Make it clear that tests are queries, not constraints (no foreign keys are created in the warehouse)
AngleFrequently tested pointSummary
PurposeWhat relationships guaranteesChild values also exist in the parent
NULLAre child-side NULLs failures?Ignored by default (not failed)
AlternativesWhat to do with composite keysSingular tests or a concatenated surrogate key

How model layers and tests are laid out

sourcesstagingdim_*fact_*testsgeneric / singularrelationships checks referential integrity between dim and fact

Example of collapsing relationships into a single column via a surrogate key

-- Build a surrogate key with identical logic on both parent and child
-- parent_surrogate_key = md5(coalesce(customer_id,'') || '|' || coalesce(tenant_id,''))

-- schema.yml excerpt
models:
  - name: fact_orders
    columns:
      - name: customer_sk
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_sk

Check Your Understanding

Analytics Engineer

問題 1

You want to use a dbt generic test to confirm that every value in fact_orders.customer_id exists in dim_customers.id. Which test is the most appropriate?

  1. not_null
  2. unique
  3. relationships
  4. accepted_values

正解: C

relationships verifies that values in the child column exist in the specified column of the referenced model. not_null checks for missing values, unique checks uniqueness, and accepted_values checks an allowed-value set — none of those directly guarantee referential integrity.

Frequently Asked Questions

Does the relationships test create a foreign key constraint in the warehouse?

No. dbt tests run a query that surfaces violating rows; they do not create any physical constraints in the warehouse.

Does relationships fail when the child column is NULL?

By default, NULLs on the child side are not treated as violations. If you also need to forbid NULLs, combine it with not_null.

How should I validate referential integrity for composite keys?

relationships assumes a single column. For composite keys, either build a surrogate key with identical logic on both parent and child and apply relationships to it, or write a singular test using NOT EXISTS.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.