dbt

dbt ref() Function: Practical Guide to Dependency Resolution and Model References

2026-04-19
NicheeLab Editorial Team

ref() forms the DAG of a dbt project and automates both physical name resolution and build order. It is a fundamental topic that shows up often on the Analytics Engineer exam.

This article walks through ref() end-to-end: when it resolves, how it interacts with materializations, cross-package references, integration with seeds and snapshots, anti-patterns, and troubleshooting.

ref() Basics: References and DAG Dependency Resolution

At compile time, ref() resolves the supplied model name to an environment-specific physical name (database.schema.table/view) and adds an edge to the dependency graph. As a result, dbt build executes upstream-to-downstream in the correct order, and transitive dependencies are handled automatically.

Physical names are derived from the target (profile and target) and quoted according to the rules of the adapter (Snowflake, BigQuery, Databricks SQL, etc.). Because nothing is hard-coded, the same SQL works across dev and prod even when the schemas differ.

  • ref('model_name') references a model in the same package
  • ref('package_name', 'model_name') references a model in a dependent package
  • Writing ref() adds a dependency edge to the DAG, guaranteeing build order

Conceptual DAG produced by ref()

 [seed_orders] ---> [stg_orders] ----> \
                       ^                 \
                       |                  > [fct_orders]
 [stg_payments] -------/                 /
 [dim_customers] ----------------------->/

Basic example: fct_orders.sql

with orders as (
  select * from {{ ref('stg_orders') }}
),
payments as (
  select * from {{ ref('stg_payments') }}
)
select
  o.order_id,
  o.customer_id,
  sum(p.amount) as revenue
from orders o
left join payments p on p.order_id = o.order_id
group by 1,2;

Compile-Time Resolution and How Materializations Differ

ref() is resolved at compile time. For table, view, and incremental materializations, ref() expands to the corresponding relation name. For ephemeral models, however, dbt inlines the upstream SQL as a CTE inside the downstream model (no physical table is created).

This lets you reuse upstream SQL while still benefiting from optimization. That said, overusing ephemeral can bloat the resulting CTE chain and produce overly complex compiled SQL — watch the query plan and runtime, and consider materializing as a table or making it incremental when needed.

  • table/view/incremental: ref() resolves to a physical name
  • ephemeral: ref() is inlined as a CTE (no physical object is created)
  • Because names are not hard-coded before compile, the SQL is robust to per-environment differences

Ephemeral example: a helper model that is inlined

-- models/utils_joined.sql
{{ config(materialized='ephemeral') }}
select *
from {{ ref('stg_orders') }} o
join {{ ref('stg_payments') }} p on p.order_id = o.order_id;

-- models/fct_orders.sql(上記を参照)
select order_id, customer_id, sum(amount) as revenue
from {{ ref('utils_joined') }}
group by 1,2;

Cross-Package References and Avoiding Name Collisions

When reusing models across packages, declare the dependency in packages.yml and reference it explicitly with ref('package_name', 'model_name'). This avoids name collisions and makes it immediately clear which package the model belongs to.

In large monorepos where the same model name might exist in multiple packages, getting in the habit of using cross-package references is the safest choice. CI dependency resolution and caching both work correctly, leading to stable builds.

  • Pin the version in packages.yml to ensure reproducibility
  • Use the two-argument form of ref for cross-package references
  • Pair with naming conventions (such as prefixes) to remove ambiguity

Example configuration for cross-package ref

# packages.yml
packages:
  - package: org/payments_pkg
    version: 0.6.1

-- models/fct_orders.sql
select *
from {{ ref('core_pkg', 'stg_orders') }} o
left join {{ ref('payments_pkg', 'stg_payments') }} p
  on p.order_id = o.order_id;

Integration with seeds, snapshots, and tests

ref() works not only for models but also for seeds and snapshots. Seeds are static tables loaded from CSV; snapshots are history tables for SCD management. Referencing either with ref() pulls them into the DAG so they benefit from build ordering and name resolution.

Use source() to reference external raw tables. source() declaratively registers an existing raw table in the warehouse, enabling documentation and dependency management.

  • Use ref() for seeds and snapshots, source() for external sources
  • ref() also works inside singular tests (queries that return failing rows)
  • Renaming a model affects ref resolution — use alias if you only want to change the physical name

Singular test example: assert that revenue is not NULL

-- tests/test_revenue_not_null.sql
select 1
from {{ ref('fct_orders') }}
where revenue is null
limit 1;

Real-World Pitfalls and Performance Considerations

Hard-coding schema or table names is the most common anti-pattern. It does not survive environment differences, and because it never enters the DAG, you lose any build-order guarantees. Always prefer ref() and source().

Deeply nested ephemeral models can explode the size of the compiled SQL. Check the query plan and consider materializing important intermediate results (materialized='table' or incremental).

  • Stop using direct references and standardize on ref()/source()
  • Use alias for renames so the model name stays intact and references keep working
  • Avoid large chains of ephemeral models — materialize at boundaries
  • Consider whether direct references inside pre/post-hooks can be replaced with ref()

Bad vs. good example

-- 悪例: 物理名のハードコード(環境移行で破綻)
select * from PROD_ANALYTICS.core.stg_orders;

-- 良例: ref()で環境依存を解消し、DAGに参加
select * from {{ ref('stg_orders') }};

CLI Selectors and Troubleshooting

Running dbt build -s fct_orders automatically builds the upstream models reachable through ref() first. To run only the change-impacted scope, use the state-comparison selectors.

A typical error is Model not found for ref. Check for typos in the model name, disabled models, missing package installs, and selectors that exclude the target model.

  • Use dbt list -s to verify the selection before running build
  • Models with disabled: true are excluded from ref resolution
  • For cross-package refs, verify the install and version in packages.yml
  • Use alias for physical-name changes; logical-name changes require rewiring downstream references
TargetDAG registrationEnvironment portabilityTypical use case
ref()Yes (guarantees upstream-to-downstream order)High (resolved based on target)Referencing models, seeds, and snapshots
source()Yes (source to model)Medium (declaration is environment-agnostic, but the table is external)Referencing existing raw data tables
Direct DB referenceNoneLow (locked to a specific environment and naming)Ad-hoc investigation or temporary manual queries

Selector and execution examples

# 依存込みで特定モデルのみを実行
$ dbt build -s fct_orders

# 変更のあった下流を実行(state比較)
$ dbt build -s state:modified+  --state path/to/artifacts

# 参照解決の確認
$ dbt list -s fct_orders --output name

Check Your Understanding

Analytics Engineer

問題 1

Model fct_orders references {{ ref('stg_orders') }} and {{ ref('stg_payments') }}. Which describes the behavior when you run dbt build -s fct_orders?

  1. stg_orders and stg_payments are built first, and the physical names are resolved according to the target environment's schema
  2. Build order is not guaranteed, and fct_orders may run first
  3. Referenced models are always recreated as views, so the materialization setting is ignored
  4. Both dev and prod resolve to the same physical name

正解: A

ref() adds dependency edges to the DAG and builds upstream first. Physical name resolution depends on the target, so each environment expands to its own schema and database. Materialization settings are not ignored.

Frequently Asked Questions

When should I use ref() vs source()?

Use ref() to reference dbt-managed models, seeds, and snapshots. Use source() to declare and reference existing raw tables in your data warehouse. The standard pattern is to start with source() for raw data ingestion and chain downstream transformations together with ref().

I want to rename a model without breaking references. How should I do that?

If you only need to change the physical table name, use the model's alias (keep the model name the same). ref('old_model_name') will still resolve, and only the physical name changes. If you need to rename the logical model itself, either keep the old model around during a migration window, or update every downstream reference at once to keep the DAG intact.

Cross-package references are erroring out. What should I check?

Confirm that the package is installed in packages.yml with the correct version, and that you are using the two-argument form ref('package_name', 'model_name'). Also make sure the target model is not disabled or excluded by a selector.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.