ref() forms the DAG of a dbt project and automates both physical name resolution and build order. It is a fundamental topic that shows up often on the Analytics Engineer exam.
This article walks through ref() end-to-end: when it resolves, how it interacts with materializations, cross-package references, integration with seeds and snapshots, anti-patterns, and troubleshooting.
At compile time, ref() resolves the supplied model name to an environment-specific physical name (database.schema.table/view) and adds an edge to the dependency graph. As a result, dbt build executes upstream-to-downstream in the correct order, and transitive dependencies are handled automatically.
Physical names are derived from the target (profile and target) and quoted according to the rules of the adapter (Snowflake, BigQuery, Databricks SQL, etc.). Because nothing is hard-coded, the same SQL works across dev and prod even when the schemas differ.
Conceptual DAG produced by ref()
[seed_orders] ---> [stg_orders] ----> \
^ \
| > [fct_orders]
[stg_payments] -------/ /
[dim_customers] ----------------------->/Basic example: fct_orders.sql
with orders as (
select * from {{ ref('stg_orders') }}
),
payments as (
select * from {{ ref('stg_payments') }}
)
select
o.order_id,
o.customer_id,
sum(p.amount) as revenue
from orders o
left join payments p on p.order_id = o.order_id
group by 1,2;ref() is resolved at compile time. For table, view, and incremental materializations, ref() expands to the corresponding relation name. For ephemeral models, however, dbt inlines the upstream SQL as a CTE inside the downstream model (no physical table is created).
This lets you reuse upstream SQL while still benefiting from optimization. That said, overusing ephemeral can bloat the resulting CTE chain and produce overly complex compiled SQL — watch the query plan and runtime, and consider materializing as a table or making it incremental when needed.
Ephemeral example: a helper model that is inlined
-- models/utils_joined.sql
{{ config(materialized='ephemeral') }}
select *
from {{ ref('stg_orders') }} o
join {{ ref('stg_payments') }} p on p.order_id = o.order_id;
-- models/fct_orders.sql(上記を参照)
select order_id, customer_id, sum(amount) as revenue
from {{ ref('utils_joined') }}
group by 1,2;When reusing models across packages, declare the dependency in packages.yml and reference it explicitly with ref('package_name', 'model_name'). This avoids name collisions and makes it immediately clear which package the model belongs to.
In large monorepos where the same model name might exist in multiple packages, getting in the habit of using cross-package references is the safest choice. CI dependency resolution and caching both work correctly, leading to stable builds.
Example configuration for cross-package ref
# packages.yml
packages:
- package: org/payments_pkg
version: 0.6.1
-- models/fct_orders.sql
select *
from {{ ref('core_pkg', 'stg_orders') }} o
left join {{ ref('payments_pkg', 'stg_payments') }} p
on p.order_id = o.order_id;ref() works not only for models but also for seeds and snapshots. Seeds are static tables loaded from CSV; snapshots are history tables for SCD management. Referencing either with ref() pulls them into the DAG so they benefit from build ordering and name resolution.
Use source() to reference external raw tables. source() declaratively registers an existing raw table in the warehouse, enabling documentation and dependency management.
Singular test example: assert that revenue is not NULL
-- tests/test_revenue_not_null.sql
select 1
from {{ ref('fct_orders') }}
where revenue is null
limit 1;Hard-coding schema or table names is the most common anti-pattern. It does not survive environment differences, and because it never enters the DAG, you lose any build-order guarantees. Always prefer ref() and source().
Deeply nested ephemeral models can explode the size of the compiled SQL. Check the query plan and consider materializing important intermediate results (materialized='table' or incremental).
Bad vs. good example
-- 悪例: 物理名のハードコード(環境移行で破綻)
select * from PROD_ANALYTICS.core.stg_orders;
-- 良例: ref()で環境依存を解消し、DAGに参加
select * from {{ ref('stg_orders') }};Running dbt build -s fct_orders automatically builds the upstream models reachable through ref() first. To run only the change-impacted scope, use the state-comparison selectors.
A typical error is Model not found for ref. Check for typos in the model name, disabled models, missing package installs, and selectors that exclude the target model.
| Target | DAG registration | Environment portability | Typical use case |
|---|---|---|---|
| ref() | Yes (guarantees upstream-to-downstream order) | High (resolved based on target) | Referencing models, seeds, and snapshots |
| source() | Yes (source to model) | Medium (declaration is environment-agnostic, but the table is external) | Referencing existing raw data tables |
| Direct DB reference | None | Low (locked to a specific environment and naming) | Ad-hoc investigation or temporary manual queries |
Selector and execution examples
# 依存込みで特定モデルのみを実行
$ dbt build -s fct_orders
# 変更のあった下流を実行(state比較)
$ dbt build -s state:modified+ --state path/to/artifacts
# 参照解決の確認
$ dbt list -s fct_orders --output nameAnalytics Engineer
問題 1
Model fct_orders references {{ ref('stg_orders') }} and {{ ref('stg_payments') }}. Which describes the behavior when you run dbt build -s fct_orders?
正解: A
ref() adds dependency edges to the DAG and builds upstream first. Physical name resolution depends on the target, so each environment expands to its own schema and database. Materialization settings are not ignored.
When should I use ref() vs source()?
Use ref() to reference dbt-managed models, seeds, and snapshots. Use source() to declare and reference existing raw tables in your data warehouse. The standard pattern is to start with source() for raw data ingestion and chain downstream transformations together with ref().
I want to rename a model without breaking references. How should I do that?
If you only need to change the physical table name, use the model's alias (keep the model name the same). ref('old_model_name') will still resolve, and only the physical name changes. If you need to rename the logical model itself, either keep the old model around during a migration window, or update every downstream reference at once to keep the DAG intact.
Cross-package references are erroring out. What should I check?
Confirm that the package is installed in packages.yml with the correct version, and that you are using the two-argument form ref('package_name', 'model_name'). Also make sure the target model is not disabled or excluded by a selector.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...