dbt Sources declare upstream tables that dbt does not create itself, pulling them into the dependency graph so they can be tested and monitored for freshness.
This article walks through declaring external tables and designing freshness checks with a hands-on lens, focused on common environments like Snowflake and Databricks and the exam topics they generate.
dbt Sources let you explicitly declare upstream tables or views loaded by EL tools like Fivetran inside your dbt project. That declaration unlocks stable references, documentation, testing, and freshness monitoring.
Declarations typically live in a yml file (version: 2) under the models directory. Use name, database, schema, and identifier to pinpoint the physical object. Casing and quoting behavior differs by adapter, so set quoting as needed.
| Item | source ({{ source() }}) | ref ({{ ref() }}) | seed |
|---|---|---|---|
| Object created | References an existing table only | References a model dbt creates | Created by dbt from a CSV |
| How to reference | {{ source('src','tbl') }} | {{ ref('model_name') }} | {{ ref('seed_name') }} |
| Dependency graph | Visualized as an upstream entry node | Visualizes dependencies among downstream models | Visualized starting from the seed |
| Quality controls | Tests yes, freshness yes | Tests yes, freshness no | Tests yes, freshness no |
| Typical use | Results loaded from external systems | Reusing the transformation layer | Small master data or test fixtures |
Minimal sources.yml example
version: 2
sources:
- name: raw
database: ANALYTICS
schema: RAW
loader: fivetran
tables:
- name: orders
identifier: ORDERS
- name: customers
identifier: CUSTOMERS
Adapters interpret database, schema, and casing differently. Snowflake uses a three-part name of database, schema, and object, where case sensitivity depends on quoting. Databricks with Unity Catalog enabled uses catalog, schema, and table, and most implementations map dbt's database field to the catalog (consult your adapter's docs for specifics).
In practice, give external tables with fixed physical names a physical identifier and a logical name, and set quoting explicitly when needed. This minimizes the blast radius of environment differences and renames.
Snowflake and Databricks declaration examples (when listed in the same file)
# version: 2 はファイル先頭に一度だけ宣言
version: 2
sources:
# Snowflake 例
- name: raw_snowflake
database: PROD
schema: EXT
quoting:
database: true
schema: true
identifier: true
tables:
- name: clickstream
identifier: EXT_CLICKSTREAM
description: "外部取り込みのクリックストリーム"
# Databricks (Unity Catalog) 例
- name: raw_uc
database: main_catalog # 多くのアダプタで catalog に相当
schema: bronze
tables:
- name: events
identifier: events # 物理名と同一のケース
Freshness evaluates the difference (age) between the maximum of loaded_at_field and the run time, then judges pass/warn/error against the warn_after and error_after thresholds. loaded_at_field is expected to be a timestamp column capturing the per-row load completion time.
filter is an extra predicate that narrows the rows considered when evaluating loaded_at_field, expressed as a fragment in the adapter's SQL dialect. It is useful for excluding test data or soft-deleted rows.
Source and freshness evaluation flow (conceptual diagram)
Freshness configuration and execution example
version: 2
sources:
- name: raw
schema: RAW
freshness: # デフォルト(テーブルで上書き可)
warn_after: {count: 24, period: hour}
error_after: {count: 48, period: hour}
tables:
- name: orders
identifier: ORDERS
loaded_at_field: _loaded_at
freshness:
warn_after: {count: 6, period: hour} # このテーブルだけ厳しめ
filter: "_is_test = false" # 評価対象の行を絞る
# 実行コマンドの例(シェル)
# 全 Source の鮮度
# dbt source freshness
# 特定 Source のみ
# dbt source freshness --select source:raw.orders
After running dbt source freshness, the CLI prints a status summary. Even a single error returns a non-zero exit code. Warns do not affect the exit code, but it is common to send them to monitoring alerts.
The run produces target/sources.json as an artifact, recording max_loaded_at, snapshotted_at (measurement time), age, and status for each Source table. You can feed it into external systems or dashboards for visualization.
sources.json example (excerpt)
{
"metadata": {"generated_at": "2026-04-19T12:34:56Z"},
"sources": {
"raw.orders": {
"max_loaded_at": "2026-04-19T11:00:00Z",
"snapshotted_at": "2026-04-19T12:34:56Z",
"age": 1.58,
"status": "pass"
},
"raw.clickstream": {
"max_loaded_at": "2026-04-17T03:10:00Z",
"snapshotted_at": "2026-04-19T12:34:56Z",
"age": 57.41,
"status": "error"
}
}
}
Freshness is most effective when wired into scheduled runs or CI gates. A safe layout is to check freshness before running downstream dbt build and skip the rest on error.
When thresholds differ between environments (prod vs. staging), a common practical pattern is to externalize the warn_after count via a Jinja variable in the YAML.
CI job example (Bash) with a variable-ization snippet
# Bash: freshness が error なら以降の build を止める
set -e
# 依存取得とスキーマ検証(任意)
dbt deps
# 鮮度チェック(特定 Source のみ等も可)
dbt source freshness --select source:raw.*
# 問題なければ下流を実行
dbt build --select +marts:tag:daily
# YAML での変数化イメージ(Jinja)
# warn_after: {count: {{ var('freshness_hours', 24) }}, period: hour}
The dbt Analytics Engineer exam frequently asks about the difference between source and ref, the relationship between loaded_at_field and freshness, the role of filter, and how to use selectors. Remember that freshness is distinct from model tests and applies only to source tables.
Questions about adapter differences (Snowflake vs. Databricks) for database/schema interpretation and quoting are also common, so be ready to articulate the physical name resolution logic.
Example: referencing a Source from a model
-- models/stg_orders.sql(例)
select
o.id as order_id,
o.customer_id,
o.status,
o._loaded_at
from {{ source('raw', 'orders') }} as o
where o._is_test = false
Analytics Engineer
問題 1
You want to monitor freshness on the upstream orders table in a dbt project. loaded_at_field is _loaded_at, warn at 24 hours, error at 48 hours, and you want to exclude test data rows (_is_test = true). Which configuration is correct?
正解: A
Freshness can only be configured on Source tables, using loaded_at_field, warn_after/error_after, and optionally filter. Neither model tests nor ref evaluates freshness.
What type and values does loaded_at_field need?
It needs to be a timestamp (datetime) column representing when each row finished loading. NULLs make age calculation unstable, so the practical approach is a NOT NULL constraint where possible, or excluding NULLs with filter when that is not feasible.
What syntax should be used for filter?
Provide a WHERE clause fragment (without the WHERE keyword) following the SQL dialect of the target database. For example, Snowflake accepts "_is_test = false" and Databricks/Spark accepts similar boolean expressions. Adjust identifier casing and quoting to match the adapter.
Can freshness defined at the source level be overridden per table?
Yes. Define a default freshness in the source block, then override it under a specific table. loaded_at_field can also be set per table.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...