dbt

dbt Sources and Freshness: Declaring External Tables and Monitoring Data Freshness

2026-04-19
NicheeLab Editorial Team

dbt Sources declare upstream tables that dbt does not create itself, pulling them into the dependency graph so they can be tested and monitored for freshness.

This article walks through declaring external tables and designing freshness checks with a hands-on lens, focused on common environments like Snowflake and Databricks and the exam topics they generate.

Sources: Fundamentals and Design Intent

dbt Sources let you explicitly declare upstream tables or views loaded by EL tools like Fivetran inside your dbt project. That declaration unlocks stable references, documentation, testing, and freshness monitoring.

Declarations typically live in a yml file (version: 2) under the models directory. Use name, database, schema, and identifier to pinpoint the physical object. Casing and quoting behavior differs by adapter, so set quoting as needed.

  • Sources are declarations that let you reference 'external' (non-dbt-managed) objects through stable logical names.
  • Use identifier to pinpoint the physical table name and name as the logical handle that models call.
  • Declare only the tables you actually use and prune any that fall out of use.
  • In environments like Snowflake where uppercase is common, mind whether quoting is enabled.
Itemsource ({{ source() }})ref ({{ ref() }})seed
Object createdReferences an existing table onlyReferences a model dbt createsCreated by dbt from a CSV
How to reference{{ source('src','tbl') }}{{ ref('model_name') }}{{ ref('seed_name') }}
Dependency graphVisualized as an upstream entry nodeVisualizes dependencies among downstream modelsVisualized starting from the seed
Quality controlsTests yes, freshness yesTests yes, freshness noTests yes, freshness no
Typical useResults loaded from external systemsReusing the transformation layerSmall master data or test fixtures

Minimal sources.yml example

version: 2
sources:
  - name: raw
    database: ANALYTICS
    schema: RAW
    loader: fivetran
    tables:
      - name: orders
        identifier: ORDERS
      - name: customers
        identifier: CUSTOMERS

External Table Declaration Patterns (Snowflake / Databricks)

Adapters interpret database, schema, and casing differently. Snowflake uses a three-part name of database, schema, and object, where case sensitivity depends on quoting. Databricks with Unity Catalog enabled uses catalog, schema, and table, and most implementations map dbt's database field to the catalog (consult your adapter's docs for specifics).

In practice, give external tables with fixed physical names a physical identifier and a logical name, and set quoting explicitly when needed. This minimizes the blast radius of environment differences and renames.

  • Watch quoting carefully on Snowflake. When physical names are locked to uppercase, enabling quoting is the safer choice.
  • On Databricks Unity Catalog, the common pattern is to set database to the catalog and schema to the schema.
  • Keep identifier and name separate to absorb upstream schema moves and renames.

Snowflake and Databricks declaration examples (when listed in the same file)

# version: 2 はファイル先頭に一度だけ宣言
version: 2
sources:
  # Snowflake 例
  - name: raw_snowflake
    database: PROD
    schema: EXT
    quoting:
      database: true
      schema: true
      identifier: true
    tables:
      - name: clickstream
        identifier: EXT_CLICKSTREAM
        description: "外部取り込みのクリックストリーム"

  # Databricks (Unity Catalog) 例
  - name: raw_uc
    database: main_catalog        # 多くのアダプタで catalog に相当
    schema: bronze
    tables:
      - name: events
        identifier: events        # 物理名と同一のケース

Freshness: Core Settings and How Evaluation Works

Freshness evaluates the difference (age) between the maximum of loaded_at_field and the run time, then judges pass/warn/error against the warn_after and error_after thresholds. loaded_at_field is expected to be a timestamp column capturing the per-row load completion time.

filter is an extra predicate that narrows the rows considered when evaluating loaded_at_field, expressed as a fragment in the adapter's SQL dialect. It is useful for excluding test data or soft-deleted rows.

  • Status thresholds: pass < warn_after < error_after
  • The evaluation timestamp comes from the target database's current time.
  • loaded_at_field should not contain NULLs; exclude them with filter if needed.
  • Freshness is scoped per source table. Source-level settings act as defaults that tables inherit and can override.

Source and freshness evaluation flow (conceptual diagram)

Upstream ingestion (EL)External table (physical)dbt Source declarationloaded_at_field / freshnessdbt source freshnessmax(loaded_at_field) vs current timeOutputs pass / warn / errorUpstream ingestion → External table → dbt Source → Freshness evaluation → pass/warn/error

Freshness configuration and execution example

version: 2
sources:
  - name: raw
    schema: RAW
    freshness:                      # デフォルト(テーブルで上書き可)
      warn_after: {count: 24, period: hour}
      error_after: {count: 48, period: hour}
    tables:
      - name: orders
        identifier: ORDERS
        loaded_at_field: _loaded_at
        freshness:
          warn_after: {count: 6, period: hour}    # このテーブルだけ厳しめ
          filter: "_is_test = false"              # 評価対象の行を絞る

# 実行コマンドの例(シェル)
# 全 Source の鮮度
#   dbt source freshness
# 特定 Source のみ
#   dbt source freshness --select source:raw.orders

Reading Results and Artifacts

After running dbt source freshness, the CLI prints a status summary. Even a single error returns a non-zero exit code. Warns do not affect the exit code, but it is common to send them to monitoring alerts.

The run produces target/sources.json as an artifact, recording max_loaded_at, snapshotted_at (measurement time), age, and status for each Source table. You can feed it into external systems or dashboards for visualization.

  • On error, the safe operational pattern is to halt the pipeline or suppress downstream builds.
  • Archive sources.json and track age over time to surface trends.
  • Manage operational exceptions, like temporarily relaxing thresholds during long-running bulk loads.

sources.json example (excerpt)

{
  "metadata": {"generated_at": "2026-04-19T12:34:56Z"},
  "sources": {
    "raw.orders": {
      "max_loaded_at": "2026-04-19T11:00:00Z",
      "snapshotted_at": "2026-04-19T12:34:56Z",
      "age": 1.58,
      "status": "pass"
    },
    "raw.clickstream": {
      "max_loaded_at": "2026-04-17T03:10:00Z",
      "snapshotted_at": "2026-04-19T12:34:56Z",
      "age": 57.41,
      "status": "error"
    }
  }
}

Operations Patterns: Integrating with CI and Schedules

Freshness is most effective when wired into scheduled runs or CI gates. A safe layout is to check freshness before running downstream dbt build and skip the rest on error.

When thresholds differ between environments (prod vs. staging), a common practical pattern is to externalize the warn_after count via a Jinja variable in the YAML.

  • Run dbt source freshness at the start of the job and halt on a non-zero exit.
  • Parameterize thresholds with Jinja or environment variables to absorb environment differences.
  • Track temporary delays as tickets and manage threshold changes through review.

CI job example (Bash) with a variable-ization snippet

# Bash: freshness が error なら以降の build を止める
set -e

# 依存取得とスキーマ検証(任意)
dbt deps

# 鮮度チェック(特定 Source のみ等も可)
dbt source freshness --select source:raw.*

# 問題なければ下流を実行
dbt build --select +marts:tag:daily

# YAML での変数化イメージ(Jinja)
# warn_after: {count: {{ var('freshness_hours', 24) }}, period: hour}

Exam Prep: Common Themes and Easy-to-Confuse Points

The dbt Analytics Engineer exam frequently asks about the difference between source and ref, the relationship between loaded_at_field and freshness, the role of filter, and how to use selectors. Remember that freshness is distinct from model tests and applies only to source tables.

Questions about adapter differences (Snowflake vs. Databricks) for database/schema interpretation and quoting are also common, so be ready to articulate the physical name resolution logic.

  • source points at existing tables; ref points at dbt models. Freshness applies only to sources.
  • loaded_at_field is evaluated by its max. You choose the period unit for warn_after and error_after.
  • filter restricts evaluated rows as a WHERE clause fragment and is useful for excluding NULLs.
  • Selectors: you can use source:<source_name>.<table_name>, e.g. source:raw.orders.

Example: referencing a Source from a model

-- models/stg_orders.sql(例)
select
  o.id as order_id,
  o.customer_id,
  o.status,
  o._loaded_at
from {{ source('raw', 'orders') }} as o
where o._is_test = false

Check Your Understanding

Analytics Engineer

問題 1

You want to monitor freshness on the upstream orders table in a dbt project. loaded_at_field is _loaded_at, warn at 24 hours, error at 48 hours, and you want to exclude test data rows (_is_test = true). Which configuration is correct?

  1. On the orders table in sources.yml, set loaded_at_field, freshness (warn_after/error_after), and filter: "_is_test = false".
  2. Add WHERE _is_test = false to the SELECT in models/stg_orders.sql and verify freshness via dbt test.
  3. Run dbt build before dbt run and set freshness on the model referenced via ref.
  4. Skip freshness in sources.yml and add tests: - freshness instead.

正解: A

Freshness can only be configured on Source tables, using loaded_at_field, warn_after/error_after, and optionally filter. Neither model tests nor ref evaluates freshness.

Frequently Asked Questions

What type and values does loaded_at_field need?

It needs to be a timestamp (datetime) column representing when each row finished loading. NULLs make age calculation unstable, so the practical approach is a NOT NULL constraint where possible, or excluding NULLs with filter when that is not feasible.

What syntax should be used for filter?

Provide a WHERE clause fragment (without the WHERE keyword) following the SQL dialect of the target database. For example, Snowflake accepts "_is_test = false" and Databricks/Spark accepts similar boolean expressions. Adjust identifier casing and quoting to match the adapter.

Can freshness defined at the source level be overridden per table?

Yes. Define a default freshness in the source block, then override it under a specific table. loaded_at_field can also be set per table.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.