dbt

dbt Model Basics: The Smallest Unit of Transformation Defined in SQL

2026-04-19
NicheeLab Editorial Team

A dbt model is the smallest unit of transformation, defined by a single SQL SELECT statement. The simple rule of one file = one model = one SELECT delivers a change-resilient DAG and a highly reproducible development experience.

This article organizes exam-relevant topics on a practical basis: model design fundamentals, choosing a materialization, managing dependencies with ref/source, incremental basics, and testing and operations.

Definition and Role of a dbt Model

In dbt, a model is a SELECT statement defined in a .sql file under the models directory. With Jinja templating, you combine ref, source, macros, variables, and config blocks to express environment-independent transformations.

A model is fundamentally a single SELECT; do not write DML directly, and delegate persistence to the materialization. Layered prefixes (stg_, int_, dim_, fct_, etc.) plus snake_case make models easier to manage. Exams often ask that models become DAG nodes and that materialization decides whether an object is actually created.

  • 1 model = 1 SELECT. CTEs are fine; do not write INSERT/UPDATE/DELETE (persistence is delegated to materialized).
  • Use the Jinja config block to declare materialized, tags, alias, schema, and more.
  • At compile time it is expanded into SQL targeted at the adapter (Snowflake, Databricks, etc.).

Minimal model example (view in the staging layer)

{{ config(materialized='view', tags=['staging']) }}

select
  id,
  user_id,
  event_type,
  event_ts
from {{ source('raw', 'events') }}
where event_ts >= dateadd(day, -30, current_timestamp)

Materialization Design Basics

dbt models choose their persistence strategy through the materialized parameter. The main options are view, table, incremental, and ephemeral. Trade-offs include degree of materialization, refresh cost, execution speed, and whether downstream models can reference them.

Project-wide defaults are set in dbt_project.yml, and exceptions are declared per model via the config block. Common exam points: ephemeral creates no intermediate object and is inlined as a CTE into upstream models at compile time, while incremental performs partial updates.

  • Defaults depend on the adapter and project settings; override per model when needed.
  • A safe operational pattern is to use view during development and promote to table or incremental once it stabilizes in production.
TypeCharacteristicsRefresh StrategyStorage Cost
viewAlways reflects the latest data (virtual view)Lightweight recreation; no physical data storedLow
tableCreates a physical tableFull rebuild from scratchMedium to high
incrementalApplies diffs (partial updates)Loads only new/updated rows (MERGE, etc.)Medium
ephemeralNo object created (inlined as a CTE)N/AZero

Specifying materialization (per model and at the project level)

-- Per-model override inside the model file
{{ config(materialized='incremental', unique_key='id') }}
select * from {{ ref('stg_orders') }}

# Set defaults per layer in dbt_project.yml
models:
  my_project:
    +materialized: table
    staging:
      +materialized: view
    marts:
      +materialized: table

Managing Dependencies with ref and source

ref references another model by its logical name; at compile time it resolves to the correct schema and object name and registers the dependency in the DAG. source declaratively brings in external raw-data objects. Together they absorb environment differences and ensure builds run in the correct order.

dbt build runs models, tests, snapshots, and seeds together in dependency order. Combined with selectors (--select/--exclude), you can safely build only the affected scope.

  • Declare source in yml and reference it as source('source_name', 'table_name').
  • ref absorbs model renames and schema differences; avoid hand-written fully qualified names.

Conceptual view of the model DAG

source(raw.orders)stg_ordersint_orders_enricheddim_usersfct_ordersConceptual view of the model DAG

Declaring source and combining ref/source

# models/sources.yml
version: 2
sources:
  - name: raw
    database: RAW_DB
    schema: RAW_SCHEMA
    tables:
      - name: orders
      - name: users

-- models/stg_orders.sql
{{ config(materialized='view', tags=['staging']) }}
select *
from {{ source('raw', 'orders') }}

-- models/fct_orders.sql
{{ config(materialized='table', tags=['marts']) }}
select
  o.id,
  o.user_id,
  u.country,
  o.total_amount,
  o.order_ts
from {{ ref('int_orders_enriched') }} o
join {{ ref('dim_users') }} u on o.user_id = u.user_id

Incremental Model Basics

incremental is a partial-update strategy for keeping the recompute cost of large tables manageable. Identify new and updated rows via unique_key or a partition column, and branch first-run vs. subsequent-run logic with is_incremental(). The actual diff-application method (MERGE, INSERT OVERWRITE, etc.) depends on the adapter implementation.

When the schema may change, design the on_schema_change behavior. To rebuild everything, run with --full-refresh. Exams frequently quiz on incremental keywords such as 'partial update,' 'full build on first run,' and 'update detection via unique_key.'

  • unique_key is the foundation of update detection; without it, runs usually become insert-only.
  • Use is_incremental() to apply filters only on incremental runs and narrow the ingestion scope.

A typical incremental model pattern

{{ config(
  materialized='incremental',
  unique_key='id',
  on_schema_change='sync_all_columns'
) }}

with src as (
  select * from {{ ref('stg_orders') }}
  {% if is_incremental() %}
    where order_ts > (select coalesce(max(order_ts), '1900-01-01') from {{ this }})
  {% endif %}
),
calc as (
  select id, user_id, total_amount, order_ts, total_amount * 0.1 as tax
  from src
)
select * from calc

Model Testing and Documentation

Model quality is tested declaratively via schema files (yml). On top of generic tests like not_null, unique, and relationships, you can write singular SQL tests when needed. dbt build runs models and tests in dependency order.

Write column descriptions and source explanations in yml and browse them via dbt docs generate/serve. Descriptions become valuable assets for both future maintainers and exam preparation.

  • Tests catch model inconsistencies early on failure; running them in CI is recommended.
  • docs visualize the DAG and lineage; never skip column descriptions.

Schema test examples (generic and singular)

# models/marts/schema.yml
version: 2
models:
  - name: fct_orders
    description: Order fact table
    columns:
      - name: id
        tests: [not_null, unique]
      - name: user_id
        tests:
          - relationships:
              to: ref('dim_users')
              field: user_id

# tests/high_value_orders.sql (singular test)
select 1
where exists (
  select 1 from {{ ref('fct_orders') }} where total_amount < 0
)

Execution, Selection, and Performance Essentials

dbt run executes models, while dbt build executes models, tests, snapshots, and seeds in dependency order. Use --select/--exclude to scope the run, and leverage tags, paths, and dependency-graph operators (+ and @). With state:modified you can select based on change detection.

For performance, distribute aggregations and filters to the earliest reasonable layer (light work in staging, normalization and aggregation in intermediate, business logic in marts). Manage warehouse-specific partition and cluster-key settings via the model's DDL options (adapter-dependent config).

  • Run only the affected scope: dbt build --select state:modified+
  • Tag-driven: dbt run --select tag:staging; path-based: dbt run --select models/marts/
  • Full refresh: dbt run --full-refresh --select fct_orders

Representative selector examples

# Build changed models and their downstream
dbt build --select state:modified+

# Run a model along with its upstream
dbt run --select +int_orders_enriched

# Only models tagged with both tags
dbt run --select tag:marts,tag:critical

# Re-run failed nodes
dbt build --select result:error

Check Your Understanding

Analytics Engineer

問題 1

You want to create a small helper transformation used by only one downstream model, avoiding any intermediate object in the warehouse to minimize cost. Which materialization is best for this helper?

  1. view
  2. table
  3. incremental
  4. ephemeral

正解: D

ephemeral creates no physical object and is inlined as a CTE into its parent model. With no intermediate table to create or clean up, it is ideal for helper transformations used by a single downstream model.

Frequently Asked Questions

How can I rename a model while minimizing impact on downstream models?

If you use ref, build order is automatically adjusted through logical-name resolution. When renaming, use git diffs and the state:modified selector to build the affected scope, and confirm that relationship tests pass with dbt test. If needed, you can migrate gradually by keeping the old physical name through an alias.

What if columns are added or removed in an incremental model?

Use the on_schema_change config. Behaviors such as sync_all_columns are adapter-dependent. For breaking changes or type changes, prefer --full-refresh to prioritize safe migration.

How do I handle late-arriving data from sources?

Define a unique_key, then extend the ingestion condition inside is_incremental() to include a window slightly before the max timestamp, or configure an appropriate MERGE condition. Combine with periodic full refreshes or window-recompute batches as needed.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.