A dbt model is the smallest unit of transformation, defined by a single SQL SELECT statement. The simple rule of one file = one model = one SELECT delivers a change-resilient DAG and a highly reproducible development experience.
This article organizes exam-relevant topics on a practical basis: model design fundamentals, choosing a materialization, managing dependencies with ref/source, incremental basics, and testing and operations.
In dbt, a model is a SELECT statement defined in a .sql file under the models directory. With Jinja templating, you combine ref, source, macros, variables, and config blocks to express environment-independent transformations.
A model is fundamentally a single SELECT; do not write DML directly, and delegate persistence to the materialization. Layered prefixes (stg_, int_, dim_, fct_, etc.) plus snake_case make models easier to manage. Exams often ask that models become DAG nodes and that materialization decides whether an object is actually created.
Minimal model example (view in the staging layer)
{{ config(materialized='view', tags=['staging']) }}
select
id,
user_id,
event_type,
event_ts
from {{ source('raw', 'events') }}
where event_ts >= dateadd(day, -30, current_timestamp)dbt models choose their persistence strategy through the materialized parameter. The main options are view, table, incremental, and ephemeral. Trade-offs include degree of materialization, refresh cost, execution speed, and whether downstream models can reference them.
Project-wide defaults are set in dbt_project.yml, and exceptions are declared per model via the config block. Common exam points: ephemeral creates no intermediate object and is inlined as a CTE into upstream models at compile time, while incremental performs partial updates.
| Type | Characteristics | Refresh Strategy | Storage Cost |
|---|---|---|---|
| view | Always reflects the latest data (virtual view) | Lightweight recreation; no physical data stored | Low |
| table | Creates a physical table | Full rebuild from scratch | Medium to high |
| incremental | Applies diffs (partial updates) | Loads only new/updated rows (MERGE, etc.) | Medium |
| ephemeral | No object created (inlined as a CTE) | N/A | Zero |
Specifying materialization (per model and at the project level)
-- Per-model override inside the model file
{{ config(materialized='incremental', unique_key='id') }}
select * from {{ ref('stg_orders') }}
# Set defaults per layer in dbt_project.yml
models:
my_project:
+materialized: table
staging:
+materialized: view
marts:
+materialized: tableref references another model by its logical name; at compile time it resolves to the correct schema and object name and registers the dependency in the DAG. source declaratively brings in external raw-data objects. Together they absorb environment differences and ensure builds run in the correct order.
dbt build runs models, tests, snapshots, and seeds together in dependency order. Combined with selectors (--select/--exclude), you can safely build only the affected scope.
Conceptual view of the model DAG
Declaring source and combining ref/source
# models/sources.yml
version: 2
sources:
- name: raw
database: RAW_DB
schema: RAW_SCHEMA
tables:
- name: orders
- name: users
-- models/stg_orders.sql
{{ config(materialized='view', tags=['staging']) }}
select *
from {{ source('raw', 'orders') }}
-- models/fct_orders.sql
{{ config(materialized='table', tags=['marts']) }}
select
o.id,
o.user_id,
u.country,
o.total_amount,
o.order_ts
from {{ ref('int_orders_enriched') }} o
join {{ ref('dim_users') }} u on o.user_id = u.user_idincremental is a partial-update strategy for keeping the recompute cost of large tables manageable. Identify new and updated rows via unique_key or a partition column, and branch first-run vs. subsequent-run logic with is_incremental(). The actual diff-application method (MERGE, INSERT OVERWRITE, etc.) depends on the adapter implementation.
When the schema may change, design the on_schema_change behavior. To rebuild everything, run with --full-refresh. Exams frequently quiz on incremental keywords such as 'partial update,' 'full build on first run,' and 'update detection via unique_key.'
A typical incremental model pattern
{{ config(
materialized='incremental',
unique_key='id',
on_schema_change='sync_all_columns'
) }}
with src as (
select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
where order_ts > (select coalesce(max(order_ts), '1900-01-01') from {{ this }})
{% endif %}
),
calc as (
select id, user_id, total_amount, order_ts, total_amount * 0.1 as tax
from src
)
select * from calcModel quality is tested declaratively via schema files (yml). On top of generic tests like not_null, unique, and relationships, you can write singular SQL tests when needed. dbt build runs models and tests in dependency order.
Write column descriptions and source explanations in yml and browse them via dbt docs generate/serve. Descriptions become valuable assets for both future maintainers and exam preparation.
Schema test examples (generic and singular)
# models/marts/schema.yml
version: 2
models:
- name: fct_orders
description: Order fact table
columns:
- name: id
tests: [not_null, unique]
- name: user_id
tests:
- relationships:
to: ref('dim_users')
field: user_id
# tests/high_value_orders.sql (singular test)
select 1
where exists (
select 1 from {{ ref('fct_orders') }} where total_amount < 0
)dbt run executes models, while dbt build executes models, tests, snapshots, and seeds in dependency order. Use --select/--exclude to scope the run, and leverage tags, paths, and dependency-graph operators (+ and @). With state:modified you can select based on change detection.
For performance, distribute aggregations and filters to the earliest reasonable layer (light work in staging, normalization and aggregation in intermediate, business logic in marts). Manage warehouse-specific partition and cluster-key settings via the model's DDL options (adapter-dependent config).
Representative selector examples
# Build changed models and their downstream
dbt build --select state:modified+
# Run a model along with its upstream
dbt run --select +int_orders_enriched
# Only models tagged with both tags
dbt run --select tag:marts,tag:critical
# Re-run failed nodes
dbt build --select result:errorAnalytics Engineer
問題 1
You want to create a small helper transformation used by only one downstream model, avoiding any intermediate object in the warehouse to minimize cost. Which materialization is best for this helper?
正解: D
ephemeral creates no physical object and is inlined as a CTE into its parent model. With no intermediate table to create or clean up, it is ideal for helper transformations used by a single downstream model.
How can I rename a model while minimizing impact on downstream models?
If you use ref, build order is automatically adjusted through logical-name resolution. When renaming, use git diffs and the state:modified selector to build the affected scope, and confirm that relationship tests pass with dbt test. If needed, you can migrate gradually by keeping the old physical name through an alias.
What if columns are added or removed in an incremental model?
Use the on_schema_change config. Behaviors such as sync_all_columns are adapter-dependent. For breaking changes or type changes, prefer --full-refresh to prioritize safe migration.
How do I handle late-arriving data from sources?
Define a unique_key, then extend the ingestion condition inside is_incremental() to include a window slightly before the max timestamp, or configure an appropriate MERGE condition. Combine with periodic full refreshes or window-recompute batches as needed.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...