dbt run time is heavily influenced by the DAG critical path, materialization choices, incremental strategy, selection, parallelism, and DWH-specific optimizations. Rather than optimizing blindly, the fastest route is to factor the problem and tackle each piece in turn.
This article focuses on stable features from the official dbt documentation and packs in both the principles likely to appear on exams and the settings that actually move the needle in practice. First, surface the bottlenecks. Then cut the work with materializations and selection. Finally, lift the floor with parallelism and DWH features.
Before optimizing, measure where the time is actually going. dbt emits run_results.json and manifest.json, which let you inspect each node's start/end timestamps and dependencies. Shortening heavy nodes on the critical path is what shrinks total wall-clock time the most.
Stay aware of the DAG's shape. In a fan-out diagram, the longest narrow chain tends to be the critical path, and large joins, sorts, or re-aggregations on that chain are the usual culprits. Narrow your target first, then change the material (materialization).
DAG and critical-path illustration
[src] ----> stg_orders ----> dim_customer ----> fct_sales ----> rpt_sales
\ \-> dim_product ----/
\-> stg_events ---------------------------> fct_engagement
Critical path: src → stg_orders → dim_customer → fct_sales → rpt_salesExtract top bottlenecks from run_results.json (example)
jq -r '.results[] | {node: .unique_id, time: (.execution_time|tonumber)} | select(.time!=null) | [.time, .node] | @tsv' target/run_results.json \
| sort -nr \
| head -n 10The single most effective lever is the materialization. Choosing between view and table — both of which trigger full recomputation — and deciding whether incremental can shrink work from the second run onward will dominate your run time. Incremental requires a consistent unique_key and strategy (MERGE, DELETE+INSERT, INSERT_OVERWRITE, etc., depending on the adapter).
The base pattern: huge facts should be incremental, widely referenced dimensions should be tables, and small, frequently changing shaping logic should be views or ephemeral. Exams ask about this principle as well.
| Materialization | Run-time tendency | Best fit for |
|---|---|---|
| view | Fastest to build (no physical table). Recomputed on each downstream query | Lightweight shaping, frequently changing logic, low reference count |
| table | Build cost is moderate (rebuilt each run). Downstream queries are fast and stable | Referenced by many downstream models, full-rebuild cost acceptable, mid-sized |
| incremental | Heavy on first run, fast afterward. Applies only the delta | Large facts, CDC, daily updates. Make the strategy explicit on top of a unique_key |
Safe incremental-model pattern (per-adapter strategy dispatch example)
{{
config(
materialized='incremental',
unique_key='order_id',
tags=['fact','critical'],
**(
{'incremental_strategy': 'merge'} if target.adapter in ['snowflake','databricks','spark'] else {}
)
)
}}
with src as (
select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
where _ingested_at >= dateadd(day, -2, current_timestamp)
{% endif %}
)
select
order_id,
customer_id,
order_date,
total_amount
from srcAvoid running the full DAG every time and restrict execution to the minimum blast radius of your change. Define criteria in selectors.yml and combine state:modified with parent/child plus syntax — that is the standard pattern. Pairing --defer with --state lets you build the delta locally while depending on the production artifacts.
dbt build runs models, seeds, snapshots, and tests in one shot. If shrinking time is the goal, useful operational splits include running tests only on changes and pushing heavy data tests to overnight jobs.
selectors.yml and CLI examples
# selectors.yml
selectors:
- name: modified_and_downstream
definition:
union:
- method: state
value: modified
children: true
- method: tag
value: critical
# Build only the delta (defer to production)
dbt build --select selector:modified_and_downstream --defer --state target
# Shorten the dev loop further (skip tests)
dbt build --select selector:modified_and_downstream --exclude resource_type:test --defer --state targetHigher threads does not always mean faster. DWH concurrency limits, I/O bandwidth, and lock contention impose a ceiling. Start by measuring from a moderate parallelism level, and stay aware of DAG layers so you can split the critical path.
If intermediate-table writes are the bottleneck, replace them with ephemeral to cut writes. Conversely, if a heavy subquery is re-used by multiple downstream models, materialize it as a table to avoid recomputation.
profiles.yml and per-model parallelism control example
# profiles.yml (example)
my_project:
target: dev
outputs:
dev:
type: snowflake
account: ...
user: ...
password: ...
role: ...
warehouse: XSMALL
database: ...
schema: ...
threads: 8
-- Split heavy models into a separate job via tags
# In dbt Cloud: Step1: tag:heavy, Step2: tag:not heavy, etc.
# Local example
dbt run --select tag:heavy && dbt run --select tag:lightUse DWH features cautiously and measure as you go. On Snowflake, lean on the automatic micro-partition clustering as a baseline and only set explicit clustering keys when truly needed. Balancing result cache and warehouse size also matters. On Databricks, run Delta Lake OPTIMIZE or Z-ORDER via post-hooks to reduce scan volume.
In every case, the benefit depends on data distribution and query patterns. Do not apply these mechanically to every table — pick the columns deliberately and decide based on measurement.
DWH optimization via post-hooks (conditional)
{{
config(
materialized='table',
tags=['large','read_heavy'],
post_hook=(
[
"OPTIMIZE {{ this }} ZORDER BY (order_date)",
] if target.adapter == 'databricks' else []
)
)
}}
select * from {{ ref('fct_sales') }}Optimize the count and weight of test runs without compromising on quality. Push heavy data tests to overnight runs and keep the dev loop focused on schema tests. Run tests only on models tied to state:modified, and split full regression into a daily job.
Snapshot and seed full reloads are also expensive. Keep their frequency minimal and use --full-refresh only when truly required.
Test selection examples and accident prevention
# Test only the delta (generic tests only)
dbt test --select state:modified+ test_type:generic --defer --state target
# Restrict full-refresh to a dedicated job
dbt build --full-refresh --select tag:allow_full_refreshAnalytics Engineer
問題 1
You update the huge fact table fct_events daily. Run time is long and downstream reports are waiting. Upstream adds new rows each day, and some existing rows may also be corrected. Which configuration most realistically reduces run time?
正解: A
Because there are both inserts and partial updates, incremental + unique_key + MERGE strategy fits best. Building only the modified blast radius with state:modified+ minimizes total work. The other options either recompute far too much or have limited impact at this scale.
Should I pick incremental or table?
Decide based on update frequency and size. For very large tables with a clear daily delta, use incremental. For mid-sized tables that are read frequently and where the cost of a full rebuild is acceptable — or where update logic is too complex for clean incremental handling — choose table.
Increasing threads does not make it faster. What should I do?
The critical path of the DAG is likely the bottleneck. Prioritize optimizing heavy models on the long chain, group I/O-contending models with tags, and split their execution order. Also re-check the DWH concurrency limits and cluster size.
How do I operate --full-refresh safely?
Limit it to a dedicated job and use tags to make the targets explicit. Run foundational tables with many dependencies during a maintenance window. To prevent accidents, drop --full-refresh from regular jobs and use a manual approval flow only when truly needed.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...