dbt's incremental models are the core feature for avoiding full recomputation and reflecting only the changes. But picking the wrong incremental_strategy invites the classic failure modes: latency, duplication, missing rows, and unapplied deletes.
Grounded in the stable behavior described in the official documentation, this article walks through the differences between merge / append / delete+insert / insert_overwrite, the points the exam likes to test, and the configuration and validation steps you need in production.
After the initial full load, dbt incremental models process only the deltas on subsequent runs. You identify the delta by combining is_incremental() with boundary conditions on time, ID, or partition. The strategy you choose changes how duplicates are deduped, how deletes are reflected, and what keys or partitions are required.
The Analytics Engineer exam loves to ask which strategy fits which use case, how unique_key / partition_by / on_schema_change interact, and how safe a rerun is. You especially need to judge correctly: how to handle deletes (impossible with append), what to pick when there is no key, and when partition-level replacement via insert_overwrite is the right answer.
High-level incremental update flow
Minimal incremental template (boundary conditions are made concrete later)
{{ config(
materialized='incremental',
incremental_strategy='merge',
unique_key='id'
) }}
with src as (
select * from {{ source('app', 'events') }}
)
select *
from src
{% if is_incremental() %}
-- 差分抽出条件(例:イベント日時や増分IDなど)
where event_ts > (select coalesce(max(event_ts), '1970-01-01') from {{ this }})
{% endif %}incremental_strategy determines the physical update operation. merge does an upsert, append simply appends, delete+insert removes the target keys then inserts, and insert_overwrite replaces entire partitions. Your choice depends on the data characteristics (update frequency, latency, presence of deletes, key availability) and the capabilities of your adapter.
The comparison table below covers the angles that matter for both the exam and production: change-detection method, requirements, strengths, caveats, and typical adapter support.
| Strategy | Change reflection | Requirements (key / partition) | Strengths |
|---|---|---|---|
| merge | UPSERT (handles update, insert, and delete-equivalent semantics) | unique_key effectively required | Most versatile; strong for late-arriving data and updates |
| append | Append only (does not reflect deletes or updates) | No key required | Fastest and simplest; fits log-style data |
| delete+insert | Delete the target keys then reinsert (effectively UPSERT) | unique_key required | Reproduces updates even on engines without native MERGE |
| insert_overwrite | Replaces specified partitions (block-level, not delta-level) | partition_by required (with appropriate granularity) | Fast and stable at scale via partition-level operations |
Minimal sample configuration for each strategy
-- merge(汎用 upsert)
{{ config(materialized='incremental', incremental_strategy='merge', unique_key='id') }}
-- append(追記のみ)
{{ config(materialized='incremental', incremental_strategy='append') }}
-- delete+insert(キー削除→再挿入)
{{ config(materialized='incremental', incremental_strategy='delete+insert', unique_key='id') }}
-- insert_overwrite(パーティション置換)
{{ config(
materialized='incremental',
incremental_strategy='insert_overwrite',
partition_by={'field': 'event_date', 'data_type': 'date'}
) }}unique_key identifies a row uniquely and is effectively required for merge and delete+insert. Composite keys can be specified as an array. When the natural key is unstable, introduce a stable surrogate key (hash) to absorb upstream churn.
When you use insert_overwrite, the partition_by granularity drives execution cost. Choose an appropriate grain (daily, hourly, etc.) by considering the maximum late-arrival window and downstream SLAs.
on_schema_change controls behavior for column additions and type changes. The availability of append_new_columns and sync_all_columns depends on the adapter, so confirm support up front and design conservatively.
Representative configuration (composite key + schema-change handling)
{{ config(
materialized='incremental',
incremental_strategy='merge',
unique_key=['order_id','line_no'],
on_schema_change='sync_all_columns' -- アダプタ対応状況に注意
) }}
with src as (
select *,
to_date(created_at) as order_date
from {{ ref('stg_orders') }}
)
select * from src
{% if is_incremental() %}
where updated_at > (select coalesce(max(updated_at), '1970-01-01') from {{ this }})
{% endif %}The standard practice is to widen the is_incremental() condition slightly to account for the late-arrival window. For example, if up to 2 days of late arrivals are expected, shift the target's maximum timestamp back by 2 days, reingest that range, and reconcile via merge.
When defending against late data with the append strategy, always provide upstream dedupe or window-reload correction logic. delete+insert is robust to reruns provided you can correctly extract the target key set, but the cost balloons when that set is large.
Boundary condition robust to late arrivals (2-day shift example)
{{ config(materialized='incremental', incremental_strategy='merge', unique_key='event_id') }}
with src as (
select * from {{ source('app', 'events') }}
), tgt as (
select date_add('day', -2, max(event_ts)) as boundary from {{ this }}
)
select s.*
from src s
left join tgt t on 1=1
{% if is_incremental() %}
where s.event_ts >= coalesce(t.boundary, '1970-01-01')
{% endif %}Snowflake: merge is the stable choice. In auto-clustered micro-partition environments, datetime filter conditions cut I/O. An insert_overwrite equivalent is uncommon as a strategy here; consider daily table partitioning or clone separately.
BigQuery: both merge and insert_overwrite are common. insert_overwrite assumes a partitioned table, rewrites only the specified partition range, and remains efficient at scale. Configuring clustering columns appropriately also helps reduce scan volume.
Databricks/Delta (dbt-spark / dbt-databricks): merge is the leading choice. insert_overwrite assumes dynamic partition overwrite mode is enabled. For heavy upsert workloads, also consider optimizations such as Z-Order alongside VACUUM operations.
Example of insert_overwrite for BigQuery (date partitioned)
{{ config(
materialized='incremental',
incremental_strategy='insert_overwrite',
partition_by={'field': 'event_date', 'data_type': 'date'},
cluster_by=['user_id']
) }}
with src as (
select *, cast(event_ts as date) as event_date from {{ ref('stg_events') }}
), window as (
-- 直近3日だけを置換する例
select generate_date_array(date_sub(current_date(), interval 2 day), current_date()) as dates
)
select s.*
from src s
join unnest((select dates from window)) d on d = s.event_date
{% if is_incremental() %}
-- 置換対象のパーティションのみ抽出
where s.event_date in (select d from unnest((select dates from window)) as d)
{% endif %}Below is a merge implementation for order lines. It handles both updates and deletes, is robust to late arrivals, and serves as a model answer for the exam. A validation checklist is included at the end.
In validation, you stabilize the model by including automated tests for: the initial full load, reruns, reingestion of late-arriving data, the expected result on key duplication, and behavior during schema changes.
Order lines merge implementation (handles deletes, updates, and late arrivals)
{{ config(
materialized='incremental',
incremental_strategy='merge',
unique_key=['order_id','line_no'],
on_schema_change='append_new_columns'
) }}
with src as (
select *,
to_timestamp(updated_at) as up_ts
from {{ ref('stg_order_lines') }}
), boundary as (
select date_add('day', -1, max(up_ts)) as b from {{ this }}
)
select s.*
from src s
left join boundary b on 1=1
{% if is_incremental() %}
where s.up_ts >= coalesce(b.b, '1970-01-01')
{% endif %}
-- 削除フラグを採用する場合はモデル内でフィルタせず、
-- merge 条件と WHEN MATCHED THEN UPDATE/DELETE のテンプレートは
-- アダプタの既定に従います(dbt が生成)。Analytics Engineer
問題 1
You want to replace only the last 3 days of a massive event table partitioned daily by event_date, while also correctly reflecting duplicates and deletes. Which dbt configuration is most appropriate?
正解: A
insert_overwrite is ideal for partition-level replacement: specify partition_by and extract only the partitions to be replaced. append cannot reflect deletes or updates and is unsuitable; a full rebuild every run is unnecessarily expensive; and merge is not safe without a unique_key.
Which strategy should I pick when there is no unique_key?
If you do not need to reflect updates or deletes, use append. If you need updates and can design partitions appropriately, consider insert_overwrite. merge and delete+insert generally require a unique_key.
Is it always safe to use sync_all_columns for on_schema_change?
Not necessarily — support and behavior vary by adapter. Start safely with ignore or append_new_columns, validate the behavior in a non-production environment, and only then adopt sync_all_columns.
What is the standard approach for handling late-arriving data while keeping costs low?
The standard pattern is to shift the boundary condition back by the late-arrival window, reload that range, and reconcile via merge. If you can partition the table, replacing only the affected partitions with insert_overwrite is even more efficient.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...