dbt

Choosing the Right dbt Incremental Strategy: merge / append / delete+insert / insert_overwrite

2026-04-19
NicheeLab Editorial Team

dbt's incremental models are the core feature for avoiding full recomputation and reflecting only the changes. But picking the wrong incremental_strategy invites the classic failure modes: latency, duplication, missing rows, and unapplied deletes.

Grounded in the stable behavior described in the official documentation, this article walks through the differences between merge / append / delete+insert / insert_overwrite, the points the exam likes to test, and the configuration and validation steps you need in production.

Incremental Fundamentals and What the Exam Tests

After the initial full load, dbt incremental models process only the deltas on subsequent runs. You identify the delta by combining is_incremental() with boundary conditions on time, ID, or partition. The strategy you choose changes how duplicates are deduped, how deletes are reflected, and what keys or partitions are required.

The Analytics Engineer exam loves to ask which strategy fits which use case, how unique_key / partition_by / on_schema_change interact, and how safe a rerun is. You especially need to judge correctly: how to handle deletes (impossible with append), what to pick when there is no key, and when partition-level replacement via insert_overwrite is the right answer.

  • is_incremental() is false on the first run and true on every run after
  • Whether unique_key is required depends on the strategy (merge and delete+insert generally require it)
  • Partition design directly determines whether insert_overwrite succeeds and at what cost
  • on_schema_change behaves differently per adapter, so design conservatively

High-level incremental update flow

Raw/SrcS3, GCSStagingdbt refIncremental ModelFactApply a strategy (merge / append / delete+insert / insert_overwrite) via is_incremental()

Minimal incremental template (boundary conditions are made concrete later)

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='id'
) }}

with src as (
  select * from {{ source('app', 'events') }}
)

select *
from src
{% if is_incremental() %}
-- 差分抽出条件(例:イベント日時や増分IDなど)
where event_ts > (select coalesce(max(event_ts), '1970-01-01') from {{ this }})
{% endif %}

Comparing the 4 incremental_strategy Options and How to Pick One

incremental_strategy determines the physical update operation. merge does an upsert, append simply appends, delete+insert removes the target keys then inserts, and insert_overwrite replaces entire partitions. Your choice depends on the data characteristics (update frequency, latency, presence of deletes, key availability) and the capabilities of your adapter.

The comparison table below covers the angles that matter for both the exam and production: change-detection method, requirements, strengths, caveats, and typical adapter support.

  • If you need to reflect deletes, pick merge or delete+insert
  • If there is no key, use append (with separate dedupe) or partition replacement via insert_overwrite
  • When the partition design is good, insert_overwrite is extremely fast
  • Confirm on_schema_change compatibility ahead of any schema change
StrategyChange reflectionRequirements (key / partition)Strengths
mergeUPSERT (handles update, insert, and delete-equivalent semantics)unique_key effectively requiredMost versatile; strong for late-arriving data and updates
appendAppend only (does not reflect deletes or updates)No key requiredFastest and simplest; fits log-style data
delete+insertDelete the target keys then reinsert (effectively UPSERT)unique_key requiredReproduces updates even on engines without native MERGE
insert_overwriteReplaces specified partitions (block-level, not delta-level)partition_by required (with appropriate granularity)Fast and stable at scale via partition-level operations

Minimal sample configuration for each strategy

-- merge(汎用 upsert)
{{ config(materialized='incremental', incremental_strategy='merge', unique_key='id') }}

-- append(追記のみ)
{{ config(materialized='incremental', incremental_strategy='append') }}

-- delete+insert(キー削除→再挿入)
{{ config(materialized='incremental', incremental_strategy='delete+insert', unique_key='id') }}

-- insert_overwrite(パーティション置換)
{{ config(
  materialized='incremental',
  incremental_strategy='insert_overwrite',
  partition_by={'field': 'event_date', 'data_type': 'date'}
) }}

Key Points on unique_key, partition_by, and on_schema_change

unique_key identifies a row uniquely and is effectively required for merge and delete+insert. Composite keys can be specified as an array. When the natural key is unstable, introduce a stable surrogate key (hash) to absorb upstream churn.

When you use insert_overwrite, the partition_by granularity drives execution cost. Choose an appropriate grain (daily, hourly, etc.) by considering the maximum late-arrival window and downstream SLAs.

on_schema_change controls behavior for column additions and type changes. The availability of append_new_columns and sync_all_columns depends on the adapter, so confirm support up front and design conservatively.

  • unique_key supports composite specification like ['id1','id2']
  • insert_overwrite provides no real benefit without partition_by
  • on_schema_change options include ignore / append_new_columns / sync_all_columns (adapter-dependent)
  • Validate schema changes before production with dbt build and a rehearsal run

Representative configuration (composite key + schema-change handling)

{{ config(
  materialized='incremental',
  incremental_strategy='merge',
  unique_key=['order_id','line_no'],
  on_schema_change='sync_all_columns'  -- アダプタ対応状況に注意
) }}

with src as (
  select *,
         to_date(created_at) as order_date
  from {{ ref('stg_orders') }}
)

select * from src
{% if is_incremental() %}
where updated_at > (select coalesce(max(updated_at), '1970-01-01') from {{ this }})
{% endif %}

Boundary Conditions, Late Arrivals, and Safe Reruns

The standard practice is to widen the is_incremental() condition slightly to account for the late-arrival window. For example, if up to 2 days of late arrivals are expected, shift the target's maximum timestamp back by 2 days, reingest that range, and reconcile via merge.

When defending against late data with the append strategy, always provide upstream dedupe or window-reload correction logic. delete+insert is robust to reruns provided you can correctly extract the target key set, but the cost balloons when that set is large.

  • Late-window reload + merge is the safest combination
  • When choosing append, provide a separate dedupe model
  • With insert_overwrite, replace only the partitions that overlap the window
  • Insist on idempotent design (same result on rerun) to prepare for reruns

Boundary condition robust to late arrivals (2-day shift example)

{{ config(materialized='incremental', incremental_strategy='merge', unique_key='event_id') }}

with src as (
  select * from {{ source('app', 'events') }}
), tgt as (
  select date_add('day', -2, max(event_ts)) as boundary from {{ this }}
)
select s.*
from src s
left join tgt t on 1=1
{% if is_incremental() %}
where s.event_ts >= coalesce(t.boundary, '1970-01-01')
{% endif %}

Implementation Notes by Platform (Snowflake / BigQuery / Databricks)

Snowflake: merge is the stable choice. In auto-clustered micro-partition environments, datetime filter conditions cut I/O. An insert_overwrite equivalent is uncommon as a strategy here; consider daily table partitioning or clone separately.

BigQuery: both merge and insert_overwrite are common. insert_overwrite assumes a partitioned table, rewrites only the specified partition range, and remains efficient at scale. Configuring clustering columns appropriately also helps reduce scan volume.

Databricks/Delta (dbt-spark / dbt-databricks): merge is the leading choice. insert_overwrite assumes dynamic partition overwrite mode is enabled. For heavy upsert workloads, also consider optimizations such as Z-Order alongside VACUUM operations.

  • Snowflake: merge plus appropriate filters yields stable operations
  • BigQuery: partition_by is an essentially required design element; insert_overwrite is powerful
  • Databricks: Delta merge is reliable but watch for small-file fragmentation (run OPTIMIZE)

Example of insert_overwrite for BigQuery (date partitioned)

{{ config(
  materialized='incremental',
  incremental_strategy='insert_overwrite',
  partition_by={'field': 'event_date', 'data_type': 'date'},
  cluster_by=['user_id']
) }}

with src as (
  select *, cast(event_ts as date) as event_date from {{ ref('stg_events') }}
), window as (
  -- 直近3日だけを置換する例
  select generate_date_array(date_sub(current_date(), interval 2 day), current_date()) as dates
)
select s.*
from src s
join unnest((select dates from window)) d on d = s.event_date
{% if is_incremental() %}
-- 置換対象のパーティションのみ抽出
where s.event_date in (select d from unnest((select dates from window)) as d)
{% endif %}

Complete Example and Validation Checklist

Below is a merge implementation for order lines. It handles both updates and deletes, is robust to late arrivals, and serves as a model answer for the exam. A validation checklist is included at the end.

In validation, you stabilize the model by including automated tests for: the initial full load, reruns, reingestion of late-arriving data, the expected result on key duplication, and behavior during schema changes.

  • Whether record counts and hashes match between the first run and subsequent runs
  • Inject late-arriving data manually and confirm consistency on rerun
  • Confirm reflection of delete events (is_deleted, etc.) when present
  • Whether on_schema_change behaves as expected during schema changes

Order lines merge implementation (handles deletes, updates, and late arrivals)

{{ config(
  materialized='incremental',
  incremental_strategy='merge',
  unique_key=['order_id','line_no'],
  on_schema_change='append_new_columns'
) }}

with src as (
  select *,
         to_timestamp(updated_at) as up_ts
  from {{ ref('stg_order_lines') }}
), boundary as (
  select date_add('day', -1, max(up_ts)) as b from {{ this }}
)
select s.*
from src s
left join boundary b on 1=1
{% if is_incremental() %}
where s.up_ts >= coalesce(b.b, '1970-01-01')
{% endif %}

-- 削除フラグを採用する場合はモデル内でフィルタせず、
-- merge 条件と WHEN MATCHED THEN UPDATE/DELETE のテンプレートは
-- アダプタの既定に従います(dbt が生成)。

Check Your Understanding

Analytics Engineer

問題 1

You want to replace only the last 3 days of a massive event table partitioned daily by event_date, while also correctly reflecting duplicates and deletes. Which dbt configuration is most appropriate?

  1. materialized='incremental', incremental_strategy='insert_overwrite', partition_by=event_date
  2. materialized='incremental', incremental_strategy='append' のみ
  3. materialized='table' にして毎回フルリビルド
  4. materialized='incremental', incremental_strategy='merge', unique_key なし

正解: A

insert_overwrite is ideal for partition-level replacement: specify partition_by and extract only the partitions to be replaced. append cannot reflect deletes or updates and is unsuitable; a full rebuild every run is unnecessarily expensive; and merge is not safe without a unique_key.

Frequently Asked Questions

Which strategy should I pick when there is no unique_key?

If you do not need to reflect updates or deletes, use append. If you need updates and can design partitions appropriately, consider insert_overwrite. merge and delete+insert generally require a unique_key.

Is it always safe to use sync_all_columns for on_schema_change?

Not necessarily — support and behavior vary by adapter. Start safely with ignore or append_new_columns, validate the behavior in a non-production environment, and only then adopt sync_all_columns.

What is the standard approach for handling late-arriving data while keeping costs low?

The standard pattern is to shift the boundary condition back by the late-arrival window, reload that range, and reconcile via merge. If you can partition the table, replacing only the affected partitions with insert_overwrite is even more efficient.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.