Incremental Materialization: Smart Updates (2026)

dbt's incremental materialization is a mechanism that reduces processing time and cost by writing only new or updated records without reprocessing existing data. The concept is simple, but failure to design unique keys, change detection logic, strategy selection (merge/append/insert_overwrite), and schema change handling consistently can lead to incidents.

Based on stable features in the official dbt documentation, this article covers the basics of incrementals, implementation differences across major warehouses (Snowflake/BigQuery/Databricks/Spark/Redshift), and exam-prone points all at once.

Incremental Materialization Principles and Prerequisites

Incremental materializes a model as a table and reflects only new or updated data to the target table in subsequent runs. The default behavior is a full build on the first run and incremental processing thereafter.

Change detection is performed by conditions within the is_incremental() block in the model SQL or by adapter-provided strategies (merge or insert_overwrite). Defining unique keys and change detection columns is key to safe operation.

First run: always full load. Table is created per the specified strategy
Subsequent runs: only the delta is applied per is_incremental() conditions and strategy
To rebuild: explicitly use dbt run --full-refresh

Materialization	Storage usage	Rebuild time estimate
view	Zero (computed at query time)	Fast (always query-evaluated)
table	Large	Requires full recomputation; heavy
incremental	Medium	Short, delta only
ephemeral	Zero	Inlined upstream (depends on downstream load)

Conceptual flow of incremental builds

Minimal example: incremental model template

{{ config(
    materialized='incremental'
) }}

with src as (
  select * from {{ ref('stg_orders') }}
)

select * from src
{% if is_incremental() %}
  -- 2回目以降にだけ適用される差分条件をここに記述
  -- 例: updated_atがターゲットの最大値以降
  where updated_at >= (
    select coalesce(max(updated_at), '1970-01-01') from {{ this }}
  )
{% endif %}

Implementation Patterns: Designing is_incremental and unique_key

There are two common approaches to change detection. One uses a timestamp watermark (max updated_at) to filter; the other uses a unique key to merge (UPSERT) conflicting rows. The former is simple and fast; the latter is robust against duplicates and late-arriving data.

By setting unique_key and choosing incremental_strategy='merge' in dbt, supported adapters auto-generate a MERGE-equivalent UPSERT. Combining the where condition of is_incremental() with the MERGE optimizes the balance between read volume and consistency.

Watermark approach: reference the max value of updated_at or ingested_at from the target
Key approach: specify unique_key and UPSERT via MERGE
Hybrid: read only recent data via where, finalize consistency via MERGE

Approach	Pros	Caveats
Watermark (timestamp)	Simple, fast, reduced scan	Risk of missing data from clock drift or late arrivals
Unique key + MERGE	Robust against duplicates and update overwrites	Cannot work with undefined or unstable keys
Hybrid	Balances I/O reduction with consistency	full-refresh is the safety net if where misses data

is_incremental branching diagram

UPSERT template using MERGE

{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
) }}

with src as (
  select * from {{ ref('stg_orders') }}
  {% if is_incremental() %}
    where updated_at >= (
      select coalesce(max(updated_at), '1970-01-01') from {{ this }}
    )
  {% endif %}
)

select * from src

Strategy-Specific Behavior: merge / append / insert_overwrite

Choose incremental_strategy based on the capabilities of your adapter (warehouse). Support varies by version, so check your environment's adapter documentation. The following are stable, well-known trends.

merge requires a unique key and handles updates and inserts simultaneously. append is always append-only, and deduplication must be handled by upstream SQL. insert_overwrite can replace data per partition and is commonly used with BigQuery and Spark.

Snowflake/Redshift: primarily merge and append
BigQuery: in addition to merge, insert_overwrite can be combined with partitioning
Databricks/Spark: merge is mainstream; insert_overwrite (partitions) has also become an option recently

Strategy	Main supported engines	Requirements/Caveats
merge	Snowflake, BigQuery, Databricks/Spark, Redshift	unique_key required. Handles updates and inserts together
append	All	You must deduplicate yourself. No updates (no overwrite)
insert_overwrite	BigQuery, Databricks/Spark	partition_by is a prerequisite. Replaces the target partition

Rough flow for choosing a strategy

Configuration examples per strategy

-- Snowflake: MERGE
{{ config(materialized='incremental', incremental_strategy='merge', unique_key='id') }}

-- BigQuery: INSERT_OVERWRITE(日次分のみ置換)
{{ config(
  materialized='incremental',
  incremental_strategy='insert_overwrite',
  partition_by={'field': 'ds', 'data_type': 'date'},
  cluster_by=['customer_id']
) }}

-- Spark/Databricks: MERGE
{{ config(materialized='incremental', incremental_strategy='merge', unique_key='id') }}

Schema Changes and Re-runs: on_schema_change and full-refresh

In production, column additions and type changes can occur. dbt lets you control the default behavior via on_schema_change. Support depends on the adapter, so defaulting to the safer option is the safe bet.

For breaking changes (type narrowing, column drops, etc.) or major logic changes, rebuilding with dbt run --full-refresh is the reliable choice.

ignore: Ignore schema diffs (columns are not added)
append_new_columns: Add compatible new columns (effective on supported adapters)
sync_all_columns: Sync existing column renames and more (watch for support)
fail: Fail on any diff so you notice

Setting	Behavior summary	Use case
ignore	Do nothing	When you want to be conservative and prioritize data quality
append_new_columns	Add new columns	When column additions are frequent and backward-compatible
sync_all_columns	Sync diffs	When you want to automate schema tracking (requires adapter support)
fail	Fail on diff	When you want CI to detect and force a fix

Branching during schema evolution

Examples of on_schema_change and full-refresh

{{ config(
  materialized='incremental',
  incremental_strategy='merge',
  unique_key='id',
  on_schema_change='append_new_columns'
) }}

-- 破壊的変更時はコマンドで再作成
-- dbt run --select my_model --full-refresh

Performance Design: Partitioning, Clustering, Watermark Management

The main goal of incrementals is I/O reduction. Combined with the warehouse's partitioning/clustering features, you can further narrow the read range. insert_overwrite in particular should be considered together with partition design.

A common practice is to read with a watermark that has a slight buffer to account for late arrivals. Tune to your environment, for example by rolling back a few hours to a day from max(updated_at).

Reduce scan with partition_by/cluster_by
Safety margin: roll back slightly from max(updated_at)
Hybrid (delta where + MERGE) balances I/O and consistency

Engine	Recommended companion settings	Notes
BigQuery	partition_by(ds), cluster_by(user_id)	Pairs well with insert_overwrite
Snowflake	Clustering keys (as needed)	Configure based on cost tradeoffs
Databricks/Spark	partitionBy, Z-Ordering (environment-dependent)	Narrow the target range of MERGE

Watermark management diagram

Time ----|----|----|---->
          ^max_in_target
       <---- safety window ----
読み取り範囲 = [max_in_target - safety, now]

Example of a watermark with safety margin (using a macro)

{% macro watermark(model, col, safety_hours=6) %}
select datetime_sub(max({{ col }}), interval {{ safety_hours }} hour) as wm
from {{ model }}
{% endmacro %}

-- モデル内
with w as (
  {{ watermark(this, 'updated_at', 6) }}
), src as (
  select * from {{ ref('stg_orders') }} s
  {% if is_incremental() %}
  where s.updated_at >= (select coalesce(wm, '1970-01-01') from w)
  {% endif %}
)
select * from src

Exam Prep Checklist and Pitfalls

The Analytics Engineer exam often targets incremental basics, the necessity of unique_key, on_schema_change, full-refresh, and the choice of strategy. Written questions ask which strategy to choose and what settings are missing.

In practice, unintended duplicates, missing data, and silent schema evolution are typical failure modes. It is safest to combine CI verification (including full-refresh) with data profile monitoring.

Are you choosing merge without a unique_key?
Did you forget partition_by with insert_overwrite?
Are diffs being silently dropped by the default on_schema_change?
Are you forcing through changes that need full-refresh as incremental?

Keyword	Common wrong answer / pitfall	Correct guideline
merge	Assuming unique_key is not required	merge requires a stable unique_key
insert_overwrite	Misunderstanding that it always works on any table	Partition columns are a prerequisite. Replaces only the target partition
on_schema_change	Assuming sync_all_columns is always safe	Verify adapter support. Default to the safer side

Mini decision tree for strategy selection

Example verification and operation commands

# 差分のみ
 dbt run --select tag:incremental
# フル再作成(影響小さい単位で)
 dbt run --select my_model --full-refresh
# テストと組み合わせ
 dbt test --select state:modified+

Check Your Understanding

Analytics Engineer

問題 1

You want to keep past data immutable and replace only today's data in a BigQuery fact table with daily partitions (ds column, DATE). Which recommended setting minimizes processing time while avoiding duplicates and missing data?

materialized='incremental', incremental_strategy='insert_overwrite', partition_by={'field': 'ds', 'data_type': 'date'}
materialized='incremental', incremental_strategy='append' only
materialized='incremental', incremental_strategy='merge', unique_key not set
materialized='table' with full rebuild every time

正解: A

For the requirement of replacing only today's data in daily partitions, BigQuery's insert_overwrite fits. A partition_by definition is required. append is weak against duplicates and same-day updates, merge without unique_key is inappropriate, and table full rebuild is unnecessarily heavy.

Frequently Asked Questions

Is unique_key always required?

No. It is required for the merge strategy, but not for designs that only need append or insert_overwrite (immutable/append-only data or partition replacement). However, if you need to overwrite updates, you must define a stable unique_key.

How can I avoid missing late-arriving data?

Pull the watermark back a few hours to a day from max(updated_at), or use a hybrid approach that reads only recent data with a where clause and reconciles via merge. If misses are suspected, temporarily widen the range or run a full-refresh to restore consistency.

What is the safest default for on_schema_change?

It depends on your operations policy, but starting conservatively with fail or ignore and explicitly switching to append_new_columns when adding columns is the safe bet. sync_all_columns is convenient but should only be enabled after verifying adapter support and the nature of the changes.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Introduction to dbt Incremental Materialization: Basics and Exam Prep