dbt

dbt delete+insert Strategy: A Complete Guide to the Rebuild Approach

2026-04-19
NicheeLab Editorial Team

delete+insert is an incremental strategy that first deletes the rows matching the target keys and then inserts the new rows to keep the table consistent. It works reliably in environments where MERGE is not optimal or has limited support.

This article follows the assumptions in the official documentation (dbt docs: Incremental models and each adapter's strategies) and focuses on basic design and operational patterns that are not easily affected by version differences, providing perspectives that help both daily work and the Analytics Engineer exam.

Key Points of the delete+insert Strategy and the Rebuild Mindset

delete+insert identifies target rows by their unique key, deletes the old rows, and inserts the new ones, producing an incremental upsert that preserves eventual consistency. It is useful when a single-statement upsert like MERGE is unavailable or operationally undesirable.

As a rebuild approach, it lets you partially recreate data by explicitly stating the target key range, and it makes rollback easier — when problems occur you can just delete the affected keys and re-insert. Transaction isolation and locking behavior are adapter-dependent, so consider them together with job parallelism and physical design (clustering, partitioning).

  • Prerequisite: materialized=incremental with a unique_key specified
  • Control delta extraction with is_incremental()
  • Recovery on failure: re-delete then re-insert the same keys — highly reversible
  • Where it fits: historical Redshift implementations, environments with MERGE restrictions, pipelines that emphasize partitioned rebuilds
StrategyRequirements / DependenciesStrengths and Fit
delete+insertunique_key required. Needs DELETE/INSERT support.Easy to rerun. More I/O. Key selection is critical.
mergeAdapter features like MERGE DML / Delta MERGE.Fast and consistent upsert. Not available where the feature is unsupported.
insert_overwriteExplicit control over partitions and clustering.Partition-level replacement is fast, but the design requirements are strict.

delete+insert Processing Flow (Rebuild-Oriented)

Incremental source (new + updated rows)Temp/source CTE: new_rowsExtract the set of unique_keysDELETE FROM target_table WHERE unique_key IN (...)INSERT INTO target_table (new_rows)

Minimal dbt Incremental Model (delete+insert)

with source as (
  select * from {{ ref('stg_orders') }}
  {% if is_incremental() %}
    where updated_at >= (select coalesce(max(updated_at), '1900-01-01') from {{ this }})
  {% endif %}
),

new_rows as (
  select
    order_id as id,
    customer_id,
    status,
    updated_at
  from source
)

select * from new_rows

-- model config (at top of file, or via config block)
-- {{ config(
--   materialized='incremental',
--   incremental_strategy='delete+insert',
--   unique_key='id'
-- ) }}

Requirements and Prerequisites: unique_key and Delta Control

delete+insert cannot work without a unique_key. Duplicate keys create ambiguity about which rows to delete and become a breeding ground for mistaken deletes and duplicate inserts. The key should be logically unique (composite is fine), and in practice it should be materialized on the target side as an index or cluster key.

Delta extraction is controlled with is_incremental(). Narrow the range by updated_at or a load boundary column and produce the key set that matches the delete range. Account for late-arriving timestamps by widening the window slightly or by clarifying your rerun policy.

  • unique_key can be single or composite; NULL-free design is strongly preferred
  • Practical delta conditions include update timestamps, incremental flags, or hash-change detection
  • Combine unique and not_null dbt tests for validation
  • Design clustering / sort / indexes on the target to make key lookups fast
ItemRecommendationReason
unique_keyLogically unique, no NULLsTo correctly determine the DELETE scope
Delta columnupdated_at + a safety windowAbsorbs late arrivals and stabilizes reruns
Constraints / physical designClustering / partitioning / indexesBalances DELETE target lookup with INSERT performance

Mapping Delta Extraction to the Key Set

source rowsnew_rowsfilter by is_incremental()unique_key setderive keysMapping delta extraction to the key set

Composite Key with a Look-Back Window Example

{{ config(
  materialized='incremental',
  incremental_strategy='delete+insert',
  unique_key=['id','effective_date']
) }}

with src as (
  select * from {{ ref('stg_line_items') }}
  {% if is_incremental() %}
    where updated_at >= dateadd(day, -1, (select coalesce(max(updated_at),'1900-01-01') from {{ this }}))
  {% endif %}
)
select id, effective_date, qty, price, updated_at from src

Implementation Patterns: Splitting Model Definition and Configuration

Manage configuration with a config block at the top of the model file, or in the models section of dbt_project.yml. When you need to switch strategies between environments (prod vs. dev), branching on vars or target.name makes maintenance easier.

On some adapters MERGE becomes the default, but explicitly specifying delete+insert forces that strategy to be used. Check support status and default behavior in the adapter's official documentation.

  • Specify the strategy and keys inside the model and enforce consistency with tests
  • Switching incremental_strategy per environment makes validation easier
  • Use dbt run-operation as a supplementary tool for maintenance scripts
Configuration locationWhen to useCaveats
In-model configExpress model-specific requirements directlyToo many settings make cross-cutting governance hard
dbt_project.ymlApply patterns in bulkOverride exception models explicitly
Env vars / varsSwitch strategy per environmentAvoid letting the branching get too complex

Strategy Switching by Environment

target=devdelete+insert explicittarget=prodSame strategy in prod or MERGEStrategy switching by environment

Default Strategy and Overrides in dbt_project.yml

models:
  project_name:
    +materialized: incremental
    +incremental_strategy: delete+insert
    +unique_key: id
    marts:
      orders:
        +incremental_strategy: delete+insert  # explicit
        +unique_key: [id, effective_date]

Performance and Operational Pitfalls

delete+insert tends to increase I/O. To speed up locating delete targets, combine platform-specific optimizations on the target table: clustering (Snowflake), cluster/sort keys (Redshift), clustering/partitioning (BigQuery), and Z-ORDER or clustering (Databricks Delta).

Be mindful of the locking impact of bulk DELETE and the storage growth from time travel / retention. Operational tricks such as staggering schedules to avoid concurrent runs over the same key range, or sharding the key range per job, go a long way.

  • Use key-based clustering and partitioning to shrink the DELETE scan range
  • Make inserts efficient with batched COPY/INSERT (adapter-dependent)
  • When parallelizing jobs, avoid contention on the same keys (mind pessimistic vs. optimistic locking)
  • Estimate the cost impact of retention and auto-clustering ahead of time
OptimizationExpected effectCaveats
Clustering / partition designReduces scan for both DELETE and INSERTOver-partitioning causes small files and overhead
Batch size tuningFewer commits and higher throughputVery long transactions have a higher failure risk
Key-range shardingBetter concurrencyRequires careful key-range management to avoid gaps

Minimize Scan with Key-Aligned Clustering

Beforerandom distributionDELETE scans wideAfterclustered by unique_keyDELETE scans narrowMinimize scan with key-aligned clustering

Clustering / Partitioning Design Examples (Conceptual)

-- Snowflake example (conceptual)
-- create or replace table T cluster by (id);
-- BigQuery example (conceptual)
-- partition by date(updated_at) cluster by id
-- Databricks Delta example (conceptual)
-- optimize T zorder by (id)

Validation and Rerun: A Reversible Rebuild Workflow

A key strength of delete+insert is that failures are easy to recover from by re-deleting and re-inserting the affected keys. Manage the timing of downstream references through model dependencies (ref) and run tests (unique, not_null, functional dedup) at the end of the pipeline to catch inconsistencies early.

A full refresh (--full-refresh) is the last resort for a complete rebuild. First try to identify the affected keys and recover with a partial rerun (--select model_name), and if needed, temporarily widen the rerun window to restore consistency.

  • Reduce failure granularity to per-key and recover with partial reruns
  • Layer validation tests across unique, not_null, and functional duplicate checks
  • In production, watch for read consistency and control cutover timing
  • Estimate cost and lock time for a full refresh up front
Failure scenarioBehaviorRecommended response
INSERT interruptedSome keys were not insertedRerun the same increment; the upfront DELETE absorbs any duplicates
DELETE interruptedOld rows remainResolved on the next run via another DELETE then INSERT
Missed deltaLost updates / missing rowsWiden the window or identify affected keys and rerun

Reversible Flow on Failure

Failidentify keysrerun same windowDELETE keysINSERT rowstests pass

Choosing Between Partial Rerun and Full Refresh

# Rerun only the affected model
# dbt run --select my_incremental_model

# Full refresh (last resort)
# dbt run --full-refresh --select my_incremental_model

Exam Prep: Points Targeted on the Analytics Engineer Exam

The exam tests your understanding of when each incremental strategy applies, the fact that unique_key is required for delete+insert, how to use is_incremental(), the meaning of --full-refresh, and conservative judgment about adapter differences.

It is also important to organize the differences between MERGE and delete+insert, along with the prerequisites of insert_overwrite (partition replacement), as a structured comparison.

  • delete+insert requires unique_key; harden it with delta extraction and tests
  • MERGE is a single-statement upsert, but its availability is environment-dependent — verify before standardizing on it
  • insert_overwrite is efficient for partition-level rebuilds
  • Know when to use a rerun vs. a full refresh on failure
ThemeKeywordsTrap points
Choosing an incremental strategyunique_key / MERGE / overwritedelete+insert is impossible without a unique_key
Delta controlis_incremental() / boundary columnDirection of inequality and handling of late arrivals
Rerun--full-refresh / partial selectionThe cost of overusing full refresh

Learning Map Across Strategies

delete+insertmergeinsert_overwriteNeeds: unique_keyNeeds: MERGE supportNeeds: partition designLearning map across strategies

Frequently Tested Configuration Snippet

{{ config(
  materialized='incremental',
  incremental_strategy='delete+insert',
  unique_key='id'
) }}
{% if is_incremental() %}
  -- don't forget the delta condition
{% endif %}

Check Your Understanding

Analytics Engineer

問題 1

Which is the most appropriate primary reason to choose the delete+insert strategy in a dbt incremental model?

  1. Because MERGE support or operational constraints push you toward safe, row-level rebuilds (delete then insert)
  2. Because you want to fully recompute the table every time to maximize aggregation accuracy
  3. Because you want partitions to be created automatically and only the latest partition replaced
  4. Because you want to upsert an event log that has no unique_key as-is

正解: A

delete+insert assumes a unique_key, deletes the matching rows, and then inserts the new ones. It provides a stable upsert even when MERGE is unavailable or undesirable, and reruns are easy. Full recomputation (B) describes full-refresh, partition replacement (C) is insert_overwrite, and skipping unique_key (D) is incompatible.

Frequently Asked Questions

Can delete+insert be used with columns where unique_key may be NULL?

Not recommended. When NULL values are present, the rows targeted for deletion cannot be uniquely identified, leading to duplicates or stale rows. Normalize with coalesce in the model or use a composite key to eliminate NULLs, and enforce this with dbt's unique / not_null tests.

What is the advantage of choosing delete+insert even in environments that support MERGE?

It fits when you value simple reruns (re-delete the affected keys then re-insert) and high observability of the logic. That said, MERGE is often more efficient in most environments, so weigh the I/O and locking impact before choosing.

When should you choose insert_overwrite vs. delete+insert?

If your partition design is clear and the replacement scope can be defined at the partition level, insert_overwrite is faster. If partitioning is hard to design or you need fine-grained, key-level rebuilds, consider delete+insert.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.