dbt

dbt insert_overwrite: Partition-Scoped Refresh for Incremental Models

2026-04-19
NicheeLab Editorial Team

Rebuilding an entire large table for every daily update is impractical. dbt's insert_overwrite strategy replaces only the partitions that actually changed and leaves the rest untouched, cutting both the cost and the risk of each refresh.

That said, behavior and prerequisites vary by adapter. For the exam, know which adapters support it and what they require; in practice, explicitly control which partitions get rewritten.

The Core Idea Behind insert_overwrite

Within dbt's incremental materialization, insert_overwrite is an update strategy that replaces only the specified partitions with the new result. It fits cases where upstream data for the current day (or a small window) gets replaced, and it is faster and cheaper than rebuilding the whole table.

The key point: the set of partitions to replace is determined explicitly by what the SELECT returns. Only partitions present in the model's query output are overwritten, so you narrow the range or keys inside the is_incremental() block.

  • Benefits: only the affected partitions are updated, the rest stay untouched. Job time, cost, and lock impact are all reduced.
  • Requirements: the adapter must support INSERT OVERWRITE-equivalent syntax/behavior, and the table must be partitioned.
  • Design tip: encode the logic that picks the partitions to refresh inside the query, and ensure idempotency (re-running yields the same result).

Minimal dbt model configuration (conceptual)

{{ config(
  materialized='incremental',
  incremental_strategy='insert_overwrite',
  partition_by={'field': 'ds', 'data_type': 'date'}
) }}

with src as (
  select * from {{ ref('stg_events') }}
)

select
  cast(event_time as date) as ds,
  user_id,
  count(*) as events
from src
{% if is_incremental() %}
where cast(event_time as date) >= current_date()  -- e.g. current-day partition only
{% endif %}
group by 1,2

Supported Adapters and Prerequisites

insert_overwrite is an adapter-specific strategy. Plan to use it only on adapters listed in the official documentation. At minimum, BigQuery and Spark/Databricks support INSERT OVERWRITE behavior on partitioned tables (see dbt Docs for details).

On Databricks/Spark, enable the session setting that controls partition overwrite granularity (dynamic partition overwrite). On BigQuery, the partitions returned by the query are the ones overwritten, atomically per DML statement.

  • Always confirm which adapters support it (see dbt Docs for the per-adapter incremental strategy matrix).
  • Explicitly partition the table (configure partition_by).
  • Implement a WHERE clause (or subquery) that scopes the overwrite set precisely — no more, no less.
Adapterinsert_overwrite support in dbtRequired prerequisites (partitioning, etc.)Atomicity guarantee
BigQuerySupportedpartition_by required (typically date/datetime/timestamp-based)Atomic per statement
Databricks/Spark (Delta recommended)Supportedpartition_by required. Dynamic partition overwrite recommended.Atomic via Delta ACID
SnowflakeNot supported (as a dedicated dbt strategy)Use MERGE or delete+insert insteadMERGE is atomic within a transaction
RedshiftNot supportedUse MERGE / CTE-based upsert as an alternativeDepends on transaction control

Enabling dynamic partition overwrite on Databricks/Spark

{{ config(
  pre_hook=[
    "SET spark.sql.sources.partitionOverwriteMode=dynamic"
  ]
) }}

Designing Partition-Scoped Refreshes

Deciding what to refresh is the heart of the design. In most cases you pick partitions based on the arrival range of upstream loads (e.g. the current day, the last N days, or the dates tied to detected change keys).

Only the partitions of the rows your SELECT returns are replaced, so scope the period inside the is_incremental() block and include deduplication or aggregation logic so a re-run yields the same result.

  • Ways to identify target partitions: time range bounds (min/max), change-key lookups, arrival flags.
  • Idempotency: bake deduplication and latest-record selection (e.g. window functions) into the query itself.
  • Safety valve: cap the maximum number of overwrite days as a constant so the period never grows beyond expectations.

Flow of overwritten partitions (conceptual diagram)

Raw/StagingNewly arrived dataChanged partitionsds in {2026-04-17}dbt model (insert_overwrite)is_incremental() narrows ds / aggregation and deduplicationExisting partition replacedds=2026-04-17 overwrittenNew partitionCreated if neededis_incremental() narrows ds, aggregation/deduplication produces the definitive result, and the matching partitions are overwritten

Query snippet that narrows target partitions via change detection (conceptual)

with src as (
  select * from {{ ref('stg_orders') }}
),
changed as (
  -- Narrow target partitions based on upstream arrival
  select distinct cast(updated_at as date) as ds
  from src
  {% if is_incremental() %}
  where _ingested_at >= {{ var('low_watermark', "timestamp_sub(current_timestamp(), interval 1 day)") }}
  {% endif %}
)
select /* Returning only target ds values makes those ds the overwrite targets */
  cast(order_date as date) as ds,
  customer_id,
  sum(amount) as revenue
from src
where cast(order_date as date) in (select ds from changed)
group by 1,2

Implementation Pattern (BigQuery)

On BigQuery, the combination of the INSERT OVERWRITE DML and a partitioned table atomically replaces only the partitions returned by the query. On the dbt side, set partition_by and narrow the target period inside is_incremental().

Clustering is optional, but pairing it with partition_by is a safe choice for scan efficiency. For long backfills, split the period into chunks and run multiple times.

  • partition_by is required. A date/timestamp column is the standard choice.
  • Restricting the ds values returned inside the is_incremental() block is the sole determinant of the overwrite scope.
  • Run backfills in small slices (e.g. 7 days at a time).

dbt model example (BigQuery, overwriting daily partitions)

{{ config(
  materialized='incremental',
  incremental_strategy='insert_overwrite',
  partition_by={'field': 'ds', 'data_type': 'date'},
  cluster_by=['user_id']
) }}

with src as (
  select * from {{ ref('stg_events') }}
),
changed as (
  -- Narrow target ds based on upstream arrival
  select distinct cast(event_time as date) as ds
  from src
  {% if is_incremental() %}
  where _ingested_at >= {{ var('low_watermark', "timestamp_sub(current_timestamp(), interval 1 day)") }}
  {% endif %}
)
select
  cast(event_time as date) as ds,
  user_id,
  count(*) as events
from src
where cast(event_time as date) in (select ds from changed)
group by 1,2

Implementation Pattern (Databricks/Delta)

On Databricks (Spark), Delta Lake's ACID guarantees make INSERT OVERWRITE atomic. With dynamic partition overwrite enabled, only the partitions you write are replaced while the others stay untouched.

The safe approach is to set spark.sql.sources.partitionOverwriteMode=dynamic at cluster or job scope. In dbt, applying it via a pre_hook session setting is also common.

  • Use Delta as the file format (ACID and DML features are stable).
  • Enable dynamic partition overwrite (via cluster config or a dbt pre_hook).
  • Tightly bound the target ds and produce a definitive result via aggregation or deduplication.

dbt model example (Databricks/Delta, dynamic partition overwrite)

{{ config(
  materialized='incremental',
  incremental_strategy='insert_overwrite',
  partition_by=['ds'],
  file_format='delta',
  pre_hook=[
    "SET spark.sql.sources.partitionOverwriteMode=dynamic"
  ]
) }}

with src as (
  select * from {{ ref('stg_events') }}
),
changed as (
  -- Only ds values that changed within the last day
  select distinct cast(event_time as date) as ds
  from src
  {% if is_incremental() %}
  where load_date >= date_add(current_date(), -1)
  {% endif %}
)
select
  cast(event_time as date) as ds,
  user_id,
  count(*) as events
from src
where cast(event_time as date) in (select ds from changed)
group by 1,2

Operational Know-How and Exam Tips

Operationally, set up guardrails for the maximum period and pre-checks that fail the run if the limit is exceeded, so a too-wide overwrite never blows up cost or lock contention. For monitoring, log the count and range of refreshed partitions, job duration, and skip detections.

For the exam, lock in these points: 1) insert_overwrite is adapter-specific and partition_by is required; 2) only the partitions returned by SELECT are overwritten; 3) on Databricks the dynamic partition overwrite setting matters; 4) on Snowflake and similar adapters, MERGE is the standard approach.

  • Guardrail: fail the run if the number of target ds values exceeds, say, 10.
  • Run backfills in chunks across multiple invocations, making each run's scope explicit.
  • Add audit columns (creation timestamp, source snapshot ID) to make troubleshooting easier.

Backfilling with variables (coordinating the CLI and the model)

-- Example run (7-day backfill)
-- dbt run --select fact_events --vars "start_date: 2026-04-10, end_date: 2026-04-16"

-- is_incremental() filter on the model side (adapter-agnostic conceptual example)
{% if is_incremental() %}
where ds between {{ var('start_date', 'current_date()') }} and {{ var('end_date', 'current_date()') }}
{% endif %}

Check Your Understanding

Analytics Engineer

問題 1

On Databricks (Delta), you want to overwrite only the daily partition ds. Which dbt model configuration atomically replaces just the affected ds while leaving every other partition untouched?

  1. Set materialized=incremental, incremental_strategy=insert_overwrite, partition_by=['ds'], and enable spark.sql.sources.partitionOverwriteMode=dynamic via a pre_hook
  2. Set materialized=incremental with incremental_strategy=append, and restrict ds in the WHERE clause
  3. Set materialized=incremental with only incremental_strategy=merge and no partition_by
  4. Run CREATE OR REPLACE TABLE AS SELECT to rebuild the whole table every time

正解: A

On Databricks/Delta, safe partition-scoped replacement requires combining dbt's insert_overwrite strategy with a partitioned table. Enable dynamic partition overwrite so only the returned partitions are replaced. append leaves the old rows in place (not an overwrite), and merge introduces unnecessary row-level matching when whole-partition replacement is the goal.

Frequently Asked Questions

Can I use insert_overwrite on Snowflake too?

dbt's insert_overwrite strategy is primarily designed for BigQuery and Spark/Databricks. On Snowflake you typically meet the same requirement with MERGE (or delete+insert). Check the official dbt docs for adapter support.

How can I keep the set of overwritten partitions from growing too wide?

Tightly bound the set inside the is_incremental() block using a date range or change key, and add a pre_hook check that enforces an upper limit on the number of partitions. If the scope is wider than expected, fail the run so an operator can review it.

Is it safe against sources with duplicates or out-of-order events?

Yes, as long as you include logic that produces a definitive result within each overwritten partition (deduplication, latest-wins window functions, aggregation). insert_overwrite itself is just the replacement mechanism; correctness depends entirely on what your SELECT returns.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.