Rebuilding an entire large table for every daily update is impractical. dbt's insert_overwrite strategy replaces only the partitions that actually changed and leaves the rest untouched, cutting both the cost and the risk of each refresh.
That said, behavior and prerequisites vary by adapter. For the exam, know which adapters support it and what they require; in practice, explicitly control which partitions get rewritten.
Within dbt's incremental materialization, insert_overwrite is an update strategy that replaces only the specified partitions with the new result. It fits cases where upstream data for the current day (or a small window) gets replaced, and it is faster and cheaper than rebuilding the whole table.
The key point: the set of partitions to replace is determined explicitly by what the SELECT returns. Only partitions present in the model's query output are overwritten, so you narrow the range or keys inside the is_incremental() block.
Minimal dbt model configuration (conceptual)
{{ config(
materialized='incremental',
incremental_strategy='insert_overwrite',
partition_by={'field': 'ds', 'data_type': 'date'}
) }}
with src as (
select * from {{ ref('stg_events') }}
)
select
cast(event_time as date) as ds,
user_id,
count(*) as events
from src
{% if is_incremental() %}
where cast(event_time as date) >= current_date() -- e.g. current-day partition only
{% endif %}
group by 1,2insert_overwrite is an adapter-specific strategy. Plan to use it only on adapters listed in the official documentation. At minimum, BigQuery and Spark/Databricks support INSERT OVERWRITE behavior on partitioned tables (see dbt Docs for details).
On Databricks/Spark, enable the session setting that controls partition overwrite granularity (dynamic partition overwrite). On BigQuery, the partitions returned by the query are the ones overwritten, atomically per DML statement.
| Adapter | insert_overwrite support in dbt | Required prerequisites (partitioning, etc.) | Atomicity guarantee |
|---|---|---|---|
| BigQuery | Supported | partition_by required (typically date/datetime/timestamp-based) | Atomic per statement |
| Databricks/Spark (Delta recommended) | Supported | partition_by required. Dynamic partition overwrite recommended. | Atomic via Delta ACID |
| Snowflake | Not supported (as a dedicated dbt strategy) | Use MERGE or delete+insert instead | MERGE is atomic within a transaction |
| Redshift | Not supported | Use MERGE / CTE-based upsert as an alternative | Depends on transaction control |
Enabling dynamic partition overwrite on Databricks/Spark
{{ config(
pre_hook=[
"SET spark.sql.sources.partitionOverwriteMode=dynamic"
]
) }}Deciding what to refresh is the heart of the design. In most cases you pick partitions based on the arrival range of upstream loads (e.g. the current day, the last N days, or the dates tied to detected change keys).
Only the partitions of the rows your SELECT returns are replaced, so scope the period inside the is_incremental() block and include deduplication or aggregation logic so a re-run yields the same result.
Flow of overwritten partitions (conceptual diagram)
Query snippet that narrows target partitions via change detection (conceptual)
with src as (
select * from {{ ref('stg_orders') }}
),
changed as (
-- Narrow target partitions based on upstream arrival
select distinct cast(updated_at as date) as ds
from src
{% if is_incremental() %}
where _ingested_at >= {{ var('low_watermark', "timestamp_sub(current_timestamp(), interval 1 day)") }}
{% endif %}
)
select /* Returning only target ds values makes those ds the overwrite targets */
cast(order_date as date) as ds,
customer_id,
sum(amount) as revenue
from src
where cast(order_date as date) in (select ds from changed)
group by 1,2On BigQuery, the combination of the INSERT OVERWRITE DML and a partitioned table atomically replaces only the partitions returned by the query. On the dbt side, set partition_by and narrow the target period inside is_incremental().
Clustering is optional, but pairing it with partition_by is a safe choice for scan efficiency. For long backfills, split the period into chunks and run multiple times.
dbt model example (BigQuery, overwriting daily partitions)
{{ config(
materialized='incremental',
incremental_strategy='insert_overwrite',
partition_by={'field': 'ds', 'data_type': 'date'},
cluster_by=['user_id']
) }}
with src as (
select * from {{ ref('stg_events') }}
),
changed as (
-- Narrow target ds based on upstream arrival
select distinct cast(event_time as date) as ds
from src
{% if is_incremental() %}
where _ingested_at >= {{ var('low_watermark', "timestamp_sub(current_timestamp(), interval 1 day)") }}
{% endif %}
)
select
cast(event_time as date) as ds,
user_id,
count(*) as events
from src
where cast(event_time as date) in (select ds from changed)
group by 1,2On Databricks (Spark), Delta Lake's ACID guarantees make INSERT OVERWRITE atomic. With dynamic partition overwrite enabled, only the partitions you write are replaced while the others stay untouched.
The safe approach is to set spark.sql.sources.partitionOverwriteMode=dynamic at cluster or job scope. In dbt, applying it via a pre_hook session setting is also common.
dbt model example (Databricks/Delta, dynamic partition overwrite)
{{ config(
materialized='incremental',
incremental_strategy='insert_overwrite',
partition_by=['ds'],
file_format='delta',
pre_hook=[
"SET spark.sql.sources.partitionOverwriteMode=dynamic"
]
) }}
with src as (
select * from {{ ref('stg_events') }}
),
changed as (
-- Only ds values that changed within the last day
select distinct cast(event_time as date) as ds
from src
{% if is_incremental() %}
where load_date >= date_add(current_date(), -1)
{% endif %}
)
select
cast(event_time as date) as ds,
user_id,
count(*) as events
from src
where cast(event_time as date) in (select ds from changed)
group by 1,2Operationally, set up guardrails for the maximum period and pre-checks that fail the run if the limit is exceeded, so a too-wide overwrite never blows up cost or lock contention. For monitoring, log the count and range of refreshed partitions, job duration, and skip detections.
For the exam, lock in these points: 1) insert_overwrite is adapter-specific and partition_by is required; 2) only the partitions returned by SELECT are overwritten; 3) on Databricks the dynamic partition overwrite setting matters; 4) on Snowflake and similar adapters, MERGE is the standard approach.
Backfilling with variables (coordinating the CLI and the model)
-- Example run (7-day backfill)
-- dbt run --select fact_events --vars "start_date: 2026-04-10, end_date: 2026-04-16"
-- is_incremental() filter on the model side (adapter-agnostic conceptual example)
{% if is_incremental() %}
where ds between {{ var('start_date', 'current_date()') }} and {{ var('end_date', 'current_date()') }}
{% endif %}Analytics Engineer
問題 1
On Databricks (Delta), you want to overwrite only the daily partition ds. Which dbt model configuration atomically replaces just the affected ds while leaving every other partition untouched?
正解: A
On Databricks/Delta, safe partition-scoped replacement requires combining dbt's insert_overwrite strategy with a partitioned table. Enable dynamic partition overwrite so only the returned partitions are replaced. append leaves the old rows in place (not an overwrite), and merge introduces unnecessary row-level matching when whole-partition replacement is the goal.
Can I use insert_overwrite on Snowflake too?
dbt's insert_overwrite strategy is primarily designed for BigQuery and Spark/Databricks. On Snowflake you typically meet the same requirement with MERGE (or delete+insert). Check the official dbt docs for adapter support.
How can I keep the set of overwritten partitions from growing too wide?
Tightly bound the set inside the is_incremental() block using a date range or change key, and add a pre_hook check that enforces an upper limit on the number of partitions. If the scope is wider than expected, fail the run so an operator can review it.
Is it safe against sources with duplicates or out-of-order events?
Yes, as long as you include logic that produces a definitive result within each overwritten partition (deduplication, latest-wins window functions, aggregation). insert_overwrite itself is just the replacement mechanism; correctness depends entirely on what your SELECT returns.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...