append is the fastest incremental strategy: it never updates existing rows and only appends new ones. It is perfect for immutable data such as event logs and audit trails, but a bad design easily leads to duplicates or missing rows.
This article walks through the minimum requirements and patterns for using append safely in production based on stable dbt features, plus the points most frequently asked on the Analytics Engineer exam.
append builds the table in full on the first run, and from the second run onward only inserts new records filtered through the is_incremental() block. Existing rows are never updated or deleted.
Therefore, you should only adopt append when the data is effectively immutable or when upsert semantics are not required. Choose a different strategy such as merge if updates are needed.
| Strategy | Basic Behavior | Update / Delete Support | Typical Use Cases |
|---|---|---|---|
| append | INSERT new rows only | Not supported | Event logs, immutable audit trails, append-only CDC streams |
| merge | Update/insert on key match, delete when conditions allow | Supported (adapter-dependent) | Dimension upserts, maintaining the latest state |
| insert_overwrite | Overwrite by partition | Limited (replacement within a partition) | Regenerating daily/hourly partitions on BigQuery and similar engines |
Data flow of the append strategy (simple insert-only)
Minimal configuration of the append strategy (sketch)
{{ config(materialized='incremental', incremental_strategy='append') }}
with src as (
select * from {{ ref('stg_events') }}
{% if is_incremental() %}
-- Filter that passes only new records (concrete example below)
{% endif %}
)
select * from src;On the first run, is_incremental() is false in a dbt incremental model, so the full SELECT runs. From the next run onward it is true, and the WHERE clause you write picks up only the new rows. unique_key is not used in append (specifying it does not enable upserts).
The most important decision is choosing the watermark column. Typically you reference the max value of a monotonically increasing timestamp (such as created_at or _ingested_at) from {{ this }}, and only ingest source rows newer than that. In practice, you also set a safety window (e.g., rolling back 2 days) to account for late arrivals.
Basic watermark reference pattern
{{ config(materialized='incremental', incremental_strategy='append') }}
with src as (
select * from {{ ref('stg_events') }}
{% if is_incremental() %}
where created_at > coalesce((select max(created_at) from {{ this }}), '1970-01-01')
{% endif %}
)
select * from src;Here is a production-ready minimal example. on_schema_change behavior is adapter-dependent, so verify support before using it. When supported, append_new_columns adds new columns and fills NULL into existing rows.
Combining staging-side pre-deduplication with a target-side boundary column filter is the safest approach.
Operational template for the append strategy (the concept is the same on Snowflake, Databricks, and BigQuery)
{{ config(
materialized='incremental',
incremental_strategy='append',
on_schema_change='append_new_columns', -- when supported by the adapter
tags=['incremental','append']
) }}
with deduped as (
select
id,
created_at,
payload,
row_number() over (partition by id, created_at order by _ingested_at desc) as rn
from {{ ref('stg_events') }}
), src as (
select id, created_at, payload from deduped where rn = 1
{% if is_incremental() %}
-- Roll back 2 days to cover late arrivals (upstream removes ingest dupes)
where created_at >= dateadd(day, -2,
coalesce((select max(created_at) from {{ this }}), '1970-01-01')
)
{% endif %}
)
select id, created_at, payload from src;Because append cannot update, duplicates appear easily when data arrives late or the same row is resent. To avoid this, combine upstream deduplication with safety-window re-ingestion. Widening the ingestion window increases re-ingested volume, but staging removes the duplicates.
For dbt tests, instead of enforcing a single fully unique key on the target table, it is more effective to apply a uniqueness test to a realistic combination (e.g., id and created_at).
Example uniqueness test in schema.yml (target table)
version: 2
models:
- name: fct_events_append
tests: []
columns:
- name: id
tests: [not_null]
- name: created_at
tests: [not_null]
tests:
- dbt_expectations.expect_compound_columns_to_be_unique:
column_list: ['id','created_at']Because append cannot fix existing rows, a full refresh is effective when correcting upstream-induced gaps or type errors. --full-refresh rebuilds the target model from scratch.
Configure schema change handling according to the adapter's support for on_schema_change. On unsupported adapters, applying changes via a full refresh is the safest approach.
Frequently used commands
# Run a single model incrementally
dbt run -s fct_events_append
# Batch run by tag
dbt run -m tag:append
# Full refresh
dbt run -s fct_events_append --full-refreshOn the Analytics Engineer exam, fundamentals such as when append applies, the role of is_incremental, how unique_key behaves, and what full refresh means come up frequently. Make sure you can describe the differences from merge precisely in writing.
In particular, these three points tend to be asked together: append does not change existing rows; specifying unique_key does not trigger upserts; you must design countermeasures for late arrivals.
Incorrect vs. correct examples (conceptual)
-- Wrong: setting unique_key on append does NOT enable upserts
{{ config(materialized='incremental', incremental_strategy='append', unique_key='id') }}
-- Correct: define new rows via is_incremental() with append
{{ config(materialized='incremental', incremental_strategy='append') }}
{% if is_incremental() %}
where created_at > (select max(created_at) from {{ this }})
{% endif %}Analytics Engineer
問題 1
Which statement is correct when using the append strategy in a dbt incremental model?
正解: C
append only appends new rows and does not upsert. unique_key is not used, and you design protections against late arrivals and duplicates via the is_incremental() condition and staging. --full-refresh is valid and rebuilds the table.
What happens if I set unique_key with the append strategy?
unique_key is not used in append. Existing rows are not updated and duplicates are not resolved, so use the merge strategy if you need upsert semantics.
What is the safest way to handle late-arriving data?
A practical approach is to set a safety window in the is_incremental() WHERE clause that rolls back a few days from the max watermark, deduplicate in staging using row_number, and then append to the target table.
What happens when the schema changes (e.g., a column is added)?
If the adapter supports options like on_schema_change=append_new_columns, you can add new columns automatically. For unsupported adapters or complex changes, rebuilding the model with --full-refresh is the safer option.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...