dbt

Mastering the dbt incremental append Strategy: Design, Implementation, and Exam Prep

2026-04-19
NicheeLab Editorial Team

append is the fastest incremental strategy: it never updates existing rows and only appends new ones. It is perfect for immutable data such as event logs and audit trails, but a bad design easily leads to duplicates or missing rows.

This article walks through the minimum requirements and patterns for using append safely in production based on stable dbt features, plus the points most frequently asked on the Analytics Engineer exam.

Overview of the append Strategy and When to Use It

append builds the table in full on the first run, and from the second run onward only inserts new records filtered through the is_incremental() block. Existing rows are never updated or deleted.

Therefore, you should only adopt append when the data is effectively immutable or when upsert semantics are not required. Choose a different strategy such as merge if updates are needed.

  • Pros: Fastest and lowest-cost option. Minimal impact from locking or contention.
  • Cons: No support for updates or deletes. You must guard against duplicates and late-arriving data yourself.
  • Requirement: A column that serves as the incremental boundary (watermark), such as created_at or _ingested_at.
StrategyBasic BehaviorUpdate / Delete SupportTypical Use Cases
appendINSERT new rows onlyNot supportedEvent logs, immutable audit trails, append-only CDC streams
mergeUpdate/insert on key match, delete when conditions allowSupported (adapter-dependent)Dimension upserts, maintaining the latest state
insert_overwriteOverwrite by partitionLimited (replacement within a partition)Regenerating daily/hourly partitions on BigQuery and similar engines

Data flow of the append strategy (simple insert-only)

SourceStagingFormatting / dedup pre-processingIncremental Model (append)Target TableExisting rows: unchanged / New rows: appended

Minimal configuration of the append strategy (sketch)

{{ config(materialized='incremental', incremental_strategy='append') }}

with src as (
  select * from {{ ref('stg_events') }}
  {% if is_incremental() %}
    -- Filter that passes only new records (concrete example below)
  {% endif %}
)
select * from src;

Internals and Requirements: is_incremental and Watermark Design

On the first run, is_incremental() is false in a dbt incremental model, so the full SELECT runs. From the next run onward it is true, and the WHERE clause you write picks up only the new rows. unique_key is not used in append (specifying it does not enable upserts).

The most important decision is choosing the watermark column. Typically you reference the max value of a monotonically increasing timestamp (such as created_at or _ingested_at) from {{ this }}, and only ingest source rows newer than that. In practice, you also set a safety window (e.g., rolling back 2 days) to account for late arrivals.

  • Required: A monotonically increasing boundary column with few gaps (watch out for time zone consistency too)
  • Recommended: Reference max(boundary column) of {{ this }} inside is_incremental()
  • Caveat: append cannot update existing rows. Switch to the merge strategy if updates are needed.

Basic watermark reference pattern

{{ config(materialized='incremental', incremental_strategy='append') }}

with src as (
  select * from {{ ref('stg_events') }}
  {% if is_incremental() %}
    where created_at > coalesce((select max(created_at) from {{ this }}), '1970-01-01')
  {% endif %}
)
select * from src;

Implementation Template: A Solid append Pattern

Here is a production-ready minimal example. on_schema_change behavior is adapter-dependent, so verify support before using it. When supported, append_new_columns adds new columns and fills NULL into existing rows.

Combining staging-side pre-deduplication with a target-side boundary column filter is the safest approach.

  • Explicitly set the on_schema_change policy if new columns are likely to be added
  • Remove key duplicates in staging (using ROW_NUMBER, etc.)
  • Explicitly project the column order in the target (avoids inconsistencies when columns are added)

Operational template for the append strategy (the concept is the same on Snowflake, Databricks, and BigQuery)

{{ config(
  materialized='incremental',
  incremental_strategy='append',
  on_schema_change='append_new_columns',  -- when supported by the adapter
  tags=['incremental','append']
) }}

with deduped as (
  select
    id,
    created_at,
    payload,
    row_number() over (partition by id, created_at order by _ingested_at desc) as rn
  from {{ ref('stg_events') }}
), src as (
  select id, created_at, payload from deduped where rn = 1
  {% if is_incremental() %}
    -- Roll back 2 days to cover late arrivals (upstream removes ingest dupes)
    where created_at >= dateadd(day, -2,
      coalesce((select max(created_at) from {{ this }}), '1970-01-01')
    )
  {% endif %}
)
select id, created_at, payload from src;

Designing Resilience to Late Arrivals and Duplicates, Plus Tests

Because append cannot update, duplicates appear easily when data arrives late or the same row is resent. To avoid this, combine upstream deduplication with safety-window re-ingestion. Widening the ingestion window increases re-ingested volume, but staging removes the duplicates.

For dbt tests, instead of enforcing a single fully unique key on the target table, it is more effective to apply a uniqueness test to a realistic combination (e.g., id and created_at).

  • Safety window: roll back several days from the max boundary using dateadd or similar
  • Staging: narrow down to the single latest record with row_number
  • Tests: detect duplicates early (unique, not_null)

Example uniqueness test in schema.yml (target table)

version: 2
models:
  - name: fct_events_append
    tests: []
    columns:
      - name: id
        tests: [not_null]
      - name: created_at
        tests: [not_null]
    tests:
      - dbt_expectations.expect_compound_columns_to_be_unique:
          column_list: ['id','created_at']

Operations: Full Refresh, Scheduling, and Schema Changes

Because append cannot fix existing rows, a full refresh is effective when correcting upstream-induced gaps or type errors. --full-refresh rebuilds the target model from scratch.

Configure schema change handling according to the adapter's support for on_schema_change. On unsupported adapters, applying changes via a full refresh is the safest approach.

  • Full refresh: use to fix inconsistencies or massive late arrivals in bulk
  • Scheduling: set the frequency to match the ingestion window and upstream arrival lag
  • Monitoring: track row count growth, advancement of max created_at, and error rates

Frequently used commands

# Run a single model incrementally
 dbt run -s fct_events_append

# Batch run by tag
 dbt run -m tag:append

# Full refresh
 dbt run -s fct_events_append --full-refresh

Exam Prep: Key Pitfalls and Essentials

On the Analytics Engineer exam, fundamentals such as when append applies, the role of is_incremental, how unique_key behaves, and what full refresh means come up frequently. Make sure you can describe the differences from merge precisely in writing.

In particular, these three points tend to be asked together: append does not change existing rows; specifying unique_key does not trigger upserts; you must design countermeasures for late arrivals.

  • append is INSERT-only. No updates or deletes.
  • Define new rows (the watermark) inside is_incremental()
  • unique_key is not used in append (it is required for merge)
  • --full-refresh rebuilds the table from scratch

Incorrect vs. correct examples (conceptual)

-- Wrong: setting unique_key on append does NOT enable upserts
{{ config(materialized='incremental', incremental_strategy='append', unique_key='id') }}
-- Correct: define new rows via is_incremental() with append
{{ config(materialized='incremental', incremental_strategy='append') }}
{% if is_incremental() %}
  where created_at > (select max(created_at) from {{ this }})
{% endif %}

Check Your Understanding

Analytics Engineer

問題 1

Which statement is correct when using the append strategy in a dbt incremental model?

  1. If you specify unique_key, existing rows are updated and duplicates are resolved automatically.
  2. is_incremental() makes dbt insert the appropriate WHERE clause automatically, so developers do not need to write any condition.
  3. append only INSERTs new rows and never updates or deletes existing rows. Countermeasures for late-arriving data must be designed in the model itself.
  4. --full-refresh has no effect on an append model; only the incremental run is ever executed.

正解: C

append only appends new rows and does not upsert. unique_key is not used, and you design protections against late arrivals and duplicates via the is_incremental() condition and staging. --full-refresh is valid and rebuilds the table.

Frequently Asked Questions

What happens if I set unique_key with the append strategy?

unique_key is not used in append. Existing rows are not updated and duplicates are not resolved, so use the merge strategy if you need upsert semantics.

What is the safest way to handle late-arriving data?

A practical approach is to set a safety window in the is_incremental() WHERE clause that rolls back a few days from the max watermark, deduplicate in staging using row_number, and then append to the target table.

What happens when the schema changes (e.g., a column is added)?

If the adapter supports options like on_schema_change=append_new_columns, you can add new columns automatically. For unsupported adapters or complex changes, rebuilding the model with --full-refresh is the safer option.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.