dbt

When to Use dbt incremental_predicates: Production Patterns for Narrowing the Update Target

2026-04-19
NicheeLab Editorial Team

dbt incremental models let you narrow the source rows you ingest with is_incremental(), but they do not automatically narrow which rows on the target (existing table) side get updated. That gap is exactly where incremental_predicates earns its keep.

This article walks through the design points of incremental_predicates for further restricting MERGE-time targets, the typical use cases, adapter-specific differences, and the angles certification exams tend to focus on.

What incremental_predicates is, and why you need it

incremental_predicates is the setting that lets you attach extra filter conditions on the target side of a MERGE (or an equivalent strategy your adapter supports) in an incremental model. At runtime, the adapter macro splices them as AND conditions against the MERGE target (usually the target table alias), restricting which rows are candidates for update or delete.

If you only filter the source side via is_incremental(), the target side still tends to be scanned in full. On huge fact tables with many partitions or clusters, the MERGE match search can become expensive and lock-heavy. By using incremental_predicates to narrow the range to only the target rows that could actually be affected, you reduce scan volume, spend, and lock time all at once.

  • When to use it: huge fact tables partitioned by day or hour, or tables where CDC only changes a small subset of keys
  • Effect: shrinks the MERGE target (avoids full table scans, prunes partitions and clusters)
  • Caveat: predicates that are too narrow will miss rows. Always design alongside late-arriving data handling and reprocessing strategy
ApproachWhere it filtersPrimary use / effectRisks and caveats
is_incremental() filterSource side (rows pulled from staging)Reduces incremental input; lightens extraction and aggregationDoes not address target lookup; the MERGE-side scan remains
incremental_predicatesTarget side (MERGE candidate rows)Shrinks update and match search; promotes partition / cluster pruningBad predicate design causes misses; you must verify adapter-specific behavior
Warehouse partitioning features (partitions, clusters, indexes)Storage / engineThe physical foundation that enables scan reductionRequires definition and maintenance; logical predicates alone are not always enough

How the MERGE target gets narrowed

SOURCEIncrementalTARGETExistingFILTERFiltered by is_incremental()TARGETExisting (narrowed by incremental_predicates)MERGE USING <filtered source> ON <keys>MATCHED AND (incremental_predicates) → UPDATE/DELETE / NOT MATCHED → INSERTWith incremental_predicates, only TARGET rows matching the predicate become candidates for match lookup and update

Basic form: pass the predicates as a list (using the adapter's internal aliases)

{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    -- Example: limit to potentially affected partitions (internal alias names are usually fixed per adapter)
    incremental_predicates=[
      "DBT_INTERNAL_DEST.order_date >= (select min(order_date) from DBT_INTERNAL_SOURCE)",
      "DBT_INTERNAL_DEST.order_date <= (select max(order_date) from DBT_INTERNAL_SOURCE)"
    ]
) }}

select ... from {{ ref('stg_orders') }}
{% if is_incremental() %}
  where updated_at >= (select max(updated_at) from {{ this }})
{% endif %}

Use case 1: Narrow the target by partition range

For huge tables partitioned by date or time, the standard play is to restrict the target to just the period the incremental source touches. Use the min and max partition values from the source side and MERGE only against the same range on the target.

BigQuery, Snowflake, and Databricks all prune aggressively on partition-column predicates. Read only fresh data with is_incremental() while updating only the relevant partitions via incremental_predicates. This two-stage filter shrinks both reads and writes.

  • Assumes a partition column is already defined as DATE or TIMESTAMP
  • Using the source-side min/max gives a safe range even when some late-arriving rows are included
  • If long-range reprocessing is needed, parameterize the target window so it can be adjusted

Example: range-bounding a BigQuery date partition

{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    partition_by={'field': 'order_date', 'data_type': 'date'},
    incremental_predicates=[
      -- Bound TARGET partitions by the SOURCE min/max date
      "DBT_INTERNAL_DEST.order_date between (select min(order_date) from DBT_INTERNAL_SOURCE)
                                       and (select max(order_date) from DBT_INTERNAL_SOURCE)"
    ]
) }}

select ... from {{ ref('stg_orders') }}
{% if is_incremental() %}
  where order_date >= date_sub(current_date(), interval 7 day)
{% endif %}

Use case 2: Narrow the target by changed-key set (CDC key filtering)

With CDC (change data capture), only a tiny subset of keys changes per ingestion. Materialize the "keys touched this run" as a separate model (ephemeral works fine) and restrict the target to that set.

This lets you avoid a full scan of a huge target while still MERGEing exactly the rows you care about.

  • Deduplicate the key set with DISTINCT
  • Ephemeral models inline into SQL, so the reference cost is low
  • For composite keys, concatenate or use a struct so the join keys align

Example: restrict to just the customer_ids touched this run

-- models/int_keys_this_run.sql (ephemeral)
{{ config(materialized='ephemeral') }}
select distinct customer_id
from {{ ref('stg_cdc_events') }}
{% if is_incremental() %}
  where _ingested_at >= (select coalesce(max(_ingested_at), '1970-01-01') from {{ this }})
{% endif %}

-- models/fct_customers.sql (incremental)
{{ config(
    materialized='incremental',
    unique_key='customer_id',
    incremental_strategy='merge',
    incremental_predicates=[
      "DBT_INTERNAL_DEST.customer_id in (select customer_id from {{ ref('int_keys_this_run') }})"
    ]
) }}
select ... from {{ ref('stg_customer_snapshot') }}

Operational caveats and pitfalls

Preventing missed rows comes first. Build incremental_predicates around safe-side ranges like the source min/max to account for late-arriving rows and upsert resends. Even for key-based filtering, leaving a reprocessing window (e.g., always include keys from the last N days) provides peace of mind.

Mind the interaction with schema changes and added columns. When restricting which columns MERGE updates, combine merge_update_columns with on_schema_change to keep things consistent. Note that incremental_predicates is ignored during a full refresh, which rebuilds the entire table.

  • Always have one of: "reprocess the last N days in a single update" or "add the last N days into the key set" as a late-arrival safeguard
  • Keep merge_update_columns minimal; avoid updating columns you do not need to touch
  • Monitor: emit metrics for updated row count, scan bytes, and number of target partitions

Example config: restricted update columns and schema-change policy

{{ config(
    materialized='incremental',
    unique_key='id',
    incremental_strategy='merge',
    merge_update_columns=['status', 'updated_at'],
    on_schema_change='append_new_columns',  -- Options vary by adapter
    incremental_predicates=[
      "DBT_INTERNAL_DEST.dt >= dateadd(day, -3, current_date)"  -- Snowflake example
    ]
) }}
select id, status, updated_at, dt from {{ ref('stg_events') }}

Adapter differences and verification steps

Where incremental_predicates is applied and what the internal alias names are depend on the adapter implementation (BigQuery, Snowflake, Databricks / Spark, etc.). Most implementations splice them in as AND conditions on the MERGE WHEN MATCHED clause (or the target alias), but they may be irrelevant when the strategy is insert_overwrite.

Before going to production, always run dbt compile and visually verify in the generated SQL that the predicates land on the target side as intended. For extra safety, A/B test on a small dataset and measure the row-count and scan-volume differences.

  • incremental_predicates has no effect on append strategies that do not use MERGE
  • insert_overwrite (especially BigQuery partition overwrite) relies on a different set of optimizations
  • Always confirm how DBT_INTERNAL_SOURCE / DBT_INTERNAL_DEST expand in the compiled output

Basic commands and check items for verification

# Compile and inspect the SQL
$ dbt compile -s models/fact_orders.sql

# Look for this in the generated SQL (example)
-- WHEN MATCHED AND (DBT_INTERNAL_DEST.order_date between ...)
-- Confirm it appears exactly where expected

# Compare on a small sample
$ dbt run -s fact_orders --vars '{sample: true}'
$ dbt run -s fact_orders --full-refresh  # Baseline comparison

# Metrics
-- Target partition count, updated row count, scan bytes / duration

Points the Analytics Engineer exam tends to test

is_incremental() filters source-side rows. incremental_predicates narrows the target-side update target. Be able to articulate that role difference clearly.

Lock in the fact that behavior differs under append and insert_overwrite strategies, and that the real value shows up under the MERGE strategy. Being able to express partition-range and key-set restriction patterns in just a few lines of SQL pays off on both the exam and the job.

  • Definition: incremental_predicates is an extra predicate used to narrow the MERGE target (applied by the adapter)
  • Where it applies: large partitioned tables, CDC where only a subset of keys change
  • Where it does not: append (no updates), insert_overwrite (overwrite grain is the main design lever)

Template for a frequently tested pattern

{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    incremental_predicates=[
      "DBT_INTERNAL_DEST.partition_date between (select min(partition_date) from DBT_INTERNAL_SOURCE)
                                         and (select max(partition_date) from DBT_INTERNAL_SOURCE)"
    ]
) }}
select ... from {{ ref('stg_orders_incremental') }}
{% if is_incremental() %}
  where partition_date >= date_sub(current_date(), interval 3 day)
{% endif %}

Check your understanding

Analytics Engineer

問題 1

You are incrementally updating a huge date-partitioned orders table with the MERGE strategy. Only the last 7 days are expected to have inserts or updates, and you want the MERGE target side restricted to that range to cut scan volume. Which configuration is most appropriate?

  1. Filter just the last 7 days in the is_incremental() WHERE clause (without restricting the target side)
  2. Set incremental_predicates to DBT_INTERNAL_DEST.partition_date BETWEEN (select min(partition_date) from DBT_INTERNAL_SOURCE) AND (select max(partition_date) from DBT_INTERNAL_SOURCE)
  3. Set on_schema_change to ignore
  4. Add tag: [incremental] to the model

正解: B

Filtering only on the source side leaves the MERGE target lookup wide open. The right answer is to also restrict the target side to the relevant partitions with incremental_predicates. on_schema_change and tags do not serve this goal.

Frequently Asked Questions

Can incremental_predicates replace is_incremental()?

No. is_incremental() filters the source rows you ingest, while incremental_predicates narrows the matches and updates on the MERGE target side. The production best practice is to combine both for a two-stage filter.

Do append or insert_overwrite strategies benefit from it?

append (insert-only) never updates, so incremental_predicates has no effect. insert_overwrite (especially BigQuery partition overwrite) is primarily about partition-grain design and usually does not consult incremental_predicates. The benefit only shows up with MERGE-based strategies.

Is it OK to reference the internal aliases DBT_INTERNAL_SOURCE / DBT_INTERNAL_DEST?

Many adapters use them consistently, but the names ultimately depend on the adapter implementation. Always inspect the SQL produced by dbt compile to confirm the predicates are applied to the target side as expected, and watch out for version differences.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.