dbt incremental models let you narrow the source rows you ingest with is_incremental(), but they do not automatically narrow which rows on the target (existing table) side get updated. That gap is exactly where incremental_predicates earns its keep.
This article walks through the design points of incremental_predicates for further restricting MERGE-time targets, the typical use cases, adapter-specific differences, and the angles certification exams tend to focus on.
incremental_predicates is the setting that lets you attach extra filter conditions on the target side of a MERGE (or an equivalent strategy your adapter supports) in an incremental model. At runtime, the adapter macro splices them as AND conditions against the MERGE target (usually the target table alias), restricting which rows are candidates for update or delete.
If you only filter the source side via is_incremental(), the target side still tends to be scanned in full. On huge fact tables with many partitions or clusters, the MERGE match search can become expensive and lock-heavy. By using incremental_predicates to narrow the range to only the target rows that could actually be affected, you reduce scan volume, spend, and lock time all at once.
| Approach | Where it filters | Primary use / effect | Risks and caveats |
|---|---|---|---|
| is_incremental() filter | Source side (rows pulled from staging) | Reduces incremental input; lightens extraction and aggregation | Does not address target lookup; the MERGE-side scan remains |
| incremental_predicates | Target side (MERGE candidate rows) | Shrinks update and match search; promotes partition / cluster pruning | Bad predicate design causes misses; you must verify adapter-specific behavior |
| Warehouse partitioning features (partitions, clusters, indexes) | Storage / engine | The physical foundation that enables scan reduction | Requires definition and maintenance; logical predicates alone are not always enough |
How the MERGE target gets narrowed
Basic form: pass the predicates as a list (using the adapter's internal aliases)
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
-- Example: limit to potentially affected partitions (internal alias names are usually fixed per adapter)
incremental_predicates=[
"DBT_INTERNAL_DEST.order_date >= (select min(order_date) from DBT_INTERNAL_SOURCE)",
"DBT_INTERNAL_DEST.order_date <= (select max(order_date) from DBT_INTERNAL_SOURCE)"
]
) }}
select ... from {{ ref('stg_orders') }}
{% if is_incremental() %}
where updated_at >= (select max(updated_at) from {{ this }})
{% endif %}For huge tables partitioned by date or time, the standard play is to restrict the target to just the period the incremental source touches. Use the min and max partition values from the source side and MERGE only against the same range on the target.
BigQuery, Snowflake, and Databricks all prune aggressively on partition-column predicates. Read only fresh data with is_incremental() while updating only the relevant partitions via incremental_predicates. This two-stage filter shrinks both reads and writes.
Example: range-bounding a BigQuery date partition
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
partition_by={'field': 'order_date', 'data_type': 'date'},
incremental_predicates=[
-- Bound TARGET partitions by the SOURCE min/max date
"DBT_INTERNAL_DEST.order_date between (select min(order_date) from DBT_INTERNAL_SOURCE)
and (select max(order_date) from DBT_INTERNAL_SOURCE)"
]
) }}
select ... from {{ ref('stg_orders') }}
{% if is_incremental() %}
where order_date >= date_sub(current_date(), interval 7 day)
{% endif %}With CDC (change data capture), only a tiny subset of keys changes per ingestion. Materialize the "keys touched this run" as a separate model (ephemeral works fine) and restrict the target to that set.
This lets you avoid a full scan of a huge target while still MERGEing exactly the rows you care about.
Example: restrict to just the customer_ids touched this run
-- models/int_keys_this_run.sql (ephemeral)
{{ config(materialized='ephemeral') }}
select distinct customer_id
from {{ ref('stg_cdc_events') }}
{% if is_incremental() %}
where _ingested_at >= (select coalesce(max(_ingested_at), '1970-01-01') from {{ this }})
{% endif %}
-- models/fct_customers.sql (incremental)
{{ config(
materialized='incremental',
unique_key='customer_id',
incremental_strategy='merge',
incremental_predicates=[
"DBT_INTERNAL_DEST.customer_id in (select customer_id from {{ ref('int_keys_this_run') }})"
]
) }}
select ... from {{ ref('stg_customer_snapshot') }}Preventing missed rows comes first. Build incremental_predicates around safe-side ranges like the source min/max to account for late-arriving rows and upsert resends. Even for key-based filtering, leaving a reprocessing window (e.g., always include keys from the last N days) provides peace of mind.
Mind the interaction with schema changes and added columns. When restricting which columns MERGE updates, combine merge_update_columns with on_schema_change to keep things consistent. Note that incremental_predicates is ignored during a full refresh, which rebuilds the entire table.
Example config: restricted update columns and schema-change policy
{{ config(
materialized='incremental',
unique_key='id',
incremental_strategy='merge',
merge_update_columns=['status', 'updated_at'],
on_schema_change='append_new_columns', -- Options vary by adapter
incremental_predicates=[
"DBT_INTERNAL_DEST.dt >= dateadd(day, -3, current_date)" -- Snowflake example
]
) }}
select id, status, updated_at, dt from {{ ref('stg_events') }}Where incremental_predicates is applied and what the internal alias names are depend on the adapter implementation (BigQuery, Snowflake, Databricks / Spark, etc.). Most implementations splice them in as AND conditions on the MERGE WHEN MATCHED clause (or the target alias), but they may be irrelevant when the strategy is insert_overwrite.
Before going to production, always run dbt compile and visually verify in the generated SQL that the predicates land on the target side as intended. For extra safety, A/B test on a small dataset and measure the row-count and scan-volume differences.
Basic commands and check items for verification
# Compile and inspect the SQL
$ dbt compile -s models/fact_orders.sql
# Look for this in the generated SQL (example)
-- WHEN MATCHED AND (DBT_INTERNAL_DEST.order_date between ...)
-- Confirm it appears exactly where expected
# Compare on a small sample
$ dbt run -s fact_orders --vars '{sample: true}'
$ dbt run -s fact_orders --full-refresh # Baseline comparison
# Metrics
-- Target partition count, updated row count, scan bytes / durationis_incremental() filters source-side rows. incremental_predicates narrows the target-side update target. Be able to articulate that role difference clearly.
Lock in the fact that behavior differs under append and insert_overwrite strategies, and that the real value shows up under the MERGE strategy. Being able to express partition-range and key-set restriction patterns in just a few lines of SQL pays off on both the exam and the job.
Template for a frequently tested pattern
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge',
incremental_predicates=[
"DBT_INTERNAL_DEST.partition_date between (select min(partition_date) from DBT_INTERNAL_SOURCE)
and (select max(partition_date) from DBT_INTERNAL_SOURCE)"
]
) }}
select ... from {{ ref('stg_orders_incremental') }}
{% if is_incremental() %}
where partition_date >= date_sub(current_date(), interval 3 day)
{% endif %}Analytics Engineer
問題 1
You are incrementally updating a huge date-partitioned orders table with the MERGE strategy. Only the last 7 days are expected to have inserts or updates, and you want the MERGE target side restricted to that range to cut scan volume. Which configuration is most appropriate?
正解: B
Filtering only on the source side leaves the MERGE target lookup wide open. The right answer is to also restrict the target side to the relevant partitions with incremental_predicates. on_schema_change and tags do not serve this goal.
Can incremental_predicates replace is_incremental()?
No. is_incremental() filters the source rows you ingest, while incremental_predicates narrows the matches and updates on the MERGE target side. The production best practice is to combine both for a two-stage filter.
Do append or insert_overwrite strategies benefit from it?
append (insert-only) never updates, so incremental_predicates has no effect. insert_overwrite (especially BigQuery partition overwrite) is primarily about partition-grain design and usually does not consult incremental_predicates. The benefit only shows up with MERGE-based strategies.
Is it OK to reference the internal aliases DBT_INTERNAL_SOURCE / DBT_INTERNAL_DEST?
Many adapters use them consistently, but the names ultimately depend on the adapter implementation. Always inspect the SQL produced by dbt compile to confirm the predicates are applied to the target side as expected, and watch out for version differences.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...