dbt

dbt Snapshots for History Tracking: Choosing Between Timestamp and Check Strategies

2026-04-19
NicheeLab Editorial Team

dbt Snapshots implement SCD Type 2 by appending a history row whenever a source row changes and managing the validity window. The two main strategies are timestamp and check, and both require a unique_key.

This article covers how to choose between the two strategies, common pitfalls, warehouse-specific tuning in practice, and the points most likely to appear on the exam, all aligned with the official documentation.

Snapshot Basics and Generated Columns

dbt Snapshots detect row-level changes against the result of a SELECT statement and update the history table. When a change is detected, the existing current row's validity window is closed and a new current row is appended. As a result, multiple versions accumulate for the same unique_key.

Snapshot tables include control columns added by dbt. These are common across both strategies and can be used for history-integrity checks and point-in-time queries.

  • Required: unique_key and strategy (timestamp or check)
  • Shared control columns: dbt_scd_id, dbt_valid_from, dbt_valid_to, dbt_updated_at
  • invalidate_hard_deletes: true also expires rows that have disappeared from the source (hard-delete invalidation)
  • Snapshots are defined as snapshot blocks rather than models and are refreshed with the dbt snapshot command
ColumnRoleNotes
dbt_scd_idUnique identifier for the SCD rowComputed by dbt based on unique_key and strategy
dbt_valid_fromStart of the row's validityTime the change was detected (or updated_at)
dbt_valid_toEnd of the row's validityNULL for the current row; a timestamp is written on expiration
dbt_updated_atUpdate reference assigned by dbt at processing timeComputation differs depending on the strategy

Minimal Snapshot configuration (check strategy)

{% snapshot customers_snapshot %}
{{
  config(
    target_schema='snapshots',
    unique_key='customer_id',
    strategy='check',
    check_cols=['email', 'address', 'status'],
    invalidate_hard_deletes=true
  )
}}
select customer_id, email, address, status
from {{ source('app', 'customers') }}
{% endsnapshot %}

Timestamp vs Check Strategy Comparison

The timestamp strategy detects row changes using a single updated_at column. It is best suited for datasets where updated_at increases monotonically and is reliably updated on every change.

The check strategy detects changes based on value differences in a specified set of columns. It is useful when the update timestamp is unreliable or when you want to base change detection only on a few business-critical attributes.

  • If you have a reliable updated_at, timestamp is simpler and tends to be faster
  • If the update timestamp is untrustworthy, missing, or delayed, use check and monitor only the business-critical columns
  • Exclude nondeterministic or noisy columns (the update timestamp itself, random values, floating-point micro-deltas) from check
AspectTimestamp strategyCheck strategy
Detection logicDetects when updated_at advancesDetects when check_cols values change
Required dataA trustworthy updated_atStable, deterministic columns (one or more)
Misdetection riskMisses changes if the timestamp is delayed or missingIncludes too many columns causes false positives (noise)
PerformanceRelatively light (single-column lookup)Depends on the number of columns and comparison cost
Use caseDWH tables with a CDC-sourced updated_atApp-sourced data where updated_at is inconsistent or you only want to track specific attributes

How Snapshots track history (SCD2)

Source (current)customer_id=42 / email=a@x / address=TokyoSnapshot (history)v1: valid_from=t0, valid_to=t1 / v2: valid_from=t1, valid_to=NULLSCD2 history tracking via change detection

Skeletons for both strategies

{% snapshot orders_ts %}
{{
  config(
    target_schema='snapshots',
    unique_key='order_id',
    strategy='timestamp',
    updated_at='updated_at'
  )
}}
select order_id, updated_at, amount, status
from {{ ref('stg_orders') }}
{% endsnapshot %}

{% snapshot users_ck %}
{{
  config(
    target_schema='snapshots',
    unique_key='user_id',
    strategy='check',
    check_cols=['email', 'is_active', 'plan']
  )
}}
select user_id, email, is_active, plan
from {{ ref('stg_users') }}
{% endsnapshot %}

Practical Tips for the Timestamp Strategy

updated_at must be rewritten on every update at the source. If updated_at points to the past due to late arrivals or backfills, changes will be missed. Ensure a trustworthy, monotonically increasing timestamp.

Time zone handling depends on the warehouse. Mismatched zones between source and DWH shift window boundaries. Normalize to UTC in the ETL layer or explicitly convert inside the Snapshot SELECT.

  • During backfills, do not overwrite updated_at with the load time instead of the actual update time
  • Rounding the timestamp to align with daily rollups can collapse changes
  • Consider invalidate_hard_deletes=true to handle hard deletes
SymptomCauseMitigation
Missed detectionupdated_at is delayed or goes backwardsGuarantee monotonic growth in the ETL; reprocess via a late-arrival flag
Boundary driftTime zone mismatchNormalize to UTC and convert explicitly in the SELECT
Wasteful history rowsMechanical periodic updates change updated_atOnly update at the application layer on semantically meaningful changes

Conceptual example of normalizing time inside the SELECT

{% snapshot payments_ts %}
{{
  config(
    target_schema='snapshots',
    unique_key='payment_id',
    strategy='timestamp',
    updated_at='updated_at_utc'
  )
}}
select
  payment_id,
  -- ここで UTC に正規化(倉庫に応じて適切な関数を使用)
  cast(updated_at as timestamp) as updated_at_utc,
  amount, status
from {{ ref('stg_payments_raw') }}
{% endsnapshot %}

Practical Tips for the Check Strategy

Restrict check_cols to columns that reflect business-meaningful changes. Exclude log-derived timestamps, sort orders, and outputs of nondeterministic functions. Floating-point values often cause false positives via rounding errors, so preprocessing with rounding or stringification before comparison is effective.

If you want to check a derived column, compute it deterministically in the Snapshot SELECT before including it in check_cols. This also minimizes the impact of source schema changes.

  • check_cols may need NULL-safe comparisons; use coalesce inside the SELECT
  • Comparing many columns increases cost; keep it to the bare minimum
  • Standardize monetary rounding per business rules (e.g., fixed to two decimal places)
SymptomCauseMitigation
Over-detectionIncluded noisy columnsNarrow check_cols and apply preprocessing
Detection from tiny deltasFloating-point rounding errorsStabilize via round, cast, or stringification
Missed detectionMissing columns that should be comparedReview your diff definition against requirements and revisit the column list

Example of stabilizing check_cols via preprocessing

{% snapshot subscription_ck %}
{{
  config(
    target_schema='snapshots',
    unique_key='account_id',
    strategy='check',
    check_cols=['plan', 'is_active', 'mrr_str']
  )
}}
with src as (
  select account_id, plan, is_active, mrr
  from {{ ref('stg_subscription') }}
)
select
  account_id,
  plan,
  is_active,
  -- 丸めと型統一により比較を安定化
  cast(round(mrr, 2) as string) as mrr_str
from src
{% endsnapshot %}

Operations and Performance: Scheduling and Growth Control

Snapshots evaluate diffs on every run. Aligning the schedule with the source update frequency is critical; too frequent and you incur unnecessary comparison cost, too infrequent and detection lag grows.

History tables grow over time. If you only query the current rows, expose a view that filters dbt_valid_to IS NULL and optimize reads with partitioning or clustering.

  • Aligning the schedule right after source updates reduces waste
  • On large tables, use time-series columns like dbt_valid_from for partition and clustering keys
  • Enable invalidate_hard_deletes=true if you need hard-delete detection
  • Isolate a dedicated target_schema for snapshots to manage permissions and lifecycle
Warehouse / aspectRecommended settingGoal
Common (scheduling)Once right after updates; restrict the time window if neededReduce wasted diff comparisons
Common (storage)Partition and cluster by dbt_valid_from/valid_toReduce I/O for time-ordered queries
Common (views)Current-row view: where dbt_valid_to is nullKeep the bulk of reads lightweight

Example view that returns only the current rows

create or replace view analytics.v_customers_current as
select *
from snapshots.customers_snapshot
where dbt_valid_to is null;

Exam Tips: Common Traps

The Analytics Engineer exam frequently tests the conditions for choosing timestamp vs check, the requirement for unique_key, hard-delete handling, and the choice of comparison columns. It rewards semantically correct diff definitions and stable comparisons over operator trivia or warehouse-specific functions.

Snapshots and incremental models have different purposes. Incremental is a means to efficiently update the latest state; Snapshot is a means to preserve history. Do not conflate the two.

  • unique_key is required; an unstable one breaks history
  • Timestamp depends on updated_at; check depends on business columns
  • invalidate_hard_deletes lets you expire rows removed from the source
  • Do not include nondeterministic columns or micro-deltas in check
ItemWhat to rememberCommon trap
unique_keyRequired for every snapshotDesigns where the natural key is unstable
timestampAssumes a trustworthy updated_atLate arrivals cause missed detection
checkCompare only business-critical columnsIncluding log columns or the update timestamp

Example tests in schema.yml (unique and not_null)

version: 2
snapshots:
  - name: customers_snapshot
    tests:
      - not_null:
          column_name: customer_id
      - unique:
          column_name: customer_id

Check Your Understanding

Analytics Engineer

問題 1

You want to snapshot an online orders table. The app updates amount and status when the amount changes, but some backfills overwrite updated_at with a past timestamp. You want to suppress false positives without missing real changes. Which configuration is most appropriate?

  1. Use the check strategy with check_cols=['amount','status'] and round amount before comparing
  2. Use the timestamp strategy with updated_at, but compare a rounded updated_at
  3. Use the check strategy with check_cols set to every column for an exact-match comparison
  4. Use the timestamp strategy and compare the greater of updated_at and the current time

正解: A

Because updated_at is unreliable, timestamp is a poor fit. The correct approach is check restricted to business-meaningful columns (amount, status), with rounding to avoid floating-point error on amount. Comparing every column amplifies noise, and mixing in the current time invites false positives.

Frequently Asked Questions

What is the difference between Snapshot and incremental models?

Incremental models efficiently update a table to its latest state without keeping history. Snapshots close the validity window on the previous row and append a new row on each change, preserving full history. They serve different purposes.

If a row is hard-deleted from the source, does Snapshot automatically expire it?

Not by default. Enabling invalidate_hard_deletes=true makes the snapshot run detect rows that no longer exist in the source and update dbt_valid_to to expire them.

Is it safe to include computed columns in check_cols?

It is safe as long as the computation is deterministic and stable. Avoid nondeterministic functions (current time, random) and floating-point values prone to rounding errors. Preprocess with round, cast, or coalesce where needed.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.