dbt Snapshots implement SCD Type 2 by appending a history row whenever a source row changes and managing the validity window. The two main strategies are timestamp and check, and both require a unique_key.
This article covers how to choose between the two strategies, common pitfalls, warehouse-specific tuning in practice, and the points most likely to appear on the exam, all aligned with the official documentation.
dbt Snapshots detect row-level changes against the result of a SELECT statement and update the history table. When a change is detected, the existing current row's validity window is closed and a new current row is appended. As a result, multiple versions accumulate for the same unique_key.
Snapshot tables include control columns added by dbt. These are common across both strategies and can be used for history-integrity checks and point-in-time queries.
| Column | Role | Notes |
|---|---|---|
| dbt_scd_id | Unique identifier for the SCD row | Computed by dbt based on unique_key and strategy |
| dbt_valid_from | Start of the row's validity | Time the change was detected (or updated_at) |
| dbt_valid_to | End of the row's validity | NULL for the current row; a timestamp is written on expiration |
| dbt_updated_at | Update reference assigned by dbt at processing time | Computation differs depending on the strategy |
Minimal Snapshot configuration (check strategy)
{% snapshot customers_snapshot %}
{{
config(
target_schema='snapshots',
unique_key='customer_id',
strategy='check',
check_cols=['email', 'address', 'status'],
invalidate_hard_deletes=true
)
}}
select customer_id, email, address, status
from {{ source('app', 'customers') }}
{% endsnapshot %}The timestamp strategy detects row changes using a single updated_at column. It is best suited for datasets where updated_at increases monotonically and is reliably updated on every change.
The check strategy detects changes based on value differences in a specified set of columns. It is useful when the update timestamp is unreliable or when you want to base change detection only on a few business-critical attributes.
| Aspect | Timestamp strategy | Check strategy |
|---|---|---|
| Detection logic | Detects when updated_at advances | Detects when check_cols values change |
| Required data | A trustworthy updated_at | Stable, deterministic columns (one or more) |
| Misdetection risk | Misses changes if the timestamp is delayed or missing | Includes too many columns causes false positives (noise) |
| Performance | Relatively light (single-column lookup) | Depends on the number of columns and comparison cost |
| Use case | DWH tables with a CDC-sourced updated_at | App-sourced data where updated_at is inconsistent or you only want to track specific attributes |
How Snapshots track history (SCD2)
Skeletons for both strategies
{% snapshot orders_ts %}
{{
config(
target_schema='snapshots',
unique_key='order_id',
strategy='timestamp',
updated_at='updated_at'
)
}}
select order_id, updated_at, amount, status
from {{ ref('stg_orders') }}
{% endsnapshot %}
{% snapshot users_ck %}
{{
config(
target_schema='snapshots',
unique_key='user_id',
strategy='check',
check_cols=['email', 'is_active', 'plan']
)
}}
select user_id, email, is_active, plan
from {{ ref('stg_users') }}
{% endsnapshot %}updated_at must be rewritten on every update at the source. If updated_at points to the past due to late arrivals or backfills, changes will be missed. Ensure a trustworthy, monotonically increasing timestamp.
Time zone handling depends on the warehouse. Mismatched zones between source and DWH shift window boundaries. Normalize to UTC in the ETL layer or explicitly convert inside the Snapshot SELECT.
| Symptom | Cause | Mitigation |
|---|---|---|
| Missed detection | updated_at is delayed or goes backwards | Guarantee monotonic growth in the ETL; reprocess via a late-arrival flag |
| Boundary drift | Time zone mismatch | Normalize to UTC and convert explicitly in the SELECT |
| Wasteful history rows | Mechanical periodic updates change updated_at | Only update at the application layer on semantically meaningful changes |
Conceptual example of normalizing time inside the SELECT
{% snapshot payments_ts %}
{{
config(
target_schema='snapshots',
unique_key='payment_id',
strategy='timestamp',
updated_at='updated_at_utc'
)
}}
select
payment_id,
-- ここで UTC に正規化(倉庫に応じて適切な関数を使用)
cast(updated_at as timestamp) as updated_at_utc,
amount, status
from {{ ref('stg_payments_raw') }}
{% endsnapshot %}Restrict check_cols to columns that reflect business-meaningful changes. Exclude log-derived timestamps, sort orders, and outputs of nondeterministic functions. Floating-point values often cause false positives via rounding errors, so preprocessing with rounding or stringification before comparison is effective.
If you want to check a derived column, compute it deterministically in the Snapshot SELECT before including it in check_cols. This also minimizes the impact of source schema changes.
| Symptom | Cause | Mitigation |
|---|---|---|
| Over-detection | Included noisy columns | Narrow check_cols and apply preprocessing |
| Detection from tiny deltas | Floating-point rounding errors | Stabilize via round, cast, or stringification |
| Missed detection | Missing columns that should be compared | Review your diff definition against requirements and revisit the column list |
Example of stabilizing check_cols via preprocessing
{% snapshot subscription_ck %}
{{
config(
target_schema='snapshots',
unique_key='account_id',
strategy='check',
check_cols=['plan', 'is_active', 'mrr_str']
)
}}
with src as (
select account_id, plan, is_active, mrr
from {{ ref('stg_subscription') }}
)
select
account_id,
plan,
is_active,
-- 丸めと型統一により比較を安定化
cast(round(mrr, 2) as string) as mrr_str
from src
{% endsnapshot %}Snapshots evaluate diffs on every run. Aligning the schedule with the source update frequency is critical; too frequent and you incur unnecessary comparison cost, too infrequent and detection lag grows.
History tables grow over time. If you only query the current rows, expose a view that filters dbt_valid_to IS NULL and optimize reads with partitioning or clustering.
| Warehouse / aspect | Recommended setting | Goal |
|---|---|---|
| Common (scheduling) | Once right after updates; restrict the time window if needed | Reduce wasted diff comparisons |
| Common (storage) | Partition and cluster by dbt_valid_from/valid_to | Reduce I/O for time-ordered queries |
| Common (views) | Current-row view: where dbt_valid_to is null | Keep the bulk of reads lightweight |
Example view that returns only the current rows
create or replace view analytics.v_customers_current as
select *
from snapshots.customers_snapshot
where dbt_valid_to is null;The Analytics Engineer exam frequently tests the conditions for choosing timestamp vs check, the requirement for unique_key, hard-delete handling, and the choice of comparison columns. It rewards semantically correct diff definitions and stable comparisons over operator trivia or warehouse-specific functions.
Snapshots and incremental models have different purposes. Incremental is a means to efficiently update the latest state; Snapshot is a means to preserve history. Do not conflate the two.
| Item | What to remember | Common trap |
|---|---|---|
| unique_key | Required for every snapshot | Designs where the natural key is unstable |
| timestamp | Assumes a trustworthy updated_at | Late arrivals cause missed detection |
| check | Compare only business-critical columns | Including log columns or the update timestamp |
Example tests in schema.yml (unique and not_null)
version: 2
snapshots:
- name: customers_snapshot
tests:
- not_null:
column_name: customer_id
- unique:
column_name: customer_idAnalytics Engineer
問題 1
You want to snapshot an online orders table. The app updates amount and status when the amount changes, but some backfills overwrite updated_at with a past timestamp. You want to suppress false positives without missing real changes. Which configuration is most appropriate?
正解: A
Because updated_at is unreliable, timestamp is a poor fit. The correct approach is check restricted to business-meaningful columns (amount, status), with rounding to avoid floating-point error on amount. Comparing every column amplifies noise, and mixing in the current time invites false positives.
What is the difference between Snapshot and incremental models?
Incremental models efficiently update a table to its latest state without keeping history. Snapshots close the validity window on the previous row and append a new row on each change, preserving full history. They serve different purposes.
If a row is hard-deleted from the source, does Snapshot automatically expire it?
Not by default. Enabling invalidate_hard_deletes=true makes the snapshot run detect rows that no longer exist in the source and update dbt_valid_to to expire them.
Is it safe to include computed columns in check_cols?
It is safe as long as the computation is deterministic and stable. Avoid nondeterministic functions (current time, random) and floating-point values prone to rounding errors. Preprocess with round, cast, or coalesce where needed.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...