Dimensions are the analytical "axes" and the set of attributes that give meaning to facts. In dbt, you build stable dimensions by combining models, snapshots, and tests.
This article walks through design patterns you can apply directly in production, grounded in the Analytics Engineer exam scope (modeling, snapshots, tests, documentation).
Dimensions are attribute tables such as customer, product, and date. They serve as the axes for grouping and filtering in analytics. They join to facts (transactions, events) via foreign keys to form a star schema.
In dbt, dimensions are defined as regular models and their quality is guaranteed by schema tests. When you need to retain history, combine them with snapshots (SCD2). It is critical to document the semantic definition (grain, keys, descriptions) so consumers can reference them with the correct interpretation.
| Element | Purpose | dbt Implementation Notes |
|---|---|---|
| Dimension tables (customer, product, date) | Analytical axes. Provide attributes used for grouping and filtering | Define as models. Generate surrogate keys, add not_null / unique tests, and use snapshots as needed |
| Fact tables (orders, payments, events) | The aggregation target. The source for measure calculations | Often incremental models. Configure relationship tests against dimension foreign keys |
| Bridge tables (many-to-many intermediate) | Resolve many-to-many relationships or expand hierarchies | Composite uniqueness tests on keys, explicit grain, and documented join order |
Star schema (conceptual diagram)
A simple Type 1 dimension (dim_customer.sql)
{{
config(materialized='table')
}}
with src as (
select
source_system,
customer_id,
coalesce(trim(lower(email)), '') as email,
trim(full_name) as full_name,
status,
first_order_date
from {{ source('crm', 'customers') }}
),
latest as (
-- Keep only the latest row per natural key (deduplication example)
select * from (
select s.*,
row_number() over(partition by source_system, customer_id order by _ingested_at desc) as rn
from src s
) t where rn = 1
)
select
{{ dbt_utils.surrogate_key(['source_system','customer_id']) }} as customer_sk,
source_system,
customer_id,
email,
full_name,
status,
first_order_date
from latest;A dimension's grain is the "real-world unit that each row uniquely represents." For a customer dimension, "source x customer_id" is the natural natural key, and to stabilize it you adopt a surrogate key (a hash or a serial number).
In dbt, dbt_utils.surrogate_key can generate a stable hash from multiple columns. The basic pattern is to deduplicate by the natural key and apply unique / not_null tests to the surrogate key.
Declaring keys and tests (schema.yml excerpt)
version: 2
models:
- name: dim_customer
description: Customer dimension (Type 1). Grain = source_system x customer_id.
columns:
- name: customer_sk
tests:
- not_null
- unique
- name: customer_id
tests:
- not_null
- name: email
description: Normalized email address
When you need to retain attribute history, implement SCD Type 2 using dbt snapshots. A new row is inserted when a checked column changes, and the validity period is managed via valid_from / valid_to.
In practice, splitting things into a history-retention snapshot (the raw history) and a current-value dimension (a convenience view that returns only the latest valid rows) keeps aggregations stable.
Customer SCD2 snapshot (snapshots/dim_customer_scd2.sql)
{% snapshot dim_customer_scd2 %}
{{
config(
target_schema='snapshots',
unique_key='concat(source_system, ''::'', customer_id)',
strategy='check',
check_cols=['email','full_name','status'],
invalidate_hard_deletes=True
)
}}
select
source_system,
customer_id,
lower(email) as email,
full_name,
status
from {{ source('crm', 'customers') }}
{% endsnapshot %}
-- Convenience dimension with only the latest valid rows (models/dim_customer_current.sql)
{{
config(materialized='view')
}}
select *
from {{ ref('dim_customer_scd2') }}
where dbt_valid_to is nullRole-playing is using the same dimension in multiple roles depending on context. The classic example is using a date dimension as both an order date and a ship date.
Conformed dimensions, which maintain a consistent interpretation across multiple data marts, are easiest to operate when they reference a single source model and column names are aliased per role at the view layer.
Role-playing date dimensions (views)
{{
config(materialized='view')
}}
-- models/dim_order_date.sql
select
date_sk as order_date_sk,
date_day as order_date,
year,
month,
week,
is_weekend
from {{ ref('dim_date') }};
{{
config(materialized='view')
}}
-- models/dim_ship_date.sql
select
date_sk as ship_date_sk,
date_day as ship_date,
year,
month,
week,
is_weekend
from {{ ref('dim_date') }};The date dimension is the core of analytics. You generate a continuous axis (spine) with no missing days, and add derived columns such as year, quarter, month, week, and a business-day flag. A continuous axis is essential for period aggregations and zero-fills.
Using dbt_utils.date_spine lets you generate a date series while abstracting away warehouse differences. Define the start and end dates to match your operational policy (retroactive range, whether to include future dates).
Date dimension (dim_date.sql)
{{
config(materialized='table')
}}
with spine as (
{{ dbt_utils.date_spine(
datepart='day',
start_date="to_date('2018-01-01')",
end_date='current_date'
) }}
),
cal as (
select
cast(replace(cast(date_day as string), '-', '') as bigint) as date_sk,
date_day,
extract(year from date_day) as year,
extract(quarter from date_day) as quarter,
extract(month from date_day) as month,
extract(week from date_day) as week,
case when extract(dayofweek from date_day) in (6,7) then true else false end as is_weekend
from spine
)
select * from calDimensions need to be trusted as "stable axes." Enforce key uniqueness, NULL prohibition, and referential integrity through schema tests, and document the model description, column definitions, grain, and update policy.
On exams, you are often asked where each test belongs (unique / not_null on the dimension side, relationships on the fact side). In practice, it is effective to run snapshots, models, and tests on a schedule via CI, and to surface breaking changes through docs and the catalog.
Relationship tests and documentation (schema.yml excerpt)
version: 2
models:
- name: fact_orders
description: Order fact (daily incremental)
columns:
- name: customer_sk
tests:
- relationships:
to: ref('dim_customer')
field: customer_sk
- name: product_sk
tests:
- relationships:
to: ref('dim_product')
field: product_sk
- name: order_date_sk
tests:
- relationships:
to: ref('dim_date')
field: date_sk
- name: dim_product
description: Product dimension. Grain = product_id. Price history is managed in a separate table.
columns:
- name: product_sk
description: Surrogate key
tests: [not_null, unique]
- name: product_id
description: Natural key (source-dependent)
tests: [not_null]
Analytics Engineer
問題 1
You want to analyze the history of customer status against an order fact using the "value at that point in time." Which is the most appropriate dbt design?
正解: A
Because you need the "value as of that point in time," SCD2 is appropriate. The snapshot manages the validity period, and you join to the customer row whose validity period contains the order date. Type 1 or always-denormalized approaches lose history, and a daily full refresh is inefficient in terms of cost and stability.
Are dimensions and entity tables the same thing?
They are related concepts but serve different purposes. Entities are normalized representations of operational data, while dimensions are tables optimized as analytical "axes." For dimensions, you drop columns that are not needed for analysis, normalize code values, add derived columns, introduce surrogate keys, and apply SCD patterns.
Should SCD Type 2 always be implemented with snapshots?
Snapshots are the safe and standard option in dbt. You can use the warehouse's native change data capture (CDC) directly, but if portability and a consistent table structure (valid_from/valid_to, dbt_updated_at, etc.) matter to you, snapshots are easier to work with.
How do you handle late-arriving facts?
Join the fact to a role-playing date dimension using the event date and design for recomputation (re-runs) when needed. On the dimension side, manage validity periods with SCD2 so you can join to the "attributes as of that point in time," and configure a re-ingestion window in incremental models so historical recalculations are possible.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...