dbt

NicheeLab: dbt Dimensions Fundamentals and Analytical Axis Design

2026-04-19
NicheeLab Editorial Team

Dimensions are the analytical "axes" and the set of attributes that give meaning to facts. In dbt, you build stable dimensions by combining models, snapshots, and tests.

This article walks through design patterns you can apply directly in production, grounded in the Analytics Engineer exam scope (modeling, snapshots, tests, documentation).

Dimension Fundamentals and Their Role in dbt

Dimensions are attribute tables such as customer, product, and date. They serve as the axes for grouping and filtering in analytics. They join to facts (transactions, events) via foreign keys to form a star schema.

In dbt, dimensions are defined as regular models and their quality is guaranteed by schema tests. When you need to retain history, combine them with snapshots (SCD2). It is critical to document the semantic definition (grain, keys, descriptions) so consumers can reference them with the correct interpretation.

  • Dimensions require a stable grain and a unique key
  • Clarify whether it is Type 1 (overwrite) or Type 2 (history-preserving)
  • Configure relationship tests for foreign keys (fact → dim)
  • Document business definitions and usage examples
ElementPurposedbt Implementation Notes
Dimension tables (customer, product, date)Analytical axes. Provide attributes used for grouping and filteringDefine as models. Generate surrogate keys, add not_null / unique tests, and use snapshots as needed
Fact tables (orders, payments, events)The aggregation target. The source for measure calculationsOften incremental models. Configure relationship tests against dimension foreign keys
Bridge tables (many-to-many intermediate)Resolve many-to-many relationships or expand hierarchiesComposite uniqueness tests on keys, explicit grain, and documented join order

Star schema (conceptual diagram)

dim_customercustomer_sk (PK)dim_productproduct_sk (PK)dim_datedate_sk (PK)fact_ordersorder_id (PK) / customer_sk, product_sk, order_date_sk (FK)A star schema where three dimension tables join to fact_orders via surrogate keys

A simple Type 1 dimension (dim_customer.sql)

{{
  config(materialized='table')
}}

with src as (
  select
    source_system,
    customer_id,
    coalesce(trim(lower(email)), '') as email,
    trim(full_name) as full_name,
    status,
    first_order_date
  from {{ source('crm', 'customers') }}
),
latest as (
  -- Keep only the latest row per natural key (deduplication example)
  select * from (
    select s.*,
           row_number() over(partition by source_system, customer_id order by _ingested_at desc) as rn
    from src s
  ) t where rn = 1
)
select
  {{ dbt_utils.surrogate_key(['source_system','customer_id']) }} as customer_sk,
  source_system,
  customer_id,
  email,
  full_name,
  status,
  first_order_date
from latest;

Grain and Key Design: Natural Keys and Surrogate Keys

A dimension's grain is the "real-world unit that each row uniquely represents." For a customer dimension, "source x customer_id" is the natural natural key, and to stabilize it you adopt a surrogate key (a hash or a serial number).

In dbt, dbt_utils.surrogate_key can generate a stable hash from multiple columns. The basic pattern is to deduplicate by the natural key and apply unique / not_null tests to the surrogate key.

  • Natural keys can change due to operational system constraints
  • Surrogate keys improve join stability and are resilient to metadata migrations
  • Bundle the tests together: not_null, unique, and (when needed) relationship tests

Declaring keys and tests (schema.yml excerpt)

version: 2

models:
  - name: dim_customer
    description: Customer dimension (Type 1). Grain = source_system x customer_id.
    columns:
      - name: customer_sk
        tests:
          - not_null
          - unique
      - name: customer_id
        tests:
          - not_null
      - name: email
        description: Normalized email address

Handling SCD: Managing Type 2 History with dbt Snapshots

When you need to retain attribute history, implement SCD Type 2 using dbt snapshots. A new row is inserted when a checked column changes, and the validity period is managed via valid_from / valid_to.

In practice, splitting things into a history-retention snapshot (the raw history) and a current-value dimension (a convenience view that returns only the latest valid rows) keeps aggregations stable.

  • strategy='check' is well suited for detecting changes across multiple columns
  • unique_key should be the natural key (e.g., source_system + customer_id)
  • invalidate_hard_deletes turns hard deletes into history as well
  • Provide a latest-row view as a separate model to keep joins concise

Customer SCD2 snapshot (snapshots/dim_customer_scd2.sql)

{% snapshot dim_customer_scd2 %}

{{
  config(
    target_schema='snapshots',
    unique_key='concat(source_system, ''::'', customer_id)',
    strategy='check',
    check_cols=['email','full_name','status'],
    invalidate_hard_deletes=True
  )
}}

select
  source_system,
  customer_id,
  lower(email) as email,
  full_name,
  status
from {{ source('crm', 'customers') }}

{% endsnapshot %}

-- Convenience dimension with only the latest valid rows (models/dim_customer_current.sql)
{{
  config(materialized='view')
}}
select *
from {{ ref('dim_customer_scd2') }}
where dbt_valid_to is null

Role-Playing and Conformed Dimensions

Role-playing is using the same dimension in multiple roles depending on context. The classic example is using a date dimension as both an order date and a ship date.

Conformed dimensions, which maintain a consistent interpretation across multiple data marts, are easiest to operate when they reference a single source model and column names are aliased per role at the view layer.

  • Consolidate to one base dimension and contextualize names through role-specific views
  • Concentrate tests on the base model and keep views lightweight
  • Centralize business definitions in the documentation for the base model

Role-playing date dimensions (views)

{{
  config(materialized='view')
}}
-- models/dim_order_date.sql
select
  date_sk   as order_date_sk,
  date_day  as order_date,
  year,
  month,
  week,
  is_weekend
from {{ ref('dim_date') }};

{{
  config(materialized='view')
}}
-- models/dim_ship_date.sql
select
  date_sk   as ship_date_sk,
  date_day  as ship_date,
  year,
  month,
  week,
  is_weekend
from {{ ref('dim_date') }};

Designing the Date Dimension: date_spine and Time Hierarchies

The date dimension is the core of analytics. You generate a continuous axis (spine) with no missing days, and add derived columns such as year, quarter, month, week, and a business-day flag. A continuous axis is essential for period aggregations and zero-fills.

Using dbt_utils.date_spine lets you generate a date series while abstracting away warehouse differences. Define the start and end dates to match your operational policy (retroactive range, whether to include future dates).

  • Date keys are commonly an integer (YYYYMMDD) or a surrogate key
  • Do not forget to fix the time zone and document it
  • Managing the business-day calendar in a separate master table makes future extensions easier

Date dimension (dim_date.sql)

{{
  config(materialized='table')
}}
with spine as (
  {{ dbt_utils.date_spine(
      datepart='day',
      start_date="to_date('2018-01-01')",
      end_date='current_date'
  ) }}
),
cal as (
  select
    cast(replace(cast(date_day as string), '-', '') as bigint) as date_sk,
    date_day,
    extract(year from date_day) as year,
    extract(quarter from date_day) as quarter,
    extract(month from date_day) as month,
    extract(week from date_day) as week,
    case when extract(dayofweek from date_day) in (6,7) then true else false end as is_weekend
  from spine
)
select * from cal

Tests and Documentation: The Minimum Set That Supports Quality

Dimensions need to be trusted as "stable axes." Enforce key uniqueness, NULL prohibition, and referential integrity through schema tests, and document the model description, column definitions, grain, and update policy.

On exams, you are often asked where each test belongs (unique / not_null on the dimension side, relationships on the fact side). In practice, it is effective to run snapshots, models, and tests on a schedule via CI, and to surface breaking changes through docs and the catalog.

  • dim: unique / not_null on the surrogate key
  • fact: relationships on the foreign key (referencing the dim's key)
  • docs: clearly state grain, keys, update frequency, SCD policy, and usage examples

Relationship tests and documentation (schema.yml excerpt)

version: 2

models:
  - name: fact_orders
    description: Order fact (daily incremental)
    columns:
      - name: customer_sk
        tests:
          - relationships:
              to: ref('dim_customer')
              field: customer_sk
      - name: product_sk
        tests:
          - relationships:
              to: ref('dim_product')
              field: product_sk
      - name: order_date_sk
        tests:
          - relationships:
              to: ref('dim_date')
              field: date_sk

  - name: dim_product
    description: Product dimension. Grain = product_id. Price history is managed in a separate table.
    columns:
      - name: product_sk
        description: Surrogate key
        tests: [not_null, unique]
      - name: product_id
        description: Natural key (source-dependent)
        tests: [not_null]

Check Your Understanding

Analytics Engineer

問題 1

You want to analyze the history of customer status against an order fact using the "value at that point in time." Which is the most appropriate dbt design?

  1. Manage the customer dimension as a snapshot (SCD2) and join to the valid rows using the order date from fact_orders
  2. Always overwrite the customer dimension (Type 1) and join on the latest status
  3. Denormalize all customer attributes into fact_orders and store them there
  4. Rebuild the customer dimension via full refresh per order date

正解: A

Because you need the "value as of that point in time," SCD2 is appropriate. The snapshot manages the validity period, and you join to the customer row whose validity period contains the order date. Type 1 or always-denormalized approaches lose history, and a daily full refresh is inefficient in terms of cost and stability.

Frequently Asked Questions

Are dimensions and entity tables the same thing?

They are related concepts but serve different purposes. Entities are normalized representations of operational data, while dimensions are tables optimized as analytical "axes." For dimensions, you drop columns that are not needed for analysis, normalize code values, add derived columns, introduce surrogate keys, and apply SCD patterns.

Should SCD Type 2 always be implemented with snapshots?

Snapshots are the safe and standard option in dbt. You can use the warehouse's native change data capture (CDC) directly, but if portability and a consistent table structure (valid_from/valid_to, dbt_updated_at, etc.) matter to you, snapshots are easier to work with.

How do you handle late-arriving facts?

Join the fact to a role-playing date dimension using the event date and design for recomputation (re-runs) when needed. On the dimension side, manage validity periods with SCD2 so you can join to the "attributes as of that point in time," and configure a re-ingestion window in incremental models so historical recalculations are possible.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.