dbt Layered Architecture: Staging/Marts (2026)

dbt's layered architecture is a foundational principle for building a model graph that is resilient to change and easy to test, by separating source shaping, business logic consolidation, and analytical schemas.

This article balances the Analytics Engineer exam perspective with concrete implementation patterns that pay off in day-to-day operations, so you can draw the boundaries between staging / intermediate / marts without hesitation.

The Big Picture of Layered Architecture

dbt models have their execution order determined by a dependency graph based on ref. Clearly separating layers aligns the responsibilities of each model and makes test placement and the scope of change impact easier to reason about. In particular, adopting staging / intermediate / marts lets you decouple type casting and naming standardization, business logic consolidation, and analytics consumption optimization.

The Analytics Engineer exam asks about each layer's purpose, naming, grain, materialization choices, and the placement of tests and documentation. Keeping the following principles in mind helps you balance design consistency with execution efficiency.

One responsibility per model: do not mix source shaping, joining/aggregation, and analytical serving
Explicit grain: be able to articulate row uniqueness and the primary key at each layer
Stable API: marts focuses on being a contract surface for downstream consumers
Localized change: intermediate aggregations absorb upstream changes so they do not propagate downstream
Test focus: uniqueness/not-null upstream, business validation in intermediate and marts

The flow of dbt layered architecture

Project structure example (excerpt)

# Directory
models/
  staging/
    sources.yml
    stg_customers.sql
    stg_orders.sql
  intermediate/
    int_orders_enriched.sql
    int_customers_current.sql
  marts/
    core/
      dim_customers.sql
      fct_orders.sql

# dbt_project.yml (unify per-layer materialization policy)
models:
  my_project:
    staging:
      +materialized: view
    intermediate:
      +materialized: table
    marts:
      +materialized: table

Staging Layer: Normalizing Sources

Staging is the layer closest to raw data. Keep it to column-name normalization, type casting, and light filters or column expansion. Do not bring in business logic or aggregation. Its main purpose is to provide a "stable column shape and naming that downstream consumers can work with easily."

Views are the safe default for materialization. Consider tables only when the computation is heavy or the upstream source is slow. Defining basic tests like source uniqueness, not-null, and referential integrity here helps you catch anomalies early.

Naming convention: snake_case, prefix with stg_
Reference external sources via source() and standardize column names
Explicit type casting and unified time zone policy
Place uniqueness and not-null tests here
Default to views; only switch to tables when heavy

Example of stg_orders.sql and sources.yml

-- models/staging/stg_orders.sql
with src as (
  select * from {{ source('raw', 'orders') }}
)
select
  cast(order_id as varchar)       as order_id,
  cast(customer_id as varchar)    as customer_id,
  cast(order_ts as timestamp)     as order_timestamp,
  upper(status)                   as status,
  cast(total_amount as numeric(38,2)) as total_amount,
  updated_at
from src
where order_id is not null
;

# models/staging/sources.yml
version: 2
sources:
  - name: raw
    schema: raw
    tables:
      - name: orders
        description: Raw orders table
        loaded_at_field: updated_at
        freshness:
          warn_after: {count: 90, period: minute}
          error_after: {count: 180, period: minute}
        columns:
          - name: order_id
            tests: [not_null, unique]
          - name: customer_id
            tests: [not_null]
          - name: updated_at
            tests: [not_null]

Intermediate Layer: Consolidating Business Logic

Intermediate is the layer that joins multiple staging models and consolidates business logic such as deduplication, grain unification, and derived columns. Sitting just before marts, it functions as a buffer that absorbs the impact of changes.

For frequently updated tables, incremental materialization is effective. With an is_incremental() guard, you only ingest new and updated rows, shortening execution time. For deduplication, selecting the latest version via window functions is the stable approach.

Join multiple sources and lock in the grain
Deduplicate and pick the latest record
Add derived metrics and flags (e.g., shipping-inclusive amount, cancellation flag)
Use incremental to keep compute costs down
Provide an interface that easily absorbs schema changes on the marts side

int_orders_enriched.sql (incremental and deduplication)

-- models/intermediate/int_orders_enriched.sql
{{
  config(
    materialized='incremental',
    unique_key='order_id'
  )
}}

with orders as (
  select * from {{ ref('stg_orders') }}
),
-- Reference the latest customer attributes from another model (e.g., int_customers_current)
customers as (
  select * from {{ ref('int_customers_current') }}
),
-- Take the latest version of duplicate orders
latest_orders as (
  select o.*
  from (
    select *,
           row_number() over(partition by order_id order by updated_at desc) as rn
    from orders
  ) o
  where o.rn = 1
),
joined as (
  select
    l.order_id,
    l.customer_id,
    c.customer_tier,
    l.order_timestamp,
    l.status,
    l.total_amount,
    case when l.status in ('CANCELLED','VOID') then 1 else 0 end as is_cancelled
  from latest_orders l
  left join customers c using (customer_id)
)
select *
from joined
{% if is_incremental() %}
  -- Ingest only recent updates (assumes updated_at exists)
  where joined.order_timestamp > (
    select coalesce(max(order_timestamp), '1900-01-01') from {{ this }}
  )
{% endif %}
;

Marts Layer: Providing the Analytics Schema

Marts is the layer that downstream dashboards and analytical queries reference directly. Split schemas by business domain (e.g., core, marketing, finance) and design dimensions (dim_) and facts (fct_) following the star schema principle. Prioritize schema stability and readability above all.

Facts have a clear grain (e.g., one row = one order), and dimensions provide the current form of attributes. To avoid double counting in aggregation, keep row-level facts in the fact tables as much as possible and clearly separate the roles between dbt and dashboard-side aggregation.

Naming convention: dim_*, fct_*
Marts plays an API-like role. Lock columns and types as a contract
Facts provide measurements, dimensions provide descriptive attributes
Split into sub-folders by domain (core, marketing, etc.)
Visualize downstream assets and track dependencies via exposures

dim_customers / fct_orders (excerpt)

-- models/marts/core/dim_customers.sql
select
  customer_id,
  customer_name,
  customer_tier,
  signup_date
from {{ ref('int_customers_current') }}
;

-- models/marts/core/fct_orders.sql
select
  order_id,
  customer_id,
  order_timestamp,
  status,
  total_amount,
  is_cancelled
from {{ ref('int_orders_enriched') }}
where status not in ('TEST')
;

Placement of Tests, Contracts, and Documentation

dbt tests come in two flavors: generic (not_null, unique, relationships, etc.) and singular (arbitrary SQL). Guarantee key integrity at staging, and validate business rules between intermediate and marts. Data contracts (model contracts) lock columns, types, and required flags to prevent downstream breakage.

Build documentation with descriptions and docs blocks, and make it browsable via docs generate. Defining exposures lets you visualize the dependencies between dashboards or reports and your models.

staging: enforce not_null, unique, and relationships
intermediate: verify dedup and filter consistency with singular tests
marts: lock columns and types with contracts (contract: true)
Make actual usage and ownership clear with docs and exposures
On failures, work backward from upstream to isolate the cause

schema.yml (example with tests, contracts, exposures)

version: 2
models:
  - name: fct_orders
    description: Orders fact. One row = one order
    config:
      contract: true
    columns:
      - name: order_id
        description: Order ID (primary key)
        data_type: varchar
        tests: [not_null, unique]
      - name: customer_id
        data_type: varchar
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_id
      - name: total_amount
        data_type: numeric
        tests:
          - not_null
          - accepted_values:
              values:
                - "{{ 0 }}:inf"  # Non-negative can be enforced via a custom test

exposures:
  - name: sales_dashboard
    type: dashboard
    maturity: high
    url: https://bi.example.com/dashboards/sales
    owners:
      - name: Analytics Team
        email: [email protected]
    depends_on:
      - ref('fct_orders')
      - ref('dim_customers')

Execution Strategy and Performance Optimization

dbt executes in parallel based on a DAG. Using selectors (--select, --state) re-runs only the scope affected by changes, shortening the dev-to-prod cycle. Decide a default materialization per layer to avoid unnecessary recomputation.

For incremental builds, clarify how primary keys and update timestamps are handled, and share full-refresh operational guidelines with the team. Warehouse-specific optimizations like clustering and partitioning are safer when managed explicitly via dbt model configuration.

Minimal execution with --select + standardized tags
Differential builds via state:modified
Schedule full refreshes deliberately (night-time or low-load windows)
Lock per-layer default materialization in dbt_project.yml
Split folders by domain to increase parallelism

Layer	Main purpose	Grain	Naming convention
staging	Source shaping, type casting, naming standardization	Same as source (primary key = natural key)	stg_*
intermediate	Joins, deduplication, business logic consolidation	Row grain at the business level	int_*
marts	Stable API for downstream (analytics schema)	dim: current entity / fct: one fact per row	dim_* / fct_*

Execution commands and selector examples

# Only changed models and their downstream
$ dbt build --select state:modified+ --state target/previous_run

# Per-layer execution (folder selector)
$ dbt build --select models/staging
$ dbt build --select models/intermediate
$ dbt build --select models/marts

# Re-run only a specific domain
$ dbt build --select marts.core+

# Full refresh (planned)
$ dbt build --full-refresh --select int_orders_enriched fct_orders

Check Your Understanding

Analytics Engineer

問題 1

Which of the following is the most appropriate division of responsibilities among dbt's staging / intermediate / marts layers?

A. staging handles type casting and naming standardization, intermediate handles joins and deduplication, and marts provides a stable analytics schema
B. staging aggregates facts, intermediate ingests sources, and marts normalizes column names
C. staging serves dashboards, intermediate forbids table creation, and marts uses views only
D. staging and intermediate are identical, and only marts needs to be separated

正解: A

staging handles minimal shaping (types and naming), intermediate consolidates business logic (joins, deduplication, grain), and marts provides a stable schema that downstream queries reference directly. This matches dbt's recommended division of responsibilities.

Frequently Asked Questions

Is it okay to perform aggregations or joins in the staging layer?

Generally, avoid it. Keep staging limited to type casting and naming standardization, and consolidate joins, deduplication, and derived metrics in the intermediate layer. This achieves separation of concerns and localizes the impact of changes.

Should I materialize as views or tables?

The standard defaults are staging=view, intermediate=table (or incremental for frequently updated tables), and marts=table — this combination works well in practice. Even if you make exceptions based on cost, execution time, or source size, locking in per-layer defaults in dbt_project.yml keeps operations stable.

Where should I handle dimension history (SCD)?

Manage change history itself with dbt snapshots (a separate feature), shape it into the current form in the intermediate layer, and finish with a consumer-friendly column layout in marts dim_ tables. If you expose history tables directly in marts, use clear naming and descriptions to prevent misuse by downstream consumers.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

dbt Layered Architecture: How to Correctly Split staging / intermediate / marts