dbt

Roles and Responsibility Split of staging / intermediate / marts in dbt

2026-04-19
NicheeLab Editorial Team

In dbt, splitting models into layers by role keeps the blast radius of changes small and makes both testing and operations easier. This article focuses on the three layers — staging, intermediate, and marts — and explains how to divide responsibilities and the design principles behind the split.

From a dbt Analytics Engineer exam perspective, you need to be able to instantly answer "what is allowed and what is not allowed" in each layer, and to explain the naming conventions and the criteria for placing tests.

Overview of the 3-Layer Architecture and Responsibility Split

The main purpose of the 3-layer split is to make processing responsibilities explicit: normalization and unification of data (staging), cross-cutting alignment and pre-processing (intermediate), and facts and dimensions optimized for analytical consumption (marts). This localizes defects and improves changeability. The official documentation also notes that consistency in model granularity, responsibilities, and naming improves maintainability.

The cross-layer rules are simple. Lower layers do not depend on upper layers. Upper layers carry expressions closer to business use, but avoid duplicating aggregate or derived metric definitions. Tests are placed according to each layer's character (staging handles structure and uniqueness, intermediate handles relational integrity, marts handles business validation).

  • staging absorbs differences in external sources by resolving types, naming, missing values, and duplicates
  • intermediate is responsible for cross-cutting alignment, joins, and logic standardization (avoid excessive aggregation)
  • marts are optimized for use cases as facts and dimensions
  • Each layer's naming is made explicit with prefixes (e.g. stg_, int_, dim_/fct_)
  • Tests vary in type and strictness based on the layer's purpose
LayerPrimary PurposeInput/OutputAggregation Grain
stagingType alignment, naming unification, deduplicationInput: source() / Output: stg_*Row grain as close as possible to the raw data
intermediateCross-cutting joins, alignment, semi-normalizationInput: stg_* / Output: int_*Depends on use case, but generally detail grain
martsConsumption-optimized (facts/dimensions, star schema)Input: int_* / Output: dim_*, fct_*Grain suited for analysis and visualization (often detail)

Data flow of the 3-layer model (conceptual diagram)

checkjoinstarRAWExternal sourcessources()freshness / checkstg_* (shaping)schema / type alignmentint_* (alignment/integration)uniqueness/missing testsdim_*, fct_*star schema (marts)BI / ML / dashboardsexposuresData flow of the 3-layer model (conceptual diagram)

Staging Responsibilities: Focus on Absorbing Types, Naming, and Duplicates

Staging absorbs differences between external systems so that downstream layers can work from the "same assumptions." Allowed processing includes type conversion, naming normalization, trimming and simple normalization, deduplication, and determining record validity. Business aggregations and metrics do not belong here.

In principle, tables are provided 1:1 per source, and column names are aligned with the final names used in analysis. Source integrity is enforced via source freshness and accepted-value tests.

  • Naming: stg_<source>_<entity> (e.g. stg_app_orders)
  • What not to do: aggregations (SUM, COUNT), complex window metrics, embedded business logic
  • What to do: type alignment, column name normalization, deduplication, trim/lowercase, simple invalid-record filtering
  • Tests: apply not_null/unique/accepted_values on columns. Set freshness in sources.yml
  • Unify foreign keys to numeric or string types to make relationship tests easier in the intermediate layer

Example of stg_orders.sql and schema.yml

-- models/staging/app/stg_orders.sql
with src as (
  select * from {{ source('app', 'orders') }}
),
renamed as (
  select
    cast(id as bigint)            as order_id,
    cast(customer_id as bigint)   as customer_id,
    cast(total_amount as numeric) as amount,
    date_trunc('second', created_at) as created_at,
    coalesce(status, 'unknown')   as status
  from src
),
dedup as (
  select *
  from (
    select *, row_number() over (partition by order_id order by created_at desc) as rn
    from renamed
  ) t where rn = 1
)
select * from dedup;

# models/staging/app/stg_orders.yml
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: customer_id
        tests:
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['paid','cancelled','pending','unknown']

# models/staging/app/stg_customers.yml (place relationship tests in the intermediate layer)
# It is safer to use the relationships test in the int layer for customer_id referential integrity.

Intermediate Responsibilities: Aggregating Cross-Cutting Joins and Alignment Logic

Intermediate joins entities already shaped in staging across sources, and consolidates key alignment and pre-processing that does not lean too heavily on business specifics. Typical work includes mapping customer IDs from different sources, merging duplicate customers, and joining events with dimensions.

Avoid excessive aggregation and metric computation here, keeping reusability in the final marts layer in mind. Typical placements include unfolding effective periods starting from snapshots, stabilizing surrogate keys, and standardizing join conditions.

  • Responsibilities: joins, forward/reverse mapping, period expansion, business key normalization
  • What not to do: department-specific KPI calculations, visualization-driven pivot fixation
  • Tests: relationships (referential), composite uniqueness (e.g. order_id, line_number)
  • Incremental updates: use incremental for large targets, but condition design that preserves key integrity is essential
  • File layout: models/intermediate/<domain>/int_<domain>_<entity>.sql

Marts Responsibilities: Consumption Optimization and Star Schema

Marts are the final form, optimized for analysis, visualization, and delivery (exposures). Model them with a star schema (facts and dimensions): facts hold the detail of events or transactions, and dimensions hold master data and slowly changing dimensions (SCD).

Avoid duplicate definitions of derived metrics; where possible, centralize calculations against the central fact tables as views or models. Choose materialization based on workload and refresh cadence, and make the recompute boundary of incremental tables explicit.

  • Naming: fct_<domain>_<event>, dim_<domain>_<subject>
  • Facts hold foreign keys and measures; dimensions hold attributes and surrogate keys
  • Tests: foreign key integrity, duplicate prevention, business rule validation (e.g. amount >= 0)
  • Exposures: track BI dashboards and ML pipelines with exposures
  • To improve reusability, separate department-specific transformations into derived marts

Naming Conventions, Test Strategy, and Dependency Management

Naming makes layer, domain, and entity explicit through prefixes and ordering, keeping reference conventions consistent. Tests vary in strictness by layer and are placed so that the impact surface of a failure is clear. Dependencies are expressed only with ref()/source(), and references from upper layers to lower layers are forbidden.

Model contracts and column descriptions are effective for catching schema drift early. Support varies by environment and dbt version, so check the documentation for your dbt Core/Cloud version when introducing them.

  • Naming examples: stg_app_orders, int_sales_order_items, dim_sales_customer, fct_sales_orders
  • Test placement: structure and uniqueness in staging, relational integrity in intermediate, business validation in marts
  • Express dependencies only with ref()/source(); avoid direct schema references
  • Lock column specs with contracts and docs to surface breaking changes
  • Derive snapshots into dim_* to make the responsibility of historical integrity explicit

Exam Tips and Production Pitfalls

"What you do in which layer" comes up frequently on the exam. For example, doing a SUM aggregation in staging is wrong; remember that intermediate handles key alignment and period expansion, while marts holds the final definitions of facts/dimensions and metrics.

In production, the biggest cost driver is metric definitions duplicated across multiple places. Drive the location of definitions to a single model on the marts side, and establish a flow that detects drift with tests. Incremental models are convenient, but quality suffers if the recompute range and natural key dedup rules are vague.

  • Antipatterns: computing department KPIs in staging; first introducing type conversion of raw data in marts
  • Good patterns: referential integrity via relationships tests; aggregations centralized in fct_*
  • On changes, read impact from lower to upper layers and verify test chains in CI
  • Set freshness on sources to make delay-cause isolation easier
  • Document metric expressions and lock down naming and schema

Check Your Understanding

Analytics Engineer

問題 1

You want to aggregate order data to build “monthly revenue.” Following dbt's 3-layer model, which placement is appropriate?

  1. Aggregate to monthly in staging and connect directly to BI
  2. Join staging in intermediate and aggregate to monthly there
  3. Create a fact table (fct_orders) at detail grain in marts, and define the monthly aggregation in a separate marts model
  4. Any layer is fine; staging is preferable because results appear fastest

正解: C

Staging is only for absorbing types, naming, and duplicates; intermediate centers on joins and alignment and should avoid excessive aggregation. Consumption optimization and metric centralization are the responsibility of marts, so creating fct_orders at detail grain and defining the monthly aggregation on top of it (as a derived mart) is correct.

Frequently Asked Questions

Do Databricks Medallion layers (Bronze/Silver/Gold) map to dbt's staging/intermediate/marts?

Conceptually they line up. Bronze is RAW/ingest, Silver is shaping and integration (equivalent to dbt's staging+intermediate), and Gold is consumption-optimized (marts). That said, the implementation depends on your platform, so map the layers only after clearly defining each layer's responsibilities.

Should relationships tests be written in staging or intermediate?

Referential integrity should be validated in intermediate. Staging is the 1:1 source-shaping stage, so assigning it referential responsibility blurs the layer's role. In staging, focus on not_null, unique, and accepted_values tests.

When should you adopt model contracts?

Adopt contracts once the schema is reasonably stable and you have CI/CD and tests in place to surface downstream impact. Support varies by environment and dbt version, so check the official documentation for your version's feature scope and compatibility before introducing contracts.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.