In dbt, splitting models into layers by role keeps the blast radius of changes small and makes both testing and operations easier. This article focuses on the three layers — staging, intermediate, and marts — and explains how to divide responsibilities and the design principles behind the split.
From a dbt Analytics Engineer exam perspective, you need to be able to instantly answer "what is allowed and what is not allowed" in each layer, and to explain the naming conventions and the criteria for placing tests.
The main purpose of the 3-layer split is to make processing responsibilities explicit: normalization and unification of data (staging), cross-cutting alignment and pre-processing (intermediate), and facts and dimensions optimized for analytical consumption (marts). This localizes defects and improves changeability. The official documentation also notes that consistency in model granularity, responsibilities, and naming improves maintainability.
The cross-layer rules are simple. Lower layers do not depend on upper layers. Upper layers carry expressions closer to business use, but avoid duplicating aggregate or derived metric definitions. Tests are placed according to each layer's character (staging handles structure and uniqueness, intermediate handles relational integrity, marts handles business validation).
| Layer | Primary Purpose | Input/Output | Aggregation Grain |
|---|---|---|---|
| staging | Type alignment, naming unification, deduplication | Input: source() / Output: stg_* | Row grain as close as possible to the raw data |
| intermediate | Cross-cutting joins, alignment, semi-normalization | Input: stg_* / Output: int_* | Depends on use case, but generally detail grain |
| marts | Consumption-optimized (facts/dimensions, star schema) | Input: int_* / Output: dim_*, fct_* | Grain suited for analysis and visualization (often detail) |
Data flow of the 3-layer model (conceptual diagram)
Staging absorbs differences between external systems so that downstream layers can work from the "same assumptions." Allowed processing includes type conversion, naming normalization, trimming and simple normalization, deduplication, and determining record validity. Business aggregations and metrics do not belong here.
In principle, tables are provided 1:1 per source, and column names are aligned with the final names used in analysis. Source integrity is enforced via source freshness and accepted-value tests.
Example of stg_orders.sql and schema.yml
-- models/staging/app/stg_orders.sql
with src as (
select * from {{ source('app', 'orders') }}
),
renamed as (
select
cast(id as bigint) as order_id,
cast(customer_id as bigint) as customer_id,
cast(total_amount as numeric) as amount,
date_trunc('second', created_at) as created_at,
coalesce(status, 'unknown') as status
from src
),
dedup as (
select *
from (
select *, row_number() over (partition by order_id order by created_at desc) as rn
from renamed
) t where rn = 1
)
select * from dedup;
# models/staging/app/stg_orders.yml
version: 2
models:
- name: stg_orders
columns:
- name: order_id
tests:
- not_null
- unique
- name: customer_id
tests:
- not_null
- name: status
tests:
- accepted_values:
values: ['paid','cancelled','pending','unknown']
# models/staging/app/stg_customers.yml (place relationship tests in the intermediate layer)
# It is safer to use the relationships test in the int layer for customer_id referential integrity.Intermediate joins entities already shaped in staging across sources, and consolidates key alignment and pre-processing that does not lean too heavily on business specifics. Typical work includes mapping customer IDs from different sources, merging duplicate customers, and joining events with dimensions.
Avoid excessive aggregation and metric computation here, keeping reusability in the final marts layer in mind. Typical placements include unfolding effective periods starting from snapshots, stabilizing surrogate keys, and standardizing join conditions.
Marts are the final form, optimized for analysis, visualization, and delivery (exposures). Model them with a star schema (facts and dimensions): facts hold the detail of events or transactions, and dimensions hold master data and slowly changing dimensions (SCD).
Avoid duplicate definitions of derived metrics; where possible, centralize calculations against the central fact tables as views or models. Choose materialization based on workload and refresh cadence, and make the recompute boundary of incremental tables explicit.
Naming makes layer, domain, and entity explicit through prefixes and ordering, keeping reference conventions consistent. Tests vary in strictness by layer and are placed so that the impact surface of a failure is clear. Dependencies are expressed only with ref()/source(), and references from upper layers to lower layers are forbidden.
Model contracts and column descriptions are effective for catching schema drift early. Support varies by environment and dbt version, so check the documentation for your dbt Core/Cloud version when introducing them.
"What you do in which layer" comes up frequently on the exam. For example, doing a SUM aggregation in staging is wrong; remember that intermediate handles key alignment and period expansion, while marts holds the final definitions of facts/dimensions and metrics.
In production, the biggest cost driver is metric definitions duplicated across multiple places. Drive the location of definitions to a single model on the marts side, and establish a flow that detects drift with tests. Incremental models are convenient, but quality suffers if the recompute range and natural key dedup rules are vague.
Analytics Engineer
問題 1
You want to aggregate order data to build “monthly revenue.” Following dbt's 3-layer model, which placement is appropriate?
正解: C
Staging is only for absorbing types, naming, and duplicates; intermediate centers on joins and alignment and should avoid excessive aggregation. Consumption optimization and metric centralization are the responsibility of marts, so creating fct_orders at detail grain and defining the monthly aggregation on top of it (as a derived mart) is correct.
Do Databricks Medallion layers (Bronze/Silver/Gold) map to dbt's staging/intermediate/marts?
Conceptually they line up. Bronze is RAW/ingest, Silver is shaping and integration (equivalent to dbt's staging+intermediate), and Gold is consumption-optimized (marts). That said, the implementation depends on your platform, so map the layers only after clearly defining each layer's responsibilities.
Should relationships tests be written in staging or intermediate?
Referential integrity should be validated in intermediate. Staging is the 1:1 source-shaping stage, so assigning it referential responsibility blurs the layer's role. In staging, focus on not_null, unique, and accepted_values tests.
When should you adopt model contracts?
Adopt contracts once the schema is reasonably stable and you have CI/CD and tests in place to surface downstream impact. Support varies by environment and dbt version, so check the official documentation for your version's feature scope and compatibility before introducing contracts.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...