dbt's layered architecture is a foundational principle for building a model graph that is resilient to change and easy to test, by separating source shaping, business logic consolidation, and analytical schemas.
This article balances the Analytics Engineer exam perspective with concrete implementation patterns that pay off in day-to-day operations, so you can draw the boundaries between staging / intermediate / marts without hesitation.
dbt models have their execution order determined by a dependency graph based on ref. Clearly separating layers aligns the responsibilities of each model and makes test placement and the scope of change impact easier to reason about. In particular, adopting staging / intermediate / marts lets you decouple type casting and naming standardization, business logic consolidation, and analytics consumption optimization.
The Analytics Engineer exam asks about each layer's purpose, naming, grain, materialization choices, and the placement of tests and documentation. Keeping the following principles in mind helps you balance design consistency with execution efficiency.
The flow of dbt layered architecture
Project structure example (excerpt)
# Directory
models/
staging/
sources.yml
stg_customers.sql
stg_orders.sql
intermediate/
int_orders_enriched.sql
int_customers_current.sql
marts/
core/
dim_customers.sql
fct_orders.sql
# dbt_project.yml (unify per-layer materialization policy)
models:
my_project:
staging:
+materialized: view
intermediate:
+materialized: table
marts:
+materialized: tableStaging is the layer closest to raw data. Keep it to column-name normalization, type casting, and light filters or column expansion. Do not bring in business logic or aggregation. Its main purpose is to provide a "stable column shape and naming that downstream consumers can work with easily."
Views are the safe default for materialization. Consider tables only when the computation is heavy or the upstream source is slow. Defining basic tests like source uniqueness, not-null, and referential integrity here helps you catch anomalies early.
Example of stg_orders.sql and sources.yml
-- models/staging/stg_orders.sql
with src as (
select * from {{ source('raw', 'orders') }}
)
select
cast(order_id as varchar) as order_id,
cast(customer_id as varchar) as customer_id,
cast(order_ts as timestamp) as order_timestamp,
upper(status) as status,
cast(total_amount as numeric(38,2)) as total_amount,
updated_at
from src
where order_id is not null
;
# models/staging/sources.yml
version: 2
sources:
- name: raw
schema: raw
tables:
- name: orders
description: Raw orders table
loaded_at_field: updated_at
freshness:
warn_after: {count: 90, period: minute}
error_after: {count: 180, period: minute}
columns:
- name: order_id
tests: [not_null, unique]
- name: customer_id
tests: [not_null]
- name: updated_at
tests: [not_null]Intermediate is the layer that joins multiple staging models and consolidates business logic such as deduplication, grain unification, and derived columns. Sitting just before marts, it functions as a buffer that absorbs the impact of changes.
For frequently updated tables, incremental materialization is effective. With an is_incremental() guard, you only ingest new and updated rows, shortening execution time. For deduplication, selecting the latest version via window functions is the stable approach.
int_orders_enriched.sql (incremental and deduplication)
-- models/intermediate/int_orders_enriched.sql
{{
config(
materialized='incremental',
unique_key='order_id'
)
}}
with orders as (
select * from {{ ref('stg_orders') }}
),
-- Reference the latest customer attributes from another model (e.g., int_customers_current)
customers as (
select * from {{ ref('int_customers_current') }}
),
-- Take the latest version of duplicate orders
latest_orders as (
select o.*
from (
select *,
row_number() over(partition by order_id order by updated_at desc) as rn
from orders
) o
where o.rn = 1
),
joined as (
select
l.order_id,
l.customer_id,
c.customer_tier,
l.order_timestamp,
l.status,
l.total_amount,
case when l.status in ('CANCELLED','VOID') then 1 else 0 end as is_cancelled
from latest_orders l
left join customers c using (customer_id)
)
select *
from joined
{% if is_incremental() %}
-- Ingest only recent updates (assumes updated_at exists)
where joined.order_timestamp > (
select coalesce(max(order_timestamp), '1900-01-01') from {{ this }}
)
{% endif %}
;Marts is the layer that downstream dashboards and analytical queries reference directly. Split schemas by business domain (e.g., core, marketing, finance) and design dimensions (dim_) and facts (fct_) following the star schema principle. Prioritize schema stability and readability above all.
Facts have a clear grain (e.g., one row = one order), and dimensions provide the current form of attributes. To avoid double counting in aggregation, keep row-level facts in the fact tables as much as possible and clearly separate the roles between dbt and dashboard-side aggregation.
dim_customers / fct_orders (excerpt)
-- models/marts/core/dim_customers.sql
select
customer_id,
customer_name,
customer_tier,
signup_date
from {{ ref('int_customers_current') }}
;
-- models/marts/core/fct_orders.sql
select
order_id,
customer_id,
order_timestamp,
status,
total_amount,
is_cancelled
from {{ ref('int_orders_enriched') }}
where status not in ('TEST')
;dbt tests come in two flavors: generic (not_null, unique, relationships, etc.) and singular (arbitrary SQL). Guarantee key integrity at staging, and validate business rules between intermediate and marts. Data contracts (model contracts) lock columns, types, and required flags to prevent downstream breakage.
Build documentation with descriptions and docs blocks, and make it browsable via docs generate. Defining exposures lets you visualize the dependencies between dashboards or reports and your models.
schema.yml (example with tests, contracts, exposures)
version: 2
models:
- name: fct_orders
description: Orders fact. One row = one order
config:
contract: true
columns:
- name: order_id
description: Order ID (primary key)
data_type: varchar
tests: [not_null, unique]
- name: customer_id
data_type: varchar
tests:
- not_null
- relationships:
to: ref('dim_customers')
field: customer_id
- name: total_amount
data_type: numeric
tests:
- not_null
- accepted_values:
values:
- "{{ 0 }}:inf" # Non-negative can be enforced via a custom test
exposures:
- name: sales_dashboard
type: dashboard
maturity: high
url: https://bi.example.com/dashboards/sales
owners:
- name: Analytics Team
email: [email protected]
depends_on:
- ref('fct_orders')
- ref('dim_customers')dbt executes in parallel based on a DAG. Using selectors (--select, --state) re-runs only the scope affected by changes, shortening the dev-to-prod cycle. Decide a default materialization per layer to avoid unnecessary recomputation.
For incremental builds, clarify how primary keys and update timestamps are handled, and share full-refresh operational guidelines with the team. Warehouse-specific optimizations like clustering and partitioning are safer when managed explicitly via dbt model configuration.
| Layer | Main purpose | Grain | Naming convention |
|---|---|---|---|
| staging | Source shaping, type casting, naming standardization | Same as source (primary key = natural key) | stg_* |
| intermediate | Joins, deduplication, business logic consolidation | Row grain at the business level | int_* |
| marts | Stable API for downstream (analytics schema) | dim: current entity / fct: one fact per row | dim_* / fct_* |
Execution commands and selector examples
# Only changed models and their downstream
$ dbt build --select state:modified+ --state target/previous_run
# Per-layer execution (folder selector)
$ dbt build --select models/staging
$ dbt build --select models/intermediate
$ dbt build --select models/marts
# Re-run only a specific domain
$ dbt build --select marts.core+
# Full refresh (planned)
$ dbt build --full-refresh --select int_orders_enriched fct_ordersAnalytics Engineer
問題 1
Which of the following is the most appropriate division of responsibilities among dbt's staging / intermediate / marts layers?
正解: A
staging handles minimal shaping (types and naming), intermediate consolidates business logic (joins, deduplication, grain), and marts provides a stable schema that downstream queries reference directly. This matches dbt's recommended division of responsibilities.
Is it okay to perform aggregations or joins in the staging layer?
Generally, avoid it. Keep staging limited to type casting and naming standardization, and consolidate joins, deduplication, and derived metrics in the intermediate layer. This achieves separation of concerns and localizes the impact of changes.
Should I materialize as views or tables?
The standard defaults are staging=view, intermediate=table (or incremental for frequently updated tables), and marts=table — this combination works well in practice. Even if you make exceptions based on cost, execution time, or source size, locking in per-layer defaults in dbt_project.yml keeps operations stable.
Where should I handle dimension history (SCD)?
Manage change history itself with dbt snapshots (a separate feature), shape it into the current form in the intermediate layer, and finish with a consumer-friendly column layout in marts dim_ tables. If you expose history tables directly in marts, use clear naming and descriptions to prevent misuse by downstream consumers.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...