In dbt, folder layout and naming conventions translate directly into developer experience and operational cost. The layout of models, seeds, and macros in particular has a direct impact on DAG visibility, review-ability, and CI selectivity.
This article presents a layout you can run with minimal surprise, grounded in stable features and the official documentation. Exam-prep takeaways are woven in at the end.
dbt follows a "file placement = resource definition" philosophy. Models turn SQL into database objects, seeds turn CSVs into tables, and macros let you reuse logic via Jinja. When placement is ambiguous, DAG boundaries blur and reviews and impact analysis become difficult.
Exams also frequently ask about the staging → intermediate → marts split, the proper use of seeds, and the role and scope of macros. Lock in roles and granularity first, then apply consistent schemas and tags through dbt_project.yml.
Following the principle of least surprise, split models in two tiers (layer then domain), organize seeds by purpose, and separate macros into general utilities and adapter dispatch. Declare schema and materialization per layer in dbt_project.yml, and tie them to CI selectors and tags.
The tree below is a general structure that scales from small to mid-sized projects. It uses three layers (stg/int/marts), groups seeds by function (e.g. mappings), and separates macros into util and tests.
| Aspect | models | seeds | macros |
|---|---|---|---|
| Primary purpose | Create views/tables and other models from SQL | Materialize small reference data from CSV into tables | Reuse and abstract Jinja logic |
| Definition location | models/ | seeds/ | macros/ |
| DB artifact | view/table/incremental/ephemeral | Usually a table | None (expanded at compile time) |
| Typical command | dbt run / build | dbt seed | dbt run-operation (optional) |
| Test applicability | Both generic and singular tests | Generic tests (mainly PK / referential integrity) | Not applicable |
| Pitfalls | Layer mixing inflates the DAG and creates cycles | Large CSVs, untyped columns, insufficient privileges | Overusing macros that introduce side effects |
A standard dbt project structure and DAG layering
project/
dbt_project.yml
packages.yml
models/
staging/
src__app/
stg_app__users.sql
stg_app__orders.sql
intermediate/
int__orders_enriched.sql
marts/
sales/
dim_customer.sql
fct_order.sql
seeds/
mappings/
country_codes.csv
macros/
util/
surrogate_key.sql
cast_to_boolean.sql
tests/
relationships_if.sql
snapshots/
app/
orders_snapshot.sql
tests/
generic/
not_null_if.sql
DAG (conceptual):
[raw.sources] → [staging: stg_*] → [intermediate: int_*] → [marts: dim_*/fct_*]
Example dbt_project.yml plus model/macro snippets
# dbt_project.yml (excerpt)
name: my_project
version: 1.0.0
profile: my_profile
model-paths: ["models"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
test-paths: ["tests"]
models:
+materialized: view
my_project:
staging:
+schema: stg
+tags: ["staging"]
intermediate:
+schema: int
+tags: ["transform"]
marts:
+schema: mart
+materialized: table
sales:
+schema: mart_sales
seeds:
my_project:
+schema: seed
+quote_columns: false
mappings:
country_codes:
+column_types:
country_code: varchar(2)
country_name: varchar(64)
# macros/util/surrogate_key.sql
{% macro surrogate_key(cols) %}
md5({{ cols | map('string') | join(" || '-' || ") }})
{% endmacro %}
# models/marts/sales/dim_customer.sql
{{ config(materialized='table') }}
with src as (
select * from {{ ref('int__orders_enriched') }}
), dedup as (
select distinct customer_id, customer_name from src
)
select
{{ surrogate_key(["customer_id"]) }} as customer_sk,
customer_id,
customer_name
from dedup;Project configuration is fundamentally hierarchical. Set defaults at the top and override them in lower directories to stabilize schema, materialization, and tags per layer. A conservative default is materialized=view, with marts as the only layer set to table.
For seeds, explicitly set schema, quote_columns, and column_types. In particular, give primary keys and code columns explicit lengths so you do not depend on each target warehouse's default types.
Unify filenames with prefixes that indicate the layer. stg_ normalizes types and column names on raw data, int_ joins and aggregates business logic, and dim_ and fct_ are the final deliverables. Avoid cross-layer references; ref should ideally point only one layer downstream.
Splitting schemas per layer keeps table listings and privilege design clear. In DAG visualizations, ensure the edges between layers consistently point in one direction to prevent cycles.
Seeds are intended for small reference tables. Manage code systems, labels, thresholds, and similar data — things that change infrequently but are joined into models — as CSVs. dbt seed loads them as database tables, and you can apply tests just like with models.
The pitfalls are size, types, and change management. Large CSVs are a bad fit. Set column_types explicitly, review diffs in pull requests, and pay attention to privileges and locale (delimiters, line endings) in operations.
Macros generate SQL at compile time. Functionize at a granularity that does not hide business logic too aggressively or hurt SQL readability. Surrogate-key generation, type conversion, and conditional tests are good examples.
Adapter differences can be absorbed via dispatch, but from a stability standpoint it is easier to debug if you avoid over-generalizing and keep adapter-specific branches lightweight. Pin versions when introducing external macro packages.
The Analytics Engineer exam targets the distinction between models / seeds / macros, the hierarchical settings in dbt_project.yml, ref/source, and the applicability of tests. In particular, the purpose of seeds, model-layer separation, and macro responsibilities are must-know.
In real-world work, use CI selectors (tags/paths) for partial builds, isolate privileges and catalogs with schema separation, and check for layer violations and macro overuse during reviews.
Analytics Engineer
問題 1
Your organization wants to standardize country codes (ISO2 → display name). Changes occur a few times per quarter and need to be tracked in development reviews. Which is the most appropriate implementation in dbt?
正解: A
For small data, low update frequency, and review-tracked changes, seeds are the best fit. CSV management lets diffs be reviewed and dbt seed materializes the table. Hard-coding into a macro hurts observability and changeability; manual INSERTs are not reproducible; source is excessive for this case since it assumes external entities.
How large can seeds realistically be?
The hard limit depends on your warehouse and runtime, but from an operational standpoint it is safer to keep seeds to a few tens of thousands of rows and a few MB. For anything larger, externalize the data as a table and manage it through a source definition.
Is it OK to join stg_ directly to dim_ across layers?
Avoid it. Keeping a one-way flow of stg → int → dim/fct simplifies dependencies, separates testing and ownership, and helps optimize build times.
How much business logic should live inside macros?
It is safest to limit macros to general utilities such as key generation, type conversion, and conditional filters — as long as SQL readability is preserved. Express complex aggregation logic as models so it remains testable.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...