dbt

dbt Project Structure Guide: Practical Layout for models / seeds / macros

2026-04-19
NicheeLab Editorial Team

In dbt, folder layout and naming conventions translate directly into developer experience and operational cost. The layout of models, seeds, and macros in particular has a direct impact on DAG visibility, review-ability, and CI selectivity.

This article presents a layout you can run with minimal surprise, grounded in stable features and the official documentation. Exam-prep takeaways are woven in at the end.

Why Structure Matters

dbt follows a "file placement = resource definition" philosophy. Models turn SQL into database objects, seeds turn CSVs into tables, and macros let you reuse logic via Jinja. When placement is ambiguous, DAG boundaries blur and reviews and impact analysis become difficult.

Exams also frequently ask about the staging → intermediate → marts split, the proper use of seeds, and the role and scope of macros. Lock in roles and granularity first, then apply consistent schemas and tags through dbt_project.yml.

  • Splitting the DAG into layers clarifies tests and ownership
  • Operate directories together with a paired naming convention
  • Apply schema, materialization, and tags in bulk via dbt_project.yml hierarchy settings

Recommended Layout: Practical Structure for models / seeds / macros

Following the principle of least surprise, split models in two tiers (layer then domain), organize seeds by purpose, and separate macros into general utilities and adapter dispatch. Declare schema and materialization per layer in dbt_project.yml, and tie them to CI selectors and tags.

The tree below is a general structure that scales from small to mid-sized projects. It uses three layers (stg/int/marts), groups seeds by function (e.g. mappings), and separates macros into util and tests.

  • Organize models by layer (staging, intermediate, marts) → then by domain
  • Split seeds into subfolders by change frequency and reference target (e.g. mappings)
  • Break macros down by responsibility — util, tests, materializations
Aspectmodelsseedsmacros
Primary purposeCreate views/tables and other models from SQLMaterialize small reference data from CSV into tablesReuse and abstract Jinja logic
Definition locationmodels/seeds/macros/
DB artifactview/table/incremental/ephemeralUsually a tableNone (expanded at compile time)
Typical commanddbt run / builddbt seeddbt run-operation (optional)
Test applicabilityBoth generic and singular testsGeneric tests (mainly PK / referential integrity)Not applicable
PitfallsLayer mixing inflates the DAG and creates cyclesLarge CSVs, untyped columns, insufficient privilegesOverusing macros that introduce side effects

A standard dbt project structure and DAG layering

project/
  dbt_project.yml
  packages.yml
  models/
    staging/
      src__app/
        stg_app__users.sql
        stg_app__orders.sql
    intermediate/
      int__orders_enriched.sql
    marts/
      sales/
        dim_customer.sql
        fct_order.sql
  seeds/
    mappings/
      country_codes.csv
  macros/
    util/
      surrogate_key.sql
      cast_to_boolean.sql
    tests/
      relationships_if.sql
  snapshots/
    app/
      orders_snapshot.sql
  tests/
    generic/
      not_null_if.sql

DAG (conceptual):
[raw.sources] → [staging: stg_*] → [intermediate: int_*] → [marts: dim_*/fct_*]

Example dbt_project.yml plus model/macro snippets

# dbt_project.yml (excerpt)
name: my_project
version: 1.0.0
profile: my_profile
model-paths: ["models"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
test-paths: ["tests"]

models:
  +materialized: view
  my_project:
    staging:
      +schema: stg
      +tags: ["staging"]
    intermediate:
      +schema: int
      +tags: ["transform"]
    marts:
      +schema: mart
      +materialized: table
      sales:
        +schema: mart_sales

seeds:
  my_project:
    +schema: seed
    +quote_columns: false
    mappings:
      country_codes:
        +column_types:
          country_code: varchar(2)
          country_name: varchar(64)

# macros/util/surrogate_key.sql
{% macro surrogate_key(cols) %}
md5({{ cols | map('string') | join(" || '-' || ") }})
{% endmacro %}

# models/marts/sales/dim_customer.sql
{{ config(materialized='table') }}
with src as (
  select * from {{ ref('int__orders_enriched') }}
), dedup as (
  select distinct customer_id, customer_name from src
)
select
  {{ surrogate_key(["customer_id"]) }} as customer_sk,
  customer_id,
  customer_name
from dedup;

Key Points for dbt_project.yml

Project configuration is fundamentally hierarchical. Set defaults at the top and override them in lower directories to stabilize schema, materialization, and tags per layer. A conservative default is materialized=view, with marts as the only layer set to table.

For seeds, explicitly set schema, quote_columns, and column_types. In particular, give primary keys and code columns explicit lengths so you do not depend on each target warehouse's default types.

  • Use hierarchical application of +schema, +materialized, +tags
  • Align selectors.yml with tags so CI can run partial builds
  • Set column_types on seeds explicitly to avoid drift from type mismatches

Naming Conventions and DAG Splits: stg / int / marts

Unify filenames with prefixes that indicate the layer. stg_ normalizes types and column names on raw data, int_ joins and aggregates business logic, and dim_ and fct_ are the final deliverables. Avoid cross-layer references; ref should ideally point only one layer downstream.

Splitting schemas per layer keeps table listings and privilege design clear. In DAG visualizations, ensure the edges between layers consistently point in one direction to prevent cycles.

  • stg_: type unification, naming conventions, PK generation
  • int_: multi-source integration, aggregation, window calculations
  • dim_/fct_: analytics tables. Reverse references to upstream layers are forbidden

When to Use Seeds and What to Watch For

Seeds are intended for small reference tables. Manage code systems, labels, thresholds, and similar data — things that change infrequently but are joined into models — as CSVs. dbt seed loads them as database tables, and you can apply tests just like with models.

The pitfalls are size, types, and change management. Large CSVs are a bad fit. Set column_types explicitly, review diffs in pull requests, and pay attention to privileges and locale (delimiters, line endings) in operations.

  • Limit to small, stable reference data
  • Manage column_types and primary-key tests as a set
  • Minimize per-environment differences (encoding, delimiters)

Design Principles and Reuse for Macros

Macros generate SQL at compile time. Functionize at a granularity that does not hide business logic too aggressively or hurt SQL readability. Surrogate-key generation, type conversion, and conditional tests are good examples.

Adapter differences can be absorbed via dispatch, but from a stability standpoint it is easier to debug if you avoid over-generalizing and keep adapter-specific branches lightweight. Pin versions when introducing external macro packages.

  • Start at "functionize once a pattern repeats three or more times"
  • Separate concerns: test macros vs. materializations
  • Pin external package versions and verify changes carefully

Exam-Prep and Real-World Checklist

The Analytics Engineer exam targets the distinction between models / seeds / macros, the hierarchical settings in dbt_project.yml, ref/source, and the applicability of tests. In particular, the purpose of seeds, model-layer separation, and macro responsibilities are must-know.

In real-world work, use CI selectors (tags/paths) for partial builds, isolate privileges and catalogs with schema separation, and check for layer violations and macro overuse during reviews.

  • By default, ref points one layer downstream to prevent cycles
  • Manage seed CSVs as a set: types, primary keys, and tests
  • Keep macros side-effect free and observable

Check Your Understanding

Analytics Engineer

問題 1

Your organization wants to standardize country codes (ISO2 → display name). Changes occur a few times per quarter and need to be tracked in development reviews. Which is the most appropriate implementation in dbt?

  1. Place a CSV under seeds/mappings, declare column_types, load it with dbt seed, and JOIN it from staging
  2. Hard-code the country-code dictionary in a macro and call the macro from each model to expand it
  3. Define a country-code conversion table as SQL under models/staging and maintain it with manual INSERTs
  4. Define it as a source for external data and only monitor freshness

正解: A

For small data, low update frequency, and review-tracked changes, seeds are the best fit. CSV management lets diffs be reviewed and dbt seed materializes the table. Hard-coding into a macro hurts observability and changeability; manual INSERTs are not reproducible; source is excessive for this case since it assumes external entities.

FAQ

How large can seeds realistically be?

The hard limit depends on your warehouse and runtime, but from an operational standpoint it is safer to keep seeds to a few tens of thousands of rows and a few MB. For anything larger, externalize the data as a table and manage it through a source definition.

Is it OK to join stg_ directly to dim_ across layers?

Avoid it. Keeping a one-way flow of stg → int → dim/fct simplifies dependencies, separates testing and ownership, and helps optimize build times.

How much business logic should live inside macros?

It is safest to limit macros to general utilities such as key generation, type conversion, and conditional filters — as long as SQL readability is preserved. Express complex aggregation logic as models so it remains testable.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.