dbt

How to Use dbt Seeds: Building Static Tables from CSV for Production and Exams

2026-04-19
NicheeLab Editorial Team

Seeds is the dbt feature that loads CSVs bundled with your project into the target data warehouse as tables. It lets you ship small to medium reference data under version control, insulated from changes in external systems.

This article walks through where to place files, how to configure them, type definitions, operational best practices, and when to choose Seeds versus other dbt features — calling out the points most often tested on the exam.

Seeds Basics: Building Stable Reference Tables from CSV

Seeds takes CSVs placed in your project's seeds directory and creates or updates them as physical tables in the warehouse. Unlike models, you don't write SQL — the CSV contents become the row data directly. The table name is normally derived from the filename (without the extension), and other models reference it via ref('...').

Typical use cases are small, static, shared reference data: country code dictionaries, category normalization tables, threshold and grouping definitions. The dbt seed command deploys them reproducibly, so they behave reliably in CI as well.

  • Location: under your project's seeds/ directory (older projects may still use data/)
  • Output: a physical table (not a view)
  • Reference: use ref('seed_name') from other models for JOINs and filters
  • CSVs require a header row — column names become the table's column names
  • Unless you specify types, columns are usually created as text. In practice, declaring types explicitly via column_types is strongly recommended

Where Seeds fits in the pipeline

dbt seedref('countries')ref('countries')CSV under seeds/(e.g., countries.csv)Physical table in DWH(e.g., reference.countries)Intermediate / aggregate modelsAnalytics / marts

Quick commands

dbt seed
# 特定のファイルだけ
# dbt seed --select path:seeds/countries.csv
# 他モデルから参照(SQL)
# select * from {{ ref('countries') }};

Folder Structure and Basic dbt_project.yml Settings

Place CSVs under your project's seeds/ directory. Subdirectories double as namespace organization, and the table name comes from the filename (without the extension). Schema, types, and identifier quoting are configured in the seeds section of dbt_project.yml.

Configuration precedence follows the standard resource rules — deeper scopes (project → package → directory → individual file) override shallower ones. In practice, decide three things up front for stability: carve out a dedicated schema for seeds, lock identifier quoting to match your environment, and always declare types explicitly.

  • Schema is set with +schema (e.g., reference). Switch by environment by driving schema rules off target.name
  • Column-name quoting (+quote_columns) changes how case and reserved words are handled per warehouse — pin it to your organization's standard
  • Declare column types explicitly with +column_types (effectively required when any non-text columns are involved)
  • CSVs require a header row. The default delimiter is a comma — adjust via delimiter if needed

Example dbt_project.yml (Seeds settings)

name: my_project
profile: my_profile

seeds:
  +schema: reference
  +quote_columns: false
  my_project:
    geo:  # seeds/geo/ 配下
      +column_types:
        country_code: varchar(2)
        region: varchar(32)
        is_eu: boolean
    lookup:
      +schema: reference
      # 必要に応じて区切り文字を変更
      # +delimiter: ";"

Type Definitions and Load Behavior: column_types and Identifier Handling

On most adapters, columns without a declared type land as text (STRING/VARCHAR). To make calculations and joins reliable, declare numeric, date, and boolean columns explicitly via column_types. Type names follow the adapter (Snowflake, BigQuery, Databricks, etc.).

Column-name quoting (quote_columns) drives how case and reserved words are handled. On Snowflake, enabling quoting preserves the case of column names, while disabling it upper-cases them. Align with your organization's naming convention (all lowercase, underscores, etc.) and stay consistent between Seeds and models.

Re-running dbt seed recreates or replaces the target table (per the adapter's implementation) and overwrites it with the CSV contents. There is no incremental application.

  • Recommended: declare every non-text column explicitly via column_types
  • For datetimes and time zones, verify your warehouse's types and comparison semantics in advance
  • Strip whitespace, NULLs, and duplicates from join keys on the Seeds side (a quality-assurance point often tested on exams)

Minimal column_types example

seeds:
  my_project:
    business:
      +column_types:
        id: bigint
        threshold: numeric(10,2)
        valid_from: date
        active: boolean

Operations: Update Flow, Environment Separation, and CI in Practice

With Seeds, CSV diffs are the data diffs. Keep them small for easy review and avoid stray columns or blank rows. Oversized Seeds inflate load time and warehouse maintenance overhead — as a rule of thumb, once you exceed tens of thousands to a hundred-plus thousand rows, consider external loading or staging through a regular model.

In CI/CD, run dbt seed early and follow with build for the dependent models. Narrow the scope with selectors to speed things up. If grants are required, apply rules to seeds as well.

  • Limit Seeds to small, stable reference data
  • Strictly separate schemas by environment (only reviewed CSVs reach prod)
  • Selective execution: seed only what's changed
  • Rebuild dependents with dbt build --select +ref('seed_name')

Commands you'll use often

# すべての Seeds
 dbt seed
# 変更のあった Seeds だけ(例: ディレクトリ指定)
 dbt seed --select path:seeds/geo/**
# Seeds 実行後に依存モデルをビルド
 dbt build --select +ref:countries
# タグで束ねて実行
 dbt seed --select config:tags:static

Testing and Model References: ref and Data Quality

Models reference Seeds via ref('seed_name'). Common patterns are JOINing them as dictionary tables or using them as accepted-value lists with IN. You can define data tests — not_null, unique, accepted_values, relationships — against Seeds just like any other model.

For exam prep, questions that ask you to distinguish Seeds (static reference data), Sources (variable source data), and Snapshots (history retention) come up often.

  • Lock down JOIN keys with both unique and not_null tests
  • Manage accepted-value lists via accepted_values (with Seeds as the single source of truth)
  • Always use ref inside models — never hard-code fully qualified names

Reference and test example

-- models/fct_orders.sql
select
  o.order_id,
  o.country_code,
  c.region
from {{ ref('stg_orders') }} as o
left join {{ ref('countries') }} as c
  on o.country_code = c.country_code;

# models/schema.yml(抜粋)
version: 2
seeds:
  - name: countries
    columns:
      - name: country_code
        tests: [not_null, unique]
      - name: region
        tests:
          - accepted_values:
              values: ["EMEA", "AMER", "APAC"]

Choosing Between Seeds, Sources, Snapshots, and External Loading

Seeds is the right choice when you need to deliver small, static, project-bundled data quickly and reliably. For variable data from external systems, cases requiring history retention, or large data volumes, other features are the standard answer.

  • Static, small, reference-use: Seeds
  • Continuously ingested from external DBs or object storage: Sources + models
  • Row-level versioning or Slowly Changing Dimensions: Snapshots
  • Hundreds of thousands of rows or more: ingest via external loading or ELT pipelines, then model
Use case / dimensionSeedsSourcesSnapshots
Primary purposeCreate static reference tables from CSVIngest and reference raw data from external systemsRetain row-level history of a model (change tracking)
Data volumeSmall to medium (a few to tens of thousands of rows)Medium to large (assumes continuous ingestion)Depends on the model (grows with history)
Update methodReplace/recreate via dbt seedLoad externally, then reference via source definitionsApply diffs by running dbt snapshot
Type managementDeclare explicitly via column_types (recommended)Depends on the upstream schemaDepends on the snapshot strategy's key/column definitions
Typical exam focusChoose Seeds for static dictionaries/accepted valuesExternal table connections and source freshnessWhen to apply history retention and how to design keys

Check Your Understanding

Analytics Engineer

問題 1

You want to manage a country-code-to-region mapping in CSV and reference it identically on every deploy. Which dbt feature fits best?

  1. A. Seeds
  2. B. Sources
  3. C. Snapshots
  4. D. Macros

正解: A

Seeds creates DWH tables from CSVs bundled with the project — the perfect fit for static dictionary or reference data. Sources reference data from external systems, Snapshots retain row history, and Macros are for reusing logic. None of those match this use case.

Frequently Asked Questions

Can I use formats other than CSV (JSON, Parquet, etc.) for Seeds?

No. Seeds are CSV-based (you can adjust the delimiter). For other formats, the standard approach is to ingest them through external loads or staging, then treat them as regular models.

Are large Seeds (hundreds of thousands of rows or more) a problem?

Yes. Load time, transactions, and compile time all grow, dragging down your CI/CD. The exact threshold depends on your environment, but once you exceed tens of thousands to a hundred-plus thousand rows, consider ELT ingestion or external tables plus modeling instead.

What happens to dbt seed if a table with the same name already exists?

Depending on the adapter implementation, dbt replaces or recreates the table and overwrites it with the CSV contents. To avoid collisions, set up a dedicated schema and, if needed, separate namespaces via alias or schema configuration.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.