Seeds is the dbt feature that loads CSVs bundled with your project into the target data warehouse as tables. It lets you ship small to medium reference data under version control, insulated from changes in external systems.
This article walks through where to place files, how to configure them, type definitions, operational best practices, and when to choose Seeds versus other dbt features — calling out the points most often tested on the exam.
Seeds takes CSVs placed in your project's seeds directory and creates or updates them as physical tables in the warehouse. Unlike models, you don't write SQL — the CSV contents become the row data directly. The table name is normally derived from the filename (without the extension), and other models reference it via ref('...').
Typical use cases are small, static, shared reference data: country code dictionaries, category normalization tables, threshold and grouping definitions. The dbt seed command deploys them reproducibly, so they behave reliably in CI as well.
Where Seeds fits in the pipeline
Quick commands
dbt seed
# Load only a specific file
# dbt seed --select path:seeds/countries.csv
# Reference from another model (SQL)
# select * from {{ ref('countries') }}

Place CSVs under your project's seeds/ directory. Subdirectories organize files and give you a scope for directory-level configuration, while the table name still comes from the filename (without the extension). Schema, types, and identifier quoting are configured in the seeds section of dbt_project.yml.
Configuration precedence follows the standard resource rules — deeper scopes (project → package → directory → individual file) override shallower ones. In practice, decide three things up front for stability: carve out a dedicated schema for seeds, lock identifier quoting to match your environment, and always declare types explicitly.
Example dbt_project.yml (Seeds settings)
name: my_project
profile: my_profile

seeds:
  +schema: reference
  +quote_columns: false
  my_project:
    geo:  # files under seeds/geo/
      +column_types:
        country_code: varchar(2)
        region: varchar(32)
        is_eu: boolean
    lookup:
      +schema: reference
      # Change the delimiter if needed
      # +delimiter: ";"

When a column has no declared type, dbt infers one from the CSV contents, and the inference can be wrong: a code such as 00123 may be read as an integer and lose its leading zeros. To make calculations and joins reliable, declare numeric, date, boolean, and code-like text columns explicitly via column_types. Type names follow the adapter (Snowflake, BigQuery, Databricks, etc.).
Column-name quoting (quote_columns) drives how case and reserved words are handled. On Snowflake, enabling quoting preserves the case of column names, while disabling it upper-cases them. Align with your organization's naming convention (all lowercase, underscores, etc.) and stay consistent between Seeds and models.
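Types and quoting can also be set for a single file instead of a whole directory. The sketch below assumes a properties file placed next to the CSV. The file name _geo_seeds.yml is hypothetical, and the countries seed and its columns are borrowed from the examples in this article. Because it targets one seed, it is more specific than directory-level settings in dbt_project.yml.

```yaml
# seeds/geo/_geo_seeds.yml (hypothetical file name)
version: 2

seeds:
  - name: countries              # matches seeds/geo/countries.csv
    config:
      quote_columns: false       # keep unquoted identifiers for this seed
      column_types:
        country_code: varchar(2) # adapter-specific type name
        is_eu: boolean
```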
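Re-running dbt seed reloads the target table from the CSV: depending on the adapter, dbt either truncates and reinserts the rows or recreates the table. There is no incremental application, so every run loads the whole file. If the CSV's columns have changed, run dbt seed --full-refresh to drop and recreate the table.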
Minimal column_types example
seeds:
  my_project:
    business:
      +column_types:
        id: bigint
        threshold: numeric(10,2)
        valid_from: date
        active: boolean

With Seeds, CSV diffs are the data diffs. Keep them small for easy review and avoid stray columns or blank rows. Oversized Seeds inflate load time and warehouse maintenance overhead. As a rule of thumb, once a file grows past tens of thousands of rows, and certainly past a hundred thousand, consider external loading or staging through a regular model.
In CI/CD, run dbt seed early and follow with dbt build for the dependent models. Narrow the scope with selectors to speed things up. If downstream users or roles need read access, apply the grants config to seeds as well as to models.
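Grants and tags are ordinary configs, so they can sit next to the type settings in dbt_project.yml. A minimal sketch, assuming a warehouse role named reporter (hypothetical) and a tag named static for the seeds under seeds/geo/:

```yaml
seeds:
  my_project:
    +grants:
      select: ["reporter"]   # hypothetical role that needs read access
    geo:
      +tags: ["static"]      # lets a tag selector pick up this directory
```

With the tag in place, a tag-based selector can limit a run to just these seeds.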
Commands you'll use often
# All Seeds
dbt seed
# Only a subset of Seeds (here, one directory)
dbt seed --select "path:seeds/geo"
# Build the countries Seed and the models that depend on it
dbt build --select countries+
# Select a group of Seeds by tag
dbt seed --select tag:static

Models reference Seeds via ref('seed_name'). Common patterns are JOINing them as dictionary tables or using them as accepted-value lists with IN. You can define data tests (not_null, unique, accepted_values, relationships) against Seeds just like any other model.
For exam prep, questions that ask you to distinguish Seeds (static reference data), Sources (variable source data), and Snapshots (history retention) come up often.
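A relationships test makes the dictionary role of a Seed explicit by checking that every value in a model column also exists in the Seed. A sketch, assuming a stg_orders model with a country_code column and a countries seed keyed on the same column (the file path is hypothetical):

```yaml
# models/staging/_staging.yml (hypothetical path)
version: 2

models:
  - name: stg_orders
    columns:
      - name: country_code
        tests:
          - relationships:
              to: ref('countries')   # the seed acts as the parent table
              field: country_code
```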
Reference and test example
-- models/fct_orders.sql
-- No trailing semicolon: dbt wraps the model's SELECT in its own statement.
select
    o.order_id,
    o.country_code,
    c.region
from {{ ref('stg_orders') }} as o
left join {{ ref('countries') }} as c
    on o.country_code = c.country_code
# models/schema.yml (excerpt)
version: 2

seeds:
  - name: countries
    columns:
      - name: country_code
        tests: [not_null, unique]
      - name: region
        tests:
          - accepted_values:
              values: ["EMEA", "AMER", "APAC"]

Seeds is the right choice when you need to deliver small, static, project-bundled data quickly and reliably. For variable data from external systems, cases requiring history retention, or large data volumes, other features are the standard answer.
| Use case / dimension | Seeds | Sources | Snapshots |
|---|---|---|---|
| Primary purpose | Create static reference tables from CSV | Ingest and reference raw data from external systems | Retain row-level history of a model (change tracking) |
| Data volume | Small to medium (a few to tens of thousands of rows) | Medium to large (assumes continuous ingestion) | Depends on the model (grows with history) |
| Update method | Replace/recreate via dbt seed | Load externally, then reference via source definitions | Apply diffs by running dbt snapshot |
| Type management | Declare explicitly via column_types (recommended) | Depends on the upstream schema | Depends on the snapshot strategy's key/column definitions |
| Typical exam focus | Choose Seeds for static dictionaries/accepted values | External table connections and source freshness | When to apply history retention and how to design keys |
Analytics Engineer
Question 1
You want to manage a country-code-to-region mapping in CSV and reference it identically on every deploy. Which dbt feature fits best?
Correct answer: A (Seeds)
Seeds creates DWH tables from CSVs bundled with the project — the perfect fit for static dictionary or reference data. Sources reference data from external systems, Snapshots retain row history, and Macros are for reusing logic. None of those match this use case.
Can I use formats other than CSV (JSON, Parquet, etc.) for Seeds?
No. Seeds are CSV-based (you can adjust the delimiter). For other formats, the standard approach is to ingest them through external loads or staging, then treat them as regular models.
Are large Seeds (hundreds of thousands of rows or more) a problem?
Yes. Load time, transactions, and compile time all grow, dragging down your CI/CD. The exact threshold depends on your environment, but once you exceed tens of thousands to a hundred-plus thousand rows, consider ELT ingestion or external tables plus modeling instead.
What happens to dbt seed if a table with the same name already exists?
Depending on the adapter implementation, dbt replaces or recreates the table and overwrites it with the CSV contents. To avoid collisions, set up a dedicated schema and, if needed, separate namespaces via alias or schema configuration.
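One way to keep a seed from colliding with an existing table is to give it its own schema and alias. A sketch, assuming the countries seed used earlier in this article; the file name and the alias ref_countries are hypothetical:

```yaml
# seeds/_seeds.yml (hypothetical file name)
version: 2

seeds:
  - name: countries
    config:
      schema: reference      # by default dbt appends this to the target schema
      alias: ref_countries   # table name in the warehouse; ref('countries') is unchanged
```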
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.