DuckDB is an embedded database that runs as a single file or fully in memory. The dbt-duckdb adapter lets you validate dbt models locally — no cloud DWH required — and iterate extremely quickly.
This guide is written for both daily practice and certification prep. It sticks to the stable, documentation-aligned usage patterns and proactively flags the pitfalls that tend to cause rework.
dbt-duckdb is the adapter for running dbt models, tests, and snapshots on DuckDB. DuckDB needs no server and stores data either in a local file (.duckdb) or in memory. That means engineers can iterate on modeling locally without depending on network connectivity or cloud billing.
From an exam perspective, universal dbt concepts — materializations (view / table / incremental / ephemeral), inter-model dependencies, source definitions, tests, and snapshots — apply identically on DuckDB. Knowing the DuckDB-specific caveats (no permission system, avoid concurrent writes to the same file) saves you from confusion in both work and exams.
To use DuckDB with dbt you write adapter-specific settings in profiles.yml. The minimum trio is type, path, and schema. Adding extensions and settings lets you control HTTP/Parquet reads, memory limits, and so on.
The basic local development flow is: source definitions → models (staging/marts) → tests → snapshots (when needed). Wire things up loosely with views first to shape the schema, then promote finalized output tables to table materializations — this rhythm suits iteration.
A typical local development flow (dbt + DuckDB)
Minimal profiles.yml and dbt_project.yml
# ~/.dbt/profiles.yml
duckdb:
outputs:
dev:
type: duckdb
path: ./local/dev.duckdb
schema: analytics
threads: 4
extensions:
- httpfs
- parquet
settings:
memory_limit: 2GB
target: dev
# dbt_project.yml
name: duckdb_local_lab
version: 1.0
profile: duckdb
models:
duckdb_local_lab:
+materialized: view
staging:
+schema: staging
marts:
+materialized: tableAll of dbt's standard materializations (view / table / incremental / ephemeral) work on DuckDB. If iteration speed matters, keep staging as view and finalize facts and dimensions as table. ephemeral is useful when you want to inline a small preprocessing step into the caller.
For incremental models, append or delete+insert are the stable strategies. Set unique_key and define the load condition inside an is_incremental() block. When combining DuckDB with file sources, base your delta load on arrival timestamps or a file list.
Incremental model example for DuckDB (delete+insert)
-- models/marts/fct_orders.sql
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='delete+insert'
) }}
with src as (
select * from {{ ref('stg_orders') }}
{% if is_incremental() %}
where updated_at > (select coalesce(max(updated_at), '1970-01-01') from {{ this }})
{% endif %}
)
select
order_id,
customer_id,
order_date,
updated_at,
total_amount
from srcDuckDB can read files directly via SQL functions. For local files use read_csv_auto() or read_parquet(); for remote files enable the httpfs extension to reference objects over HTTP/HTTPS. List the extensions you need under extensions in profiles.yml.
In practice, rather than ingesting external files immediately, you can reference them directly through a view and profile the data first. Once you understand the types and missingness patterns, add a model that CTASes (create table as select) into a finalized table. This minimizes wasted I/O early in iteration.
Example model referencing CSV/Parquet as a view
-- models/staging/stg_sales_files.sql
{{ config(materialized='view') }}
with p as (
-- Read all Parquet files under the directory in one go
select * from read_parquet('data/sales/*.parquet')
),
cs as (
-- Remote CSV (requires the httpfs extension to be enabled)
select * from read_csv_auto('https://example.com/sales/latest.csv')
)
select * from p
union all
select * from csdbt's generic tests and custom tests run unchanged on DuckDB. Snapshots are a SQL-level feature, so they work too. While developing, run dbt test frequently to catch missing immutable keys or uniqueness constraints early.
In CI, use a repo-committed profiles.yml and write the .duckdb file to a temp directory. Rebuilding cleanly each run causes fewer problems than long jobs that rely on caching. To prevent parallel jobs from writing to the same file, isolate paths per job.
Lightweight CI example with GitHub Actions (DuckDB)
name: dbt-duckdb-ci
on:
pull_request:
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: '3.11'
- run: pip install --upgrade pip && pip install dbt-duckdb
- name: dbt deps / seed / run / test
env:
DBT_PROFILES_DIR: ./.dbt
run: |
dbt deps
dbt seed --threads 4
dbt run --threads 4
dbt test --threads 4Once you've nailed the shape in DuckDB, the common next step is moving to a DWH/DB like Snowflake, Databricks SQL, or Postgres. Watch for differences in connection style, permissions, concurrency, incremental strategy, and types. Contain engine-specific functions in the model layer and abstract shared logic via macros and adapter.dispatch — migration becomes much easier.
Object naming and quoting, presence of schemas, and time-type handling are perennial migration pain points. Add a single CI run using the production adapter for mock execution to surface these differences early.
| Aspect | DuckDB (local/embedded) | Postgres (server-based) | Snowflake (cloud DWH) |
|---|---|---|---|
| Connection model | Single process / file or in-memory | TCP connection to a long-running server | Managed warehouse |
| Permissions | Limited (grants effectively no-op) | Roles and privileges | Roles, privileges, full RBAC |
| Startup / readiness | Instant (just open the file) | Service must be running | Warehouse start-up and allocation required |
| Concurrent writes | Single-process recommended | Parallel across multiple clients | Parallel across multiple sessions |
| Cost | No incremental cost on local runs | Infrastructure operating cost | Usage-based (compute and storage) |
| Common incremental strategies | append / delete+insert are stable | Widely supported including MERGE | MERGE strategy is common |
Analytics Engineer
問題 1
You want to reference an HTTP-hosted CSV from a view model using dbt-duckdb locally. Which configuration combination is correct?
正解: A
Referencing files over HTTP/HTTPS requires DuckDB's httpfs extension. Enable it via extensions in profiles.yml (not dbt project config) and reference the file with functions like read_csv_auto() inside the model. :memory: means an in-memory DB, which is orthogonal to whether external files can be read. seed is for loading local static files, not for ingesting directly over HTTP.
Are dbt grants effective with dbt-duckdb?
DuckDB does not have a server-style permission model, so dbt's grants configuration has effectively no impact. If you need access control in production, switch to an adapter like Snowflake or Postgres.
Can multiple dbt runs update the same .duckdb file at the same time?
Not recommended. Concurrent writes to a single file cause contention. Either serialize the jobs or use a separate file per run.
Can I cap DuckDB's memory usage?
Yes — set DuckDB connection settings such as memory_limit under settings in profiles.yml. The cap is enforced by DuckDB itself, so pick a value appropriate to the host environment.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
dbt Models: SQL-Defined Transformation Units (2026)
Model fundamentals — SELECT-based definitions, naming, refs,...
dbt Analytics Engineering Exam: Complete Guide (2026)
Pass the AE Certification — scope, weighting, sample questio...
dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)
Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...
dbt Project Structure: models/seeds/macros Layout (2026)
Recommended dbt project layout — models, seeds, macros, snap...
dbt_project.yml Explained: Every Config (2026)
Every dbt_project.yml setting that matters — paths, vars, ma...