dbt

dbt DuckDB Adapter Hands-On Guide: Local/Embedded Development and Exam Prep

2026-04-19
NicheeLab Editorial Team

DuckDB is an embedded database that runs as a single file or fully in memory. The dbt-duckdb adapter lets you validate dbt models locally — no cloud DWH required — and iterate extremely quickly.

This guide is written for both daily practice and certification prep. It sticks to the stable, documentation-aligned usage patterns and proactively flags the pitfalls that tend to cause rework.

Where the DuckDB Adapter Fits and Its Sweet-Spot Use Cases

dbt-duckdb is the adapter for running dbt models, tests, and snapshots on DuckDB. DuckDB needs no server and stores data either in a local file (.duckdb) or in memory. That means engineers can iterate on modeling locally without depending on network connectivity or cloud billing.

From an exam perspective, universal dbt concepts — materializations (view / table / incremental / ephemeral), inter-model dependencies, source definitions, tests, and snapshots — apply identically on DuckDB. Knowing the DuckDB-specific caveats (no permission system, avoid concurrent writes to the same file) saves you from confusion in both work and exams.

  • Best fit: local prototyping, learning and workshops, lightweight CI, and upfront data profiling
  • Default schema: main; switch it freely via the schema field in profiles.yml
  • Permissions: DuckDB has no server-style access control, so dbt's grants config has limited effect
  • Concurrency: avoid simultaneous writes to the same .duckdb file — serialize jobs to stay safe

Profile Configuration and the Local Development Flow

To use DuckDB with dbt you write adapter-specific settings in profiles.yml. The minimum trio is type, path, and schema. Adding extensions and settings lets you control HTTP/Parquet reads, memory limits, and so on.

The basic local development flow is: source definitions → models (staging/marts) → tests → snapshots (when needed). Wire things up loosely with views first to shape the schema, then promote finalized output tables to table materializations — this rhythm suits iteration.

  • Relative paths for path are convenient. In CI, place the file under a temp directory for a clean run
  • To read files over HTTP or from S3, add httpfs to extensions
  • Pick a sensible threads value for your local CPU. DuckDB also parallelizes internally, so an excessive thread count can be counterproductive

A typical local development flow (dbt + DuckDB)

CSV / ParquetHTTP / S3 / LocalDuckDB (file)dev.duckdbdbt run / testdbt modelsstaging / martsviews / tables(main schema)read_csv_auto() /read_parquet()compile + executeartifacts(target dir)dbt test / snapshot

Minimal profiles.yml and dbt_project.yml

# ~/.dbt/profiles.yml
duckdb:
  outputs:
    dev:
      type: duckdb
      path: ./local/dev.duckdb
      schema: analytics
      threads: 4
      extensions:
        - httpfs
        - parquet
      settings:
        memory_limit: 2GB
  target: dev

# dbt_project.yml
name: duckdb_local_lab
version: 1.0
profile: duckdb
models:
  duckdb_local_lab:
    +materialized: view
    staging:
      +schema: staging
    marts:
      +materialized: table

Modeling and Materialization Strategy (DuckDB Caveats)

All of dbt's standard materializations (view / table / incremental / ephemeral) work on DuckDB. If iteration speed matters, keep staging as view and finalize facts and dimensions as table. ephemeral is useful when you want to inline a small preprocessing step into the caller.

For incremental models, append or delete+insert are the stable strategies. Set unique_key and define the load condition inside an is_incremental() block. When combining DuckDB with file sources, base your delta load on arrival timestamps or a file list.

  • Exam angle: the default materialization comes from project config — not from the adapter — and can be overridden per model via config
  • Grants are effectively a no-op on DuckDB. Setting them has no impact
  • Avoid parallel writes to the same file. If you need multiple targets, use separate files

Incremental model example for DuckDB (delete+insert)

-- models/marts/fct_orders.sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='delete+insert'
) }}

with src as (
  select * from {{ ref('stg_orders') }}
  {% if is_incremental() %}
    where updated_at > (select coalesce(max(updated_at), '1970-01-01') from {{ this }})
  {% endif %}
)

select
  order_id,
  customer_id,
  order_date,
  updated_at,
  total_amount
from src

File Ingestion and Using Extensions (httpfs / parquet)

DuckDB can read files directly via SQL functions. For local files use read_csv_auto() or read_parquet(); for remote files enable the httpfs extension to reference objects over HTTP/HTTPS. List the extensions you need under extensions in profiles.yml.

In practice, rather than ingesting external files immediately, you can reference them directly through a view and profile the data first. Once you understand the types and missingness patterns, add a model that CTASes (create table as select) into a finalized table. This minimizes wasted I/O early in iteration.

  • When using httpfs, verify the URL and authentication scheme match your environment beforehand
  • Pass a wildcard or directory to read_parquet() to read multiple files in one shot
  • Exam angle: dbt sources help manage metadata for external systems and improve observability over inlining read_parquet() directly

Example model referencing CSV/Parquet as a view

-- models/staging/stg_sales_files.sql
{{ config(materialized='view') }}

with p as (
  -- Read all Parquet files under the directory in one go
  select * from read_parquet('data/sales/*.parquet')
),
cs as (
  -- Remote CSV (requires the httpfs extension to be enabled)
  select * from read_csv_auto('https://example.com/sales/latest.csv')
)

select * from p
union all
select * from cs

Tests, Snapshots, and Integrating with CI

dbt's generic tests and custom tests run unchanged on DuckDB. Snapshots are a SQL-level feature, so they work too. While developing, run dbt test frequently to catch missing immutable keys or uniqueness constraints early.

In CI, use a repo-committed profiles.yml and write the .duckdb file to a temp directory. Rebuilding cleanly each run causes fewer problems than long jobs that rely on caching. To prevent parallel jobs from writing to the same file, isolate paths per job.

  • Minimum command sequence: dbt deps → dbt seed → dbt run → dbt test
  • When using state comparison, pass the previous artifacts via --state and rerun only the affected scope
  • Always declare the snapshot's unique key and update-detection column explicitly

Lightweight CI example with GitHub Actions (DuckDB)

name: dbt-duckdb-ci
on:
  pull_request:
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install --upgrade pip && pip install dbt-duckdb
      - name: dbt deps / seed / run / test
        env:
          DBT_PROFILES_DIR: ./.dbt
        run: |
          dbt deps
          dbt seed --threads 4
          dbt run  --threads 4
          dbt test --threads 4

From Prototype to Production DWH (Differences and Pitfalls)

Once you've nailed the shape in DuckDB, the common next step is moving to a DWH/DB like Snowflake, Databricks SQL, or Postgres. Watch for differences in connection style, permissions, concurrency, incremental strategy, and types. Contain engine-specific functions in the model layer and abstract shared logic via macros and adapter.dispatch — migration becomes much easier.

Object naming and quoting, presence of schemas, and time-type handling are perennial migration pain points. Add a single CI run using the production adapter for mock execution to surface these differences early.

  • Checklist: naming conventions, schema strategy, type mapping, incremental strategy, permissions and ownership
  • Macro abstraction: call DuckDB-specific functions like read_parquet through a dedicated macro to make them easy to swap out
  • Exam angle: model readability and reusability (packaging, macros) are emphasized
AspectDuckDB (local/embedded)Postgres (server-based)Snowflake (cloud DWH)
Connection modelSingle process / file or in-memoryTCP connection to a long-running serverManaged warehouse
PermissionsLimited (grants effectively no-op)Roles and privilegesRoles, privileges, full RBAC
Startup / readinessInstant (just open the file)Service must be runningWarehouse start-up and allocation required
Concurrent writesSingle-process recommendedParallel across multiple clientsParallel across multiple sessions
CostNo incremental cost on local runsInfrastructure operating costUsage-based (compute and storage)
Common incremental strategiesappend / delete+insert are stableWidely supported including MERGEMERGE strategy is common

Check Your Understanding

Analytics Engineer

問題 1

You want to reference an HTTP-hosted CSV from a view model using dbt-duckdb locally. Which configuration combination is correct?

  1. Add httpfs to extensions in profiles.yml and use read_csv_auto('https://...') in the model
  2. Set httpfs: true under models in dbt_project.yml and just write a normal select in the model
  3. Set path to :memory: in profiles.yml and create an external table in the model
  4. Set threads to 1 in profiles.yml and ingest via seed

正解: A

Referencing files over HTTP/HTTPS requires DuckDB's httpfs extension. Enable it via extensions in profiles.yml (not dbt project config) and reference the file with functions like read_csv_auto() inside the model. :memory: means an in-memory DB, which is orthogonal to whether external files can be read. seed is for loading local static files, not for ingesting directly over HTTP.

Frequently Asked Questions

Are dbt grants effective with dbt-duckdb?

DuckDB does not have a server-style permission model, so dbt's grants configuration has effectively no impact. If you need access control in production, switch to an adapter like Snowflake or Postgres.

Can multiple dbt runs update the same .duckdb file at the same time?

Not recommended. Concurrent writes to a single file cause contention. Either serialize the jobs or use a separate file per run.

Can I cap DuckDB's memory usage?

Yes — set DuckDB connection settings such as memory_limit under settings in profiles.yml. The cap is enforced by DuckDB itself, so pick a value appropriate to the host environment.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.