Speeding Up dbt CI with state and defer for Incremental Runs

2026-04-19
NicheeLab Editorial Team

Running a full build of every model on each CI run is slow and expensive. dbt's state and defer features compare the project against artifacts from a previous run (typically the last production run), build only what changed, and resolve unselected dependencies to existing production tables, which makes CI both faster and more stable.

This article is based on the official documentation and walks through the selector behaviors that show up on the dbt Analytics Engineering certification exam, plus common CI design pitfalls. We call out the areas that depend on your environment or dbt version.

How state and defer Enable Incremental Runs

dbt's state looks at artifacts produced by a previous run (typically manifest.json and friends) and compares them against the current repo state to detect changes. defer takes any dependencies you did not include in the build target and resolves them to existing relations (usually in production).

Together, this lets CI run only the changed models and their downstream impact while safely delegating upstream references to production — dramatically cutting both wait time and resource consumption.

  • state: detect changes by pointing to the previous run's artifact path
  • defer: resolve unselected references to relations recorded in the production artifacts
  • Combination: using --state and --defer together is the standard CI pattern

Figure: state + defer flow in CI. A push to the Git repo triggers the CI job, which runs dbt build --state=prod_artifacts/ --defer against the production artifacts (target/ manifest.json, etc.), resolves refs to prod relations, and runs only the changed nodes (state:modified+).

Minimal command example

dbt build \
  --state ./artifacts/prod \
  --defer \
  --select state:modified+

state Basics and Choosing the Right Selector

state compares against the previous run's artifacts to determine which nodes are new or modified. Typical comparison targets include model SQL, config values that affect the model body, and test definitions. Macro changes can ripple widely, so be aware that related nodes may be marked as modified.
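The comparison idea can be illustrated with a minimal Python sketch. It assumes two already-loaded manifest.json dictionaries and looks only at each node's file checksum; dbt's actual state:modified comparison covers more than this (configs, macros, and so on), so treat it as a simplified model rather than dbt's implementation. The project name shop and the checksum values are hypothetical.

```python
def changed_node_ids(prev_manifest: dict, curr_manifest: dict) -> dict:
    """Classify current nodes as new or modified by comparing file checksums."""
    prev_nodes = prev_manifest.get("nodes", {})
    curr_nodes = curr_manifest.get("nodes", {})
    new, modified = [], []
    for unique_id, node in curr_nodes.items():
        if unique_id not in prev_nodes:
            # Not present in the previous artifacts at all.
            new.append(unique_id)
            continue
        prev_sum = prev_nodes[unique_id].get("checksum", {}).get("checksum")
        curr_sum = node.get("checksum", {}).get("checksum")
        if prev_sum != curr_sum:
            modified.append(unique_id)
    return {"new": sorted(new), "modified": sorted(modified)}


# Hypothetical, heavily trimmed manifests.
prev = {"nodes": {
    "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}},
    "model.shop.customers": {"checksum": {"name": "sha256", "checksum": "bbb"}},
}}
curr = {"nodes": {
    "model.shop.orders": {"checksum": {"name": "sha256", "checksum": "aaa"}},
    "model.shop.customers": {"checksum": {"name": "sha256", "checksum": "ccc"}},
    "model.shop.payments": {"checksum": {"name": "sha256", "checksum": "ddd"}},
}}

result = changed_node_ids(prev, curr)
print(result)
```

In this toy case customers counts as modified and payments as new, while orders is left out because its checksum is unchanged.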

Selectors control which diff you run. In particular, a trailing plus includes children (downstream) and a leading plus includes parents (upstream) — intuitive once you know it, but a favorite exam question, so lock it in.

  • --state takes the path to a directory holding the previous run's artifacts (the directory that contains manifest.json, typically a copy of target/)
  • state:modified selects new nodes plus existing nodes that have changed relative to those artifacts
  • state:new selects only newly added nodes
  • A trailing + includes children, a leading + includes parents (e.g. state:modified+ means changed nodes plus their downstream)

Selector | Scope | Primary use | Notes
-------- | ----- | ----------- | -----
state:modified | New and changed nodes only | Run just the diff, narrowly targeted | Tests are implicitly included via build
state:new | Only newly added nodes | Validating freshly added models | Excludes existing nodes
state:modified+ | Changed nodes plus their descendants (downstream) | Validate downstream impact too | Useful for checking schema compatibility
+state:modified | Changed nodes plus their ancestors (upstream) | Rebuild dependencies from source down to the change | Use when upstream needs to be refreshed
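
To make the direction of + concrete, here is a small Python sketch that expands a set of modified nodes over a toy DAG. The model names and the CHILDREN mapping are hypothetical, and the traversal is only a model of how the graph operators behave, not dbt's own code.

```python
from collections import deque

# Hypothetical DAG: each parent maps to its direct children.
CHILDREN = {
    "stg_orders": ["orders"],
    "stg_payments": ["orders"],
    "orders": ["revenue"],
    "revenue": [],
}


def reachable(edges: dict, start: set) -> set:
    """Breadth-first walk: start nodes plus everything reachable along edges."""
    seen = set(start)
    queue = deque(start)
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invert(children: dict) -> dict:
    """Turn a parent -> children mapping into child -> parents."""
    parents = {node: [] for node in children}
    for parent, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, []).append(parent)
    return parents


def expand(modified: set, children: dict, leading_plus=False, trailing_plus=False) -> set:
    """Apply + operators to the modified set (leading = upstream, trailing = downstream)."""
    selected = set(modified)
    if trailing_plus:
        selected |= reachable(children, modified)
    if leading_plus:
        selected |= reachable(invert(children), modified)
    return selected


modified = {"orders"}
print(sorted(expand(modified, CHILDREN, trailing_plus=True)))  # state:modified+
print(sorted(expand(modified, CHILDREN, leading_plus=True)))   # +state:modified
```

With orders as the only modified node, the trailing plus adds revenue, the leading plus adds the two staging models, and using both selects the whole toy graph.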

List the detected changes only (do not execute)

dbt ls --state ./artifacts/prod --select state:modified

defer Basics and How Reference Resolution Works

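defer resolves ref() calls to unselected upstream nodes to the existing relations recorded in the artifacts you pass via --state. Per the dbt documentation, an unselected node that already exists in the current target is used as-is unless you tell dbt to favor the state manifest. Typically you point --state at the full artifact set from the last successful production run.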

This lets you build just the diff in CI's temporary schema without rebuilding upstream. But if the deferred-to relation does not exist, you will get a reference error — so the assumption is that the target actually exists in production.

  • On dbt Core, --defer needs a manifest to resolve against, so passing it together with --state is the standard pattern
  • defer is purely a reference-resolution mechanism; what actually runs is controlled separately by the selector
  • Even when production and CI use different target schemas or databases, resolution uses the fully qualified names recorded in the artifacts
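
The resolution rules above can be sketched as a single Python function. This is a simplified model based on the documented defer criteria (is the node selected, and does it already exist in the current target), not dbt's implementation. The function name, the ci_pr_123 schema, and the relation names are hypothetical, and the favor_state parameter stands in for the --favor-state flag, which you should confirm is available in your dbt version.

```python
def resolve_ref(name, selected, exists_in_ci, prod_relations, ci_schema, favor_state=False):
    """Return the relation a ref() to `name` would point at in this simplified model."""
    ci_relation = f"{ci_schema}.{name}"
    if name in selected:
        # Selected nodes are built in this run, so refs stay in the CI schema.
        return ci_relation
    if name in exists_in_ci and not favor_state:
        # Unselected, but an object already exists in the current target.
        return ci_relation
    # Otherwise defer to the relation recorded in the production artifacts,
    # falling back to the CI relation if production never recorded the node.
    return prod_relations.get(name, ci_relation)


# Hypothetical fully qualified names recorded in the production manifest.
prod = {
    "stg_orders": "analytics.prod.stg_orders",
    "orders": "analytics.prod.orders",
}

print(resolve_ref("orders", {"orders"}, set(), prod, "ci_pr_123"))
print(resolve_ref("stg_orders", {"orders"}, set(), prod, "ci_pr_123"))
```

Here orders is selected and resolves to the CI schema, while the unselected stg_orders resolves to its production relation.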

Build only the diff in the CI target, using defer

dbt build \
  --target ci \
  --state ./artifacts/prod \
  --defer \
  --select state:modified+

CI Pipeline Design Patterns

In practice, you keep the target/ directory from production (or at least manifest.json) somewhere CI can always reach it. The workflow: on successful production jobs, push the artifacts to storage; in CI, pull them and hand the path to --state.

On dbt Cloud, you can defer to Production via UI settings and Cloud handles artifact resolution internally. On dbt Core you provide the artifacts yourself.

  • Archive target/ on successful production jobs to persist it
  • At the start of CI, fetch and extract the archive, then pass it to --state
  • Default CI to state:modified+ and widen the scope only for breaking changes
  • Enable partial parsing to cut job time (on a compatible dbt version)
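
The "archive only on success" step could look roughly like the following Python sketch. It assumes run_results.json carries a results list with a status per node; the set of failing status strings is an assumption to check against the artifacts your dbt version writes, and publish_artifacts plus the directory layout are hypothetical. A real pipeline would upload to object storage rather than copy locally.

```python
import json
import shutil
import tempfile
from pathlib import Path

# Assumed failing statuses; verify against your own run_results.json.
FAILING_STATUSES = {"error", "fail", "runtime error"}


def publish_artifacts(target_dir: Path, dest_dir: Path) -> bool:
    """Copy manifest.json to dest_dir only if no node in the run failed."""
    run_results = json.loads((target_dir / "run_results.json").read_text())
    statuses = {r.get("status") for r in run_results.get("results", [])}
    if statuses & FAILING_STATUSES:
        return False
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(target_dir / "manifest.json", dest_dir / "manifest.json")
    return True


# Demo with a throwaway target/ directory.
workdir = Path(tempfile.mkdtemp())
target = workdir / "target"
target.mkdir()
(target / "manifest.json").write_text(json.dumps({"nodes": {}}))

(target / "run_results.json").write_text(
    json.dumps({"results": [{"status": "success"}, {"status": "pass"}]})
)
published = publish_artifacts(target, workdir / "artifacts" / "prod")

(target / "run_results.json").write_text(
    json.dumps({"results": [{"status": "success"}, {"status": "fail"}]})
)
blocked = publish_artifacts(target, workdir / "artifacts" / "blocked")
print(published, blocked)
```

Gating the publish step this way keeps CI from comparing against, or deferring to, the artifacts of a half-failed production run.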

Conceptual GitHub Actions example

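name: dbt CI  # conceptual example; bucket, adapter, and target names are placeholders
on: pull_request

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Restore prod artifacts
        # Assumes AWS credentials are already available to the runner.
        run: |
          mkdir -p artifacts/prod
          aws s3 cp s3://my-bucket/dbt/prod/last_success/ ./artifacts/prod --recursive
      - name: Install dbt
        run: pip install dbt-core dbt-bigquery  # example: match your target
      - name: Build changed nodes with deferral
        # Assumes profiles.yml lives in the repo root and defines a "ci" target.
        run: |
          dbt build \
            --profiles-dir . \
            --target ci \
            --state ./artifacts/prod \
            --defer \
            --select state:modified+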
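
If the artifact download silently fails, the dbt step fails later with a less obvious error. One way to guard against that is a small pre-flight wrapper, sketched here in Python. build_ci_command is a hypothetical helper; it only assembles the same arguments used in the workflow above and checks that manifest.json is present.

```python
import tempfile
from pathlib import Path


def build_ci_command(state_dir, target="ci", select="state:modified+"):
    """Return the dbt argv for a deferred CI build, or fail fast if state is missing."""
    state_path = Path(state_dir)
    if not (state_path / "manifest.json").is_file():
        raise FileNotFoundError(
            f"no manifest.json in {state_path}; restore the prod artifacts first"
        )
    return [
        "dbt", "build",
        "--target", target,
        "--state", str(state_path),
        "--defer",
        "--select", select,
    ]


# Demo against a throwaway directory that stands in for ./artifacts/prod.
state_dir = Path(tempfile.mkdtemp())
(state_dir / "manifest.json").write_text("{}")
command = build_ci_command(state_dir)
print(" ".join(command))
```

In a real job the returned list could be handed to subprocess.run(command, check=True), which is left out here so the sketch runs without dbt installed.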

Pitfalls and Validation Strategy

If you let defer hide schema-breaking changes (dropped columns, type changes), child models will fail at runtime. Default to state:modified+ so descendants of changed nodes run, and widen to +state:modified+ when a breaking change is suspected.

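For incremental models, reading unselected upstream from production does not guarantee that the rows processed in CI match what the incremental logic would pick up in production. For example, a model that does not yet exist in the CI schema is built from scratch because is_incremental() evaluates to false, so the incremental branch of the SQL goes unexercised. For critical paths, consider a full refresh or a scoped recomputation.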

  • Do not mix up trailing vs leading + (trailing = downstream)
  • You cannot reference a relation that does not exist at the defer target (ensure it exists in production first)
  • Macro changes can propagate widely, so eyeball the detected changes with dbt ls
  • Using build also runs tests tied to the selected resources
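
One possible way to decide when to widen the selection is to compare the documented columns of a node between the production and current manifests. The Python sketch below assumes each node carries a columns dictionary with an optional data_type, which only covers columns declared in your YAML, so it can miss undocumented changes. The function names and sample columns are hypothetical.

```python
def breaking_column_changes(prev_node: dict, curr_node: dict) -> list:
    """List documented columns that were dropped or changed type."""
    prev_cols = prev_node.get("columns", {})
    curr_cols = curr_node.get("columns", {})
    changes = []
    for name, prev_col in sorted(prev_cols.items()):
        if name not in curr_cols:
            changes.append(f"dropped: {name}")
            continue
        prev_type = prev_col.get("data_type")
        curr_type = curr_cols[name].get("data_type")
        # Only compare types when both sides declare one.
        if prev_type and curr_type and prev_type != curr_type:
            changes.append(f"retyped: {name} ({prev_type} -> {curr_type})")
    return changes


def pick_selector(changes: list) -> str:
    """Widen to ancestors and descendants when a breaking change is suspected."""
    return "+state:modified+" if changes else "state:modified+"


prev_node = {"columns": {
    "order_id": {"data_type": "integer"},
    "amount": {"data_type": "numeric"},
    "note": {},
}}
curr_node = {"columns": {
    "order_id": {"data_type": "string"},
    "amount": {"data_type": "numeric"},
}}

changes = breaking_column_changes(prev_node, curr_node)
print(changes, pick_selector(changes))
```

This is a heuristic for choosing a selector, not a substitute for actually building and testing the downstream models.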

Validate changed models and their tests together

dbt build \
  --state ./artifacts/prod \
  --defer \
  --select state:modified+

Exam Prep Highlights (Analytics Engineer)

The exam often tests the meaning of state selectors, the direction of +, and the purpose and preconditions of defer (referencing an existing relation). Be ready for CI scenario questions, too — you should be able to write the correct command from memory.

  • state:modified+ = changed nodes plus downstream. +state:modified = changed nodes plus upstream.
  • --defer toggles reference resolution; what runs is selected via --select.
  • --state takes the path to the previous run's artifact directory.
  • dbt build handles models, seeds, snapshots, and tests in one pass.

Commands worth memorizing

dbt ls --state ./artifacts/prod --select state:modified

dbt build --state ./artifacts/prod --defer --select state:modified+

dbt build --state ./artifacts/prod --defer --select +state:modified+

Check Your Understanding

Analytics Engineer

Question 1

Upstream tables exist in production. In CI you want to run only the changed models and their downstream, while resolving unselected references to production. Which command is most appropriate?

  1. dbt build --state ./artifacts/prod --defer --select state:modified+
  2. dbt build --select state:modified+
  3. dbt build --state ./artifacts/prod --select +state:modified
  4. dbt build --defer --select state:new

Correct answer: 1

The requirements are diff detection plus reference resolution to production. Option 1 is correct: it combines --state and --defer with state:modified+ (changed nodes plus downstream). Option 2 has neither --state nor --defer, so there is nothing to compare against or defer to. Option 3 omits --defer and selects upstream instead of downstream. Option 4 has no --state and matches only new nodes, which does not meet the requirement.

Frequently Asked Questions

What artifacts do I need to provide for state?

At minimum, you need the previous run's target/ directory containing manifest.json. In practice, archive target/ on successful production runs and point CI's --state at that directory.

Is defer different between dbt Cloud and dbt Core?

The mechanism is the same, but Cloud lets you defer to Production via environment settings and handles artifact retrieval internally. With Core, you specify --defer and --state yourself and implement artifact storage and distribution in your pipeline.

When I change a macro, how far does state:modified propagate?

Macro changes may be flagged on any node whose compilation depends on the macro. The impact can be broad, so right after a change, verify the selection with dbt ls --state ... and widen the selection as needed.
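
As a rough cross-check on macro impact, you can scan a manifest for nodes that list the macro as a direct dependency. The Python sketch below assumes each node has a depends_on.macros list of macro unique IDs; it does not follow macros that call other macros, so it can understate the reach. The macro and model IDs are hypothetical.

```python
def nodes_using_macro(manifest: dict, macro_id: str) -> list:
    """Return unique IDs of nodes that depend directly on the given macro."""
    return sorted(
        unique_id
        for unique_id, node in manifest.get("nodes", {}).items()
        if macro_id in node.get("depends_on", {}).get("macros", [])
    )


# Hypothetical, trimmed manifest.
manifest = {"nodes": {
    "model.shop.orders": {"depends_on": {"macros": ["macro.shop.cents_to_dollars"]}},
    "model.shop.revenue": {"depends_on": {"macros": ["macro.shop.cents_to_dollars"]}},
    "model.shop.customers": {"depends_on": {"macros": []}},
}}

affected = nodes_using_macro(manifest, "macro.shop.cents_to_dollars")
print(affected)
```

Comparing this list with the output of dbt ls for state:modified is a quick sanity check before widening the selection.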
