dbt

dbt CI/CD End-to-End: From Pull Request to Production

2026-04-19
NicheeLab Editorial Team

This article is grounded in a 'stable' operational design that works equally well for dbt Cloud and dbt Core + GitHub Actions, and turns the decision points from PR to merge to production into concrete guidance.

It covers the keywords most commonly asked on the exam — Slim CI, state artifacts, defer, run/test/build, environment isolation — alongside practical configuration examples so you can lock them in quickly.

CI/CD Foundations and the Big Picture

Design dbt CI/CD as two distinct concerns: a CI that quickly validates only the scope impacted by a change, and a CD that safely promotes changes to production. The basic pattern is to auto-trigger CI on PR creation and validate only the diff with state:modified and defer. After merge, the production job runs on a schedule or event and executes run/test against the stable resources.

On the database/lakehouse side, enforce isolation by environment and schema. Write CI outputs to a per-developer or per-branch temporary schema, and grant write permission on the production schema (or catalog) to the production environment only.

  • Three-tier environments: development (dev) / staging (stg) / production (prod)
  • Slim CI: use state:modified+ and defer to run only the diff and its dependencies
  • Separate jobs: PR CI job, staging integration job, production job
  • Artifact management: persist manifest.json and run_results.json and reuse them as the CI state
  • Permissions and isolation: separate roles/schemas/catalogs per environment (works for both Snowflake and Databricks)
EnvironmentTypical responsibilitiesKey dbt commands
dev (per-person / per-PR)Diff validation and experimentationdbt build --select state:modified+ --defer --state path/to/prod_artifacts
stg (integration)Integration checks across multiple PRs and load testingdbt build --selector stg_selector
prod (production)Producing and publishing finalized datadbt run && dbt test (or simply dbt build)

High-level flow from PR to production

Dev (branch)
   |  open PR
   v
[CI Job] --- build(test) on temp schema
   |  status check green?
   v
merge to main
   |
   v
[CD Job (prod)] -- run/test on prod schema
   |
   v
Publish docs / exposures
   |
   v
Consumers (BI/ML)

selectors.yml example (prioritize the diff)

selectors:
  - name: pr_changed
    definition:
      union:
        - method: state
          value: modified
          children: true
        - method: tag
          value: always_ci

From PR Open to CI Run (The Slim CI Pattern)

When a PR is opened, CI fires and builds only the models changed on that branch and their downstream (+) dependencies. The trick is to use defer to redirect references to the existing stable version (usually the prod artifacts) so unchanged parts are not recomputed.

CI uses the same adapter configuration as production (Snowflake/Databricks, etc.) but routes output to a branch-specific temporary schema. On success it reports a status check back to the Git platform, where reviewers verify the tests passed and review the result summary.

  • state:modified+ runs the minimum set of changes and dependencies
  • defer delegates unchanged upstream references to the production artifacts
  • Per-PR temporary schema (encode the branch name or short hash in the naming rule)
  • Reflect critical dbt test failures in the status check (Quality Gate)
ApproachSetup difficultyPR integration / statusSlim CI (state/defer)
dbt Cloud CILow (UI-centric)Standard integration, easyBuilt-in
GitHub Actions + dbt CoreMedium (YAML configuration)Can integrate with ChecksSupported (manual configuration)
Databricks Jobs + dbtMedium-high (requires job design)PR integration requires extra workSupported (controlled via arguments)

CI internal flow (conceptual)

N/A

Slim CI implementation example on GitHub Actions

name: dbt CI (PR)
on:
  pull_request:
    branches: [ main ]
jobs:
  slim-ci:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dbt-core + adapter
        run: |
          pip install dbt-core dbt-snowflake  # or dbt-databricks
      - name: Restore prod artifacts (state)
        uses: actions/download-artifact@v4
        with:
          name: prod_artifacts
          path: ./.state
      - name: dbt deps
        run: dbt deps
      - name: dbt build (Slim CI)
        env:
          DBT_USER: ${{ secrets.SNOWFLAKE_USER }}
          DBT_PASSWORD: ${{ secrets.SNOWFLAKE_PASSWORD }}
          DBT_ACCOUNT: ${{ secrets.SNOWFLAKE_ACCOUNT }}
          DBT_SCHEMA: pr_${{ github.event.number }}
        run: |
          dbt build --select state:modified+ \
                    --defer --state ./.state \
                    --target ci
      - name: Upload CI artifacts
        uses: actions/upload-artifact@v4
        with:
          name: ci_artifacts
          path: target/*.json

Post-Merge CD and Production Promotion

Once a PR merges to main, the production job kicks off. CI used defer to reference the existing prod artifacts, but the production job writes to the real prod connection and schema. Combining schedule-driven and event-driven triggers is realistic in practice (event-driven for high-frequency models, schedule-driven for heavy aggregations).

Whether to cleanly separate run and test in production or fold them into a single build is a team-level decision. What matters is to avoid leaving partially successful states behind on failure, and to reliably persist artifacts so the next run can use them as state.

  • It is also valid to scope production runs to 'changes plus their dependencies' rather than full refresh
  • Automate docs generation and exposures updates at the end of the job
  • Persist artifacts (manifest, run_results) long-term and reuse them as state next time
  • Apply grants reliably via a dedicated macro or as a post-step
StepPurposeExample
seedLoad reference master dataIdempotently load a small CSV
snapshotHistory management (SCD2)Upsert using a change-detection key
build(run+test)Generate the artifacts and validateFail fast on critical test failures
docsMetadata refreshdbt docs generate / apply exposures

A simple CD execution order

N/A

Production job command example (with separated stages)

dbt deps
# 必要に応じて seed / snapshot
# dbt seed --full-refresh
# dbt snapshot
# 変更中心の実行(十分に安定運用ならフル実行も可)
dbt run --select state:modified+
dbt test --select state:modified+
# アーティファクト保存(ワークフロー側で target/*.json を保管)

Database-Side Isolation Strategy (Schema / Catalog Design)

CI safety hinges on schema (or catalog) isolation. Create a temporary schema per PR and clean it up when CI completes. On Snowflake, split ROLE/WAREHOUSE/SCHEMA; on Databricks Unity Catalog, cleanly separate catalog.schema.

Encode the branch or PR number into the schema name to avoid collisions and invalid characters. Overriding dbt's generate_schema_name macro lets you make safe naming a team-wide rule.

  • Snowflake: grant a least-privilege ROLE only the rights to create and write to a temp schema
  • Databricks: isolate a development catalog and disallow writes to the prod catalog
  • Cleanup: drop old PR schemas by TTL via a scheduled job
  • Naming: keep it concise, e.g. ci_pr123_userx or ci_b_<short_sha>
PlatformIsolation unitCaveats
SnowflakeDatabase / Schema / Role / WarehouseRestrict DROP/CREATE on the prod role; give CI a dedicated warehouse
Databricks (Unity Catalog)Catalog / Schema / PermissionsMake the prod catalog read-only; CI writes only to the dev catalog
Others (Postgres, BigQuery, etc.)Schema / DatasetAutomate cost/quota controls and cleanup

Isolation diagram (conceptual)

N/A

Example schema-naming macro (sanitizes the branch name)

{% macro generate_schema_name(custom_schema_name, node) -%}
  {%- set raw = env_var('GIT_BRANCH', 'pr') | lower -%}
  {%- set safe = re.sub('[^a-z0-9_]', '_', raw) -%}
  {%- set prefix = 'ci_' ~ safe[0:20] -%}
  {{ return(prefix) }}
{%- endmacro %}

Tests, Quality Gates, and Artifact Management

A quality gate is the mechanism that blocks a PR on critical dbt test failures. General tests like unique/not_null/accepted_values work well in CI; reserve heavy checks (large row counts, statistical comparisons) for staging to strike a good balance.

Always persist artifacts (manifest.json / run_results.json) and reference them as the state for the next CI run. This is what makes Slim CI possible and lets you skip recomputing unchanged areas. Update documentation (catalog.json) and exposures at the same time to keep visibility intact for downstream consumers.

  • Run lightweight tests in CI; route heavy / regression tests to stg or prod
  • Store run_results.json alongside the status summary
  • Use manifest.json as the --state for CI (long-term retention recommended)
  • Make CI success mandatory via branch protection (Required status checks)
Test categoryWhen to runOn failure
Generic (unique / not_null)CI (PR)Block the PR
Referential / functionalCI or stgBlock on critical issues; warning is acceptable for minor ones
Regression / statistical comparisonstg/prodOperational call, subject to review

Artifact flow (conceptual)

N/A

schema.yml test example (excerpt)

version: 2
models:
  - name: fct_orders
    tests:
      - unique:
          column_name: order_id
      - not_null:
          column_name: order_id
      - relationships:
          to: ref('dim_customer')
          field: customer_id

Rollback on Failure and Operational Essentials

When production fails, first revert the most recent commit and re-run. If you are worried about half-baked state from a failed table update, a blue/green pattern (build into a next schema, then swap views) is an effective way to swap the visible layer safely.

Separate permissions and destructive changes, and confine drops and full-refresh to a maintenance window. Run snapshots in production only, and normally skip them in CI.

  • Rollback basics: git revert plus re-running the production job
  • Swap the visible layer: use create or replace view to switch current to next instantly
  • Apply grants reliably via a dedicated macro (decoupled from data regeneration)
  • Apply destructive changes deliberately (use contracts to guarantee forward compatibility)
ScenarioDetection / signalResponse
Critical test failureCI status redPush a fix commit or close the PR
Partial failure in the production jobFailed nodes shown in run_resultsRetry, or revert then re-run
Schema compatibility brokencontracts / downstream errorsBuy time with a view swap, then ship a backward-compatible fix

Blue/green swap (conceptual)

N/A

Simple view-swap script example (SQL)

-- next スキーマで build 済みを想定
-- 可視スキーマのビューを next へ切替
create or replace view prod.visible.fct_orders as
select * from prod_next.fct_orders;

Check Your Understanding

Analytics Engineer

問題 1

You are designing a PR-based Slim CI. You want unchanged upstream models to reference the stable production versions while validating only the changed models and their dependencies. Which dbt command is most appropriate?

  1. dbt build --select state:modified+ --defer --state path/to/prod_artifacts
  2. dbt run --full-refresh
  3. dbt test --select tag:ci_only --no-defer
  4. dbt build --select +state:modified --no-write-json

正解: A

The canonical Slim CI pattern targets the diff and its downstream with state:modified+ and uses defer to delegate references to unchanged dependencies to the existing (typically prod) artifacts. --state specifies the location of the manifest.json (and friends) to reference.

Frequently Asked Questions

Where should I store the state artifacts used by Slim CI?

Save the manifest.json and run_results.json produced under the production job's target directory to durable storage when the job finishes, then download them at CI time and reference them with --state. dbt Cloud makes it easy to reference artifacts across environments; with GitHub Actions and similar tools, use upload/download-artifact or external storage such as S3, GCS, or DBFS.

How should I design permissions so CI runs safely on Snowflake or Databricks?

Provision a dedicated CI role (or service principal) and limit write targets to development/CI schemas (or catalogs). Disallow CREATE/DROP on production schemas and grant only SELECT. Isolate warehouses and clusters for CI as well to keep cost and blast radius under control.

Should CI run snapshots or heavy seeds?

Generally, no. CI exists to give immediate feedback on the validity and compatibility of changes; snapshots and large seeds belong in production or staging batch jobs. If you must run them in CI, swap in a lightweight sample or use CI-specific tags to exclude or scope them.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
dbt

dbt Models: SQL-Defined Transformation Units (2026)

Model fundamentals — SELECT-based definitions, naming, refs,...

dbt

dbt Analytics Engineering Exam: Complete Guide (2026)

Pass the AE Certification — scope, weighting, sample questio...

dbt

dbt Cloud vs dbt Core: Feature & Cost Comparison (2026)

Honest comparison of dbt Cloud vs. dbt Core — IDE, scheduler...

dbt

dbt Project Structure: models/seeds/macros Layout (2026)

Recommended dbt project layout — models, seeds, macros, snap...

dbt

dbt_project.yml Explained: Every Config (2026)

Every dbt_project.yml setting that matters — paths, vars, ma...

Browse all dbt articles (101)
© 2026 NicheeLab All rights reserved.