Understanding dbt Artifacts: manifest.json / run_results.json / catalog.json

2026-04-19
NicheeLab Editorial Team

dbt produces JSON artifacts under target/ on every run. Among them, manifest.json, run_results.json, and catalog.json are the heart of observability, documentation, and CI integration.

This article follows the dbt docs (https://docs.getdbt.com/) and consolidates the angles most often asked on the Analytics Engineer exam together with field-tested operational know-how. Where details depend heavily on the dbt version, we add caveats and prioritize stable interpretations.

Overview: When, What, and Why They Are Generated

After a command runs, dbt writes multiple JSON files under target/. manifest.json holds the execution plan and lineage, run_results.json holds the execution results, and catalog.json holds table/column types and statistics. Together they serve as the base data for visualization (dbt docs), quality monitoring (CI, SLA validation), and data catalog integration.

Three core ideas to remember: 1) manifest is the graph (dependencies and node definitions), 2) run_results is each node's execution status and timing, 3) catalog is the schema information of the target environment. All three are JSON and carry a metadata block that typically includes generated_at and invocation_id.

  • Output goes to target/ by default (a separate concept from the target name in profiles.yml)
  • docs generate re-emits both catalog.json and manifest.json
  • In CI, run_results.json drives pass/fail decisions and reports
Artifact | Example command that generates it | Main contents | Typical use
manifest.json | dbt parse / dbt run / dbt docs generate | Node definitions, dependencies, lineage | Impact analysis, documentation, selector validation
run_results.json | Right after dbt run / test / build | Per-node execution result, duration, errors | CI pass/fail, SLA monitoring, flake detection
catalog.json | dbt docs generate | Types and metadata for tables/views/columns | Data dictionary, type auditing, BI linking
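
To make the shared header concrete, here is a minimal sketch that reads the metadata block from whichever of the three artifacts exist. The helper names summarize_metadata and summarize_target are invented for this illustration, and the field names (dbt_schema_version, dbt_version, generated_at, invocation_id) assume a reasonably recent artifact schema; older dbt versions may lay these out differently.

```python
import json
from pathlib import Path

ARTIFACTS = ("manifest.json", "run_results.json", "catalog.json")


def summarize_metadata(artifact: dict) -> dict:
    """Pull the shared header fields out of one parsed artifact."""
    meta = artifact.get("metadata") or {}
    return {
        "schema": meta.get("dbt_schema_version"),
        "dbt_version": meta.get("dbt_version"),
        "generated_at": meta.get("generated_at"),
        "invocation_id": meta.get("invocation_id"),
    }


def summarize_target(target_dir: str = "target") -> dict:
    """Summarize whichever of the three artifacts exist in target_dir."""
    out = {}
    for name in ARTIFACTS:
        path = Path(target_dir) / name
        if path.exists():  # catalog.json is absent until docs generate has run
            out[name] = summarize_metadata(json.loads(path.read_text()))
    return out


if __name__ == "__main__":
    for name, info in summarize_target().items():
        print(name, info)
```

Missing files are skipped rather than treated as errors, which fits the case where only dbt run has been executed so far.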

Artifact generation and downstream usage flow (conceptual)

[Figure: dbt project → dbt run / test / docs generate → target/ → manifest.json (blast radius / lineage visualization & CI), run_results.json (run success/failure & SLA monitoring), catalog.json (data dictionary / type integration) → external integrations (BI, catalogs, dashboards, alerts)]

Minimal sanity check that artifacts are produced (shell)

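dbt run --select my_model
# manifest.json and run_results.json should exist after a run
ls -1 target | grep -E 'manifest|run_results'
# catalog.json only appears after docs generate (which also refreshes manifest.json)
dbt docs generate
ls -1 target | grep -E 'catalog'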

manifest.json: The Source of Truth for Models and Lineage

manifest.json represents the project graph. It contains each node's metadata (model, seed, snapshot, test, exposure, source, macro, etc.), file path, materialization, depends_on, and child_map/parent_map. On the exam, tying it to impact analysis and selector evaluation makes it a reliable scoring topic.
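
As a quick way to see what a given manifest actually contains, the sketch below counts entries per top-level section and groups nodes by resource type and materialization. It assumes that sources, exposures, and macros sit in their own top-level dictionaries next to nodes, which matches the artifact versions we have seen but should be verified for your dbt version. The function name manifest_inventory and the sample data are hypothetical.

```python
from collections import Counter

# Top-level manifest sections assumed here; names may differ across dbt versions.
SECTIONS = ("nodes", "sources", "exposures", "macros")


def manifest_inventory(manifest: dict) -> dict:
    """Count entries per section, and nodes per (resource_type, materialized)."""
    sections = {name: len(manifest.get(name) or {}) for name in SECTIONS}
    kinds = Counter()
    for node in (manifest.get("nodes") or {}).values():
        materialized = (node.get("config") or {}).get("materialized")
        kinds[(node.get("resource_type"), materialized)] += 1
    return {"sections": sections, "kinds": dict(kinds)}


# Hypothetical, heavily trimmed manifest for illustration only
sample = {
    "nodes": {
        "model.my_project.stg_orders": {
            "resource_type": "model",
            "config": {"materialized": "view"},
        },
        "model.my_project.fct_orders": {
            "resource_type": "model",
            "config": {"materialized": "table"},
        },
    },
    "sources": {"source.my_project.raw.orders": {"resource_type": "source"}},
}

print(manifest_inventory(sample))
```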

Fields change across versions, but nodes, sources, child_map/parent_map, and metadata (including metadata.generated_at) appear consistently. The adapter in use is recorded in metadata.adapter_type, which is useful for portability assessments.

  • depends_on.nodes points upstream; child_map/parent_map is the project-wide dependency index
  • description, config.materialized, and database/schema/alias map directly to docs and naming conventions
  • exposures let you trace lineage all the way to BI dashboards and downstream consumers
Field (example) | Type / sample value | Use | Stability (rough)
nodes | dict | Node definitions for models, tests, etc. | High
sources | dict | Source definitions and columns | High
child_map / parent_map | dict[str, list[str]] | Speeds up project-wide lineage traversal | Medium-High
metadata.generated_at | ISO 8601 | Tracks generation time | High
metadata.adapter_type | str | Type of execution backend (e.g., snowflake, bigquery) | Medium
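
The relationship between per-node depends_on and the project-wide child index can be illustrated by inverting one into the other. This is only a sketch of the idea, not how dbt itself builds child_map; the function name invert_depends_on and the three-node project are made up, and a real child_map also covers resources such as exposures and lists nodes that have no children.

```python
from collections import defaultdict


def invert_depends_on(nodes: dict) -> dict:
    """Build a child index (parent -> sorted children) from depends_on.nodes."""
    children = defaultdict(set)
    for uid, node in nodes.items():
        for parent in (node.get("depends_on") or {}).get("nodes") or []:
            children[parent].add(uid)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Hypothetical project: raw source -> staging model -> fact model
nodes = {
    "model.my_project.stg_orders": {
        "depends_on": {"nodes": ["source.my_project.raw.orders"]}
    },
    "model.my_project.fct_orders": {
        "depends_on": {"nodes": ["model.my_project.stg_orders"]}
    },
}

index = invert_depends_on(nodes)
print(index["model.my_project.stg_orders"])  # ['model.my_project.fct_orders']
```

depends_on answers "what does this node need?", while the inverted index answers "what breaks if this node changes?", which is why impact analysis usually starts from child_map.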

Direction of dependencies (upstream → downstream)

source.raw_orders → model.stg_orders → model.fct_orders → exposure.orders_dashboard

Enumerate the blast radius from manifest.json (Python)

import json
from collections import deque

with open('target/manifest.json') as f:
    mf = json.load(f)

child_map = mf.get('child_map', {})
start = 'model.my_project.fct_orders'

visited, q = {start}, deque([start])

while q:
    n = q.popleft()
    for c in child_map.get(n, []):
        if c not in visited:
            visited.add(c)
            q.append(c)

print('\n'.join(sorted(visited - {start})))  # Downstream impacted nodes of fct_orders (excluding itself)

run_results.json: The Single Source of Truth for Run Status and Performance

run_results.json stores results for the most recent command (run/test/build, etc.). Each result includes status (success, error, or skipped for models; pass, fail, or warn for tests), execution_time, timing (compile/execute breakdown), and failures (for tests, the number of failing rows). CI uses this for automated decisions, and SLA monitoring uses it to detect slowdowns or flakiness.
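
The compile/execute breakdown can be turned into durations with a few lines. This sketch assumes each timing entry has name, started_at, and completed_at with ISO 8601 timestamps, which is the shape we have seen in practice; the helper names are hypothetical.

```python
from datetime import datetime


def _parse(ts: str) -> datetime:
    # A trailing 'Z' is not accepted by fromisoformat on older Python versions
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def timing_breakdown(result: dict) -> dict:
    """Return seconds spent per timing step (e.g., compile, execute) for one result."""
    out = {}
    for step in result.get("timing") or []:
        if step.get("started_at") and step.get("completed_at"):
            delta = _parse(step["completed_at"]) - _parse(step["started_at"])
            out[step["name"]] = delta.total_seconds()
    return out


# Hypothetical result entry, trimmed to the fields used here
result = {
    "unique_id": "model.my_project.fct_orders",
    "timing": [
        {"name": "compile",
         "started_at": "2026-04-19T01:00:00.000000Z",
         "completed_at": "2026-04-19T01:00:00.500000Z"},
        {"name": "execute",
         "started_at": "2026-04-19T01:00:00.500000Z",
         "completed_at": "2026-04-19T01:00:02.500000Z"},
    ],
}

print(timing_breakdown(result))  # {'compile': 0.5, 'execute': 2.0}
```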

Important caveat: run_results is a snapshot of the most recent run. To analyze history, you need to collect and archive it every run (copy it to object storage or a DWH).
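
One way to keep history is to copy the file under a name that is unique per invocation before the next command overwrites it. The following is a minimal local-filesystem sketch; archive_run_results and the naming scheme are our own assumptions, and in practice the destination would more likely be object storage.

```python
import json
import shutil
import tempfile
from pathlib import Path


def archive_run_results(src: str, archive_dir: str) -> Path:
    """Copy run_results.json to archive_dir under a name unique to the invocation."""
    meta = json.loads(Path(src).read_text()).get("metadata") or {}
    invocation_id = meta.get("invocation_id") or "unknown"
    # Colons are awkward in file names on some systems, so replace them
    stamp = (meta.get("generated_at") or "undated").replace(":", "-")
    dest = Path(archive_dir) / f"run_results_{stamp}_{invocation_id}.json"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest


# Demo against a throwaway file; in CI, src would be target/run_results.json
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "run_results.json"
    src.write_text(json.dumps({
        "metadata": {"invocation_id": "abc123",
                     "generated_at": "2026-04-19T01:02:03Z"},
        "results": [],
    }))
    print(archive_run_results(str(src), str(Path(tmp) / "archive")).name)
```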

  • invocation_id is essential as a correlation key for each invocation
  • Aggregate execution_time per model/test to detect performance regressions
  • Results with status=error may include message and adapter_response fields
Field (example) | Type / sample value | Use | Notes
results[] | list | Per-node execution result | status, timing, execution_time
elapsed_time | float seconds | Overall duration of the invocation | Wall-clock time
metadata.generated_at | ISO 8601 | Generation time | Used for auditing
args | dict | Invocation arguments | Selectors, etc.
metadata | dict | dbt version, invocation_id, environment info | Supports reproducibility
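
Comparing execution_time between an archived baseline and the current run is one simple way to flag performance regressions. The sketch below is an assumption-laden starting point: find_regressions, the 1.5x ratio, and the one-second floor are arbitrary choices for illustration, not dbt features.

```python
def find_regressions(baseline: dict, current: dict,
                     ratio: float = 1.5, min_seconds: float = 1.0) -> list:
    """Return (unique_id, before, now) for nodes that slowed down by at least `ratio`."""
    before_by_id = {
        r["unique_id"]: r.get("execution_time") or 0.0
        for r in baseline.get("results", [])
    }
    flagged = []
    for r in current.get("results", []):
        uid = r["unique_id"]
        now = r.get("execution_time") or 0.0
        before = before_by_id.get(uid)
        # Skip new nodes, zero baselines, and runs too short to be meaningful
        if not before or now < min_seconds:
            continue
        if now / before >= ratio:
            flagged.append((uid, before, now))
    # Worst relative slowdown first
    return sorted(flagged, key=lambda t: t[2] / t[1], reverse=True)


# Hypothetical run_results excerpts
baseline = {"results": [
    {"unique_id": "model.my_project.fct_orders", "execution_time": 10.0},
    {"unique_id": "model.my_project.stg_orders", "execution_time": 2.0},
]}
current = {"results": [
    {"unique_id": "model.my_project.fct_orders", "execution_time": 20.0},
    {"unique_id": "model.my_project.stg_orders", "execution_time": 2.1},
    {"unique_id": "model.my_project.new_model", "execution_time": 5.0},
]}

print(find_regressions(baseline, current))
```

A single comparison is noisy, so in practice the baseline would usually be a median over several archived runs rather than one file.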

Pass/fail decision flow in CI

dbt run/test → target/run_results → Extract failures → Report / alert

Summarize run_results.json (Python)

import json, sys
from collections import Counter

with open('target/run_results.json') as f:
    rr = json.load(f)

results = rr.get('results', [])
status = Counter(r.get('status', 'unknown') for r in results)
mean_time = sum(r.get('execution_time', 0.0) for r in results) / max(1, len(results))

print('status summary:', dict(status))
print('mean execution_time(sec):', round(mean_time, 2))

# Models report 'error'; failing tests report 'fail', so check both
errors = [r for r in results if r.get('status') in ('error', 'fail')]
if errors:
    print('ERROR DETAILS:')
    for e in errors:
        node = e.get('unique_id')
        msg = e.get('message') or ''  # message can be null
        print(f'- {node}: {msg[:200]}')
    sys.exit(1)

catalog.json: Strengthen Documentation with Types and Table Statistics

catalog.json is produced by dbt docs generate and contains per-column types and comments for the actual database entities (models, sources), and sometimes table-level statistics such as row count estimates. Joining it with manifest lets you detect differences between logical definitions and physical implementations.

Type names and the granularity of statistics vary by adapter. For the exam, lock in two facts: catalog is generated by docs generate, and it provides column information and types.

  • Column-level documentation maintenance and type-drift detection (e.g., expected INT but actually STRING)
  • Use it as the source for syncing into BI tools and data catalogs
  • Joinable on unique_id between manifest.nodes[unique_id] and catalog.nodes[unique_id]
Element | Example | How to use it | Caveats
nodes[unique_id].columns | order_id: {type: NUMBER} | Document column types | Type names depend on the adapter
metadata.adapter_type (read from manifest.json) | snowflake, bigquery, redshift, etc. | Understand per-backend differences | Useful for multi-cloud comparison; catalog.json's own metadata may not carry this field
metadata.generated_at | ISO 8601 | Makes the snapshot timestamp explicit | Useful for detecting staleness
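
For syncing into a data dictionary, it often helps to flatten the catalog into one row per column. The sketch below assumes each column entry exposes name, type, and index, and that sources live under a separate top-level sources key; confirm both against your own catalog.json. The function name catalog_rows and the sample are hypothetical.

```python
def catalog_rows(catalog: dict) -> list:
    """Flatten catalog nodes and sources into one dict per column."""
    rows = []
    for section in ("nodes", "sources"):
        for uid, entry in (catalog.get(section) or {}).items():
            for col in (entry.get("columns") or {}).values():
                rows.append({
                    "unique_id": uid,
                    "column": col.get("name"),
                    "type": col.get("type"),
                    "index": col.get("index"),
                })
    # Stable order: by relation, then by column position
    return sorted(rows, key=lambda r: (r["unique_id"], r["index"] or 0))


# Hypothetical, trimmed catalog
catalog = {
    "nodes": {
        "model.my_project.fct_orders": {
            "columns": {
                "AMOUNT": {"name": "AMOUNT", "type": "NUMBER", "index": 2},
                "ORDER_ID": {"name": "ORDER_ID", "type": "NUMBER", "index": 1},
            }
        }
    },
    "sources": {},
}

for row in catalog_rows(catalog):
    print(row)
```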

Conceptual join between manifest and catalog

manifest.nodes[model.fct_orders] ----join on unique_id---- catalog.nodes[model.fct_orders]
         | logical config                                 | physical columns/types

Join catalog and manifest to detect type drift (Python)

import json

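with open('target/manifest.json') as f:
    mf = json.load(f)
with open('target/catalog.json') as f:
    cg = json.load(f)

m_nodes = mf.get('nodes', {})
c_nodes = cg.get('nodes', {})

for uid, m in m_nodes.items():
    if not uid.startswith('model.'):  # models only
        continue
    c = c_nodes.get(uid)
    if not c:  # e.g., not built yet, or ephemeral
        continue
    # Compare case-insensitively: some adapters upper-case column names
    m_cols = {k.lower(): v for k, v in (m.get('columns') or {}).items()}
    c_cols = {k.lower(): v for k, v in (c.get('columns') or {}).items()}
    missing = set(m_cols) - set(c_cols)
    if missing:
        print(f'[WARN] not in catalog: {uid} → {sorted(missing)}')
    # Type drift is only checkable where data_type is declared in the YAML
    for name in sorted(set(m_cols) & set(c_cols)):
        want = m_cols[name].get('data_type')
        got = c_cols[name].get('type')
        if want and got and want.lower() != got.lower():
            print(f'[WARN] type drift: {uid}.{name}: declared {want}, actual {got}')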

Real-World Patterns: CI/CD, Lineage Visualization, and Audit Integration

The basic principle for artifacts is "preserve and reuse." Collect run_results in CI, archive to S3/GCS, then load into a DWH to track trends in job stability and performance regressions. manifest lets you automate impact reviews by diffing per pull request. catalog is the natural trigger for updating your data dictionary.
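
Before loading into a DWH, run_results is usually flattened into one row per node. Here is a hedged sketch that emits newline-delimited JSON; the row layout, the function names, and passing adapter_type in from the manifest are all our own assumptions rather than a prescribed schema.

```python
import json


def run_result_rows(run_results: dict, adapter_type=None) -> list:
    """Flatten run_results into one audit-friendly row per node."""
    meta = run_results.get("metadata") or {}
    rows = []
    for r in run_results.get("results", []):
        rows.append({
            "invocation_id": meta.get("invocation_id"),
            "generated_at": meta.get("generated_at"),
            "adapter_type": adapter_type,  # supplied by the caller, e.g. from manifest metadata
            "unique_id": r.get("unique_id"),
            "status": r.get("status"),
            "execution_time": r.get("execution_time"),
        })
    return rows


def to_ndjson(rows: list) -> str:
    """Serialize rows as newline-delimited JSON, a format most warehouses can load."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# Hypothetical run_results excerpt
rr = {
    "metadata": {"invocation_id": "abc123",
                 "generated_at": "2026-04-19T01:02:03Z"},
    "results": [
        {"unique_id": "model.my_project.fct_orders",
         "status": "success", "execution_time": 3.2},
    ],
}

print(to_ndjson(run_result_rows(rr, adapter_type="bigquery")))
```

Keeping invocation_id on every row makes it possible to join these rows back to the archived file they came from.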

The patterns are the same in dbt Cloud and OSS, but operational design for storage and lookup (naming, retention, metadata enrichment) makes or breaks the quality of the result.

  • CI: dbt build → collect run_results → on failure, comment with log excerpts
  • Lineage diff: compare child_map between old and new manifest
  • Audit: store generated_at, invocation_id, and adapter_type in an audit table
Use case | Artifact(s) used | Key point | Failure mode to avoid
Pass/fail decision (CI) | run_results.json | Aggregate status with thresholds | Latest run only; always archive
Impact review | manifest.json | Visualize diffs via child_map/parent_map | Keep unique_id stable
Automated data dictionary updates | catalog.json + manifest.json | Cross-check physical types with logical definitions | Normalize adapter differences
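
The lineage diff can be sketched as a set difference over parent-to-child edges in two manifests, for example the one from the main branch and the one built for a pull request. The function name diff_child_map and the sample manifests are hypothetical.

```python
def _edges(manifest: dict) -> set:
    """Collect (parent, child) pairs from a manifest's child_map."""
    return {
        (parent, child)
        for parent, kids in (manifest.get("child_map") or {}).items()
        for child in kids
    }


def diff_child_map(old: dict, new: dict) -> dict:
    """Report dependency edges added or removed between two manifests."""
    old_edges, new_edges = _edges(old), _edges(new)
    return {
        "added": sorted(new_edges - old_edges),
        "removed": sorted(old_edges - new_edges),
    }


# Hypothetical manifests: the PR adds a new downstream model
old_manifest = {"child_map": {
    "model.my_project.stg_orders": ["model.my_project.fct_orders"],
    "model.my_project.fct_orders": [],
}}
new_manifest = {"child_map": {
    "model.my_project.stg_orders": ["model.my_project.fct_orders",
                                    "model.my_project.fct_returns"],
    "model.my_project.fct_orders": [],
    "model.my_project.fct_returns": [],
}}

print(diff_child_map(old_manifest, new_manifest))
```

Because the comparison is keyed on unique_id, renaming a model shows up as one removed edge plus one added edge, which is the reason for keeping unique_id stable.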

Artifact archival pipeline in CI

Git Push → dbt build → target/*. → Store (S3) → load → DWH/BI monitoring

GitHub Actions example (YAML excerpt)

jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install dbt-core dbt-bigquery
      - run: dbt deps && dbt build --fail-fast
      - name: Upload artifacts
        if: always()  # upload even when dbt build fails, so failures can be analyzed
        uses: actions/upload-artifact@v4
        with:
          name: dbt-target
          path: target/*.json

Exam Prep: Frequently Tested Angles and Pitfalls

On the Analytics Engineer exam, the generation timing and use-case separation of each artifact come up often. Be ready to answer instantly on three points: "catalog is from docs generate," "run_results covers the most recent run only," and "manifest is the source of lineage information."

Being able to articulate the stability of unique_id, the difference between depends_on and child_map, and the purpose of invocation_id will also earn extra points. It is safer to memorize the role separation between files than to memorize version-specific details.

  • manifest: the graph and node definitions (selectors, impact analysis)
  • run_results: snapshot of execution results (CI/SLA)
  • catalog: physical schema and statistics (documentation/audit)
Question pattern | Key to the correct answer | Common wrong answer | Quick tip
Generation timing | docs generate yields catalog | Mistaking run as generating catalog | manifest is also updated by parse/run
Use-case separation | run_results is for pass/fail and timing | Confusing it with manifest | Argue from status and execution_time
Lineage traversal | child_map/parent_map | Using only the one-way depends_on | Use both directions appropriately

Mini mnemonic map

manifest = the graph
run_results = success/failure & time
catalog = types/columns
→ Memorize by role, then confirm via generation timing

From unique_id to a human-readable name (Python)

import json

with open('target/manifest.json') as f:
    mf = json.load(f)

uid = 'model.my_project.fct_orders'
node = mf['nodes'][uid]
print(node.get('name'), node.get('original_file_path'))

Check Yourself with a Question

Analytics Engineer

Question 1

In a dbt project, you want to automatically check whether table column types match expectations and update the documentation when they differ. Which combination of artifacts is most appropriate?

  A. Cross-check manifest.json with catalog.json
  B. Parse status and execution_time from run_results.json
  C. Cross-check manifest.json with run_results.json
  D. Refer to catalog.json only

Correct answer: A

Actual column types live in catalog.json, while logical definitions, naming, and descriptions live in manifest.json. Joining them on unique_id lets you detect type and column drift while feeding the result into documentation updates. run_results.json contains execution results and is not suitable for type comparison. catalog alone cannot determine differences against the logical definition.

Frequently Asked Questions

Is catalog.json updated without running docs generate?

No. catalog.json is generated and updated by dbt docs generate (which inspects the database internally). It is not updated by dbt run alone.

How long should I retain run_results.json?

It depends on your operational needs, but if you do SLA monitoring or regression detection, it is common to archive each run to external storage and retain at least several weeks to months of history. run_results is a snapshot of the most recent run and gets overwritten.

What is the difference between depends_on and child_map in manifest.json?

depends_on is a list inside each node definition of the upstream nodes that node depends on. child_map is a project-wide index of downstream nodes, letting you instantly look up the blast radius of a given node.
