dbt Artifacts: manifest, run_results, sources, catalog (2026)

dbt produces JSON artifacts under target/ on every run. Among them, manifest.json, run_results.json, and catalog.json are the heart of observability, documentation, and CI integration.

This article follows the dbt docs (https://docs.getdbt.com/) and consolidates the angles most often asked on the exam together with field-tested operational know-how. Where details depend heavily on the version, we add caveats and prioritize stable interpretations.

Overview: When, What, and Why They Are Generated

After a command runs, dbt writes multiple JSON files under target/. manifest.json holds the execution plan and lineage, run_results.json holds the execution results, and catalog.json holds table/column types and statistics. Together they serve as the base data for visualization (dbt docs), quality monitoring (CI, SLA validation), and data catalog integration.

Three core ideas to remember: 1) manifest is the graph (dependencies and node definitions), 2) run_results is each node's execution status and timing, 3) catalog is the schema information of the target environment. All three are JSON and carry a metadata block that typically includes generated_at and the dbt version.

Output goes to target/ by default (a separate concept from the target name in profiles.yml)
docs generate re-emits both catalog.json and manifest.json
In CI, run_results.json drives pass/fail decisions and reports
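
As a rough sketch of that shared structure, the snippet below reads the metadata block from whichever artifacts are present. The helper name artifact_metadata and the default directory are assumptions for illustration; dbt_schema_version, dbt_version, and generated_at are keys found under metadata in recent artifact schemas, but older versions may lack some of them, so missing keys come back as None.

```python
import json
from pathlib import Path

ARTIFACTS = ("manifest.json", "run_results.json", "catalog.json")


def artifact_metadata(target_dir="target"):
    """Return {filename: metadata summary} for each artifact found in target_dir.

    artifact_metadata is a hypothetical helper, not part of dbt.
    """
    summary = {}
    for name in ARTIFACTS:
        path = Path(target_dir) / name
        if not path.exists():  # e.g. catalog.json before `dbt docs generate`
            continue
        with path.open() as f:
            meta = json.load(f).get("metadata", {})
        summary[name] = {
            "dbt_schema_version": meta.get("dbt_schema_version"),
            "dbt_version": meta.get("dbt_version"),
            "generated_at": meta.get("generated_at"),
        }
    return summary


if __name__ == "__main__":
    for name, meta in artifact_metadata().items():
        print(name, meta)
```

An artifact that is absent from the output is itself a signal: a missing catalog.json usually just means docs generate has not run in that workspace.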

Artifact	Example command that generates it	Main contents	Typical use
manifest.json	dbt parse / dbt run / dbt docs generate	Node definitions, dependencies, lineage	Impact analysis, documentation, selector validation
run_results.json	Right after dbt run / test / build	Per-node execution result, duration, errors	CI pass/fail, SLA monitoring, flake detection
catalog.json	dbt docs generate	Types and metadata for tables/views/columns	Data dictionary, type auditing, BI linking

Artifact generation and downstream usage flow (conceptual)

Minimal sanity check that artifacts are produced (shell)

dbt run --select my_model
ls -1 target | grep -E 'manifest|run_results|catalog'
# Refresh docs metadata
dbt docs generate

manifest.json: The Source of Truth for Models and Lineage

manifest.json represents the project graph. It contains metadata for every resource (models, seeds, snapshots, and tests under nodes; sources, exposures, and macros under their own top-level keys), including file path, materialization, and depends_on, plus the project-wide child_map/parent_map. On the exam, tying it to impact analysis and selector evaluation makes it a reliable scoring topic.

Fields change across versions, but nodes, sources, child_map/parent_map, and metadata (including metadata.generated_at) appear consistently. Adapter and package differences are usually captured in metadata, which is useful for portability assessments.

depends_on.nodes points upstream; child_map/parent_map is the project-wide dependency index
description, config.materialized, and database/schema/alias map directly to docs and naming conventions
exposures let you trace lineage all the way to BI dashboards and downstream consumers
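
To make the node-level fields above concrete, here is a minimal sketch that counts manifest nodes by resource_type and models by config.materialized. The function name summarize_nodes is hypothetical, and the sketch assumes each node carries resource_type and config as in recent manifest schemas; anything missing is counted as "unknown".

```python
import json
from collections import Counter
from pathlib import Path


def summarize_nodes(manifest):
    """Count nodes by resource_type and models by materialization.

    summarize_nodes is a hypothetical helper, not part of dbt.
    """
    by_type = Counter()
    by_materialization = Counter()
    for node in manifest.get("nodes", {}).values():
        resource_type = node.get("resource_type", "unknown")
        by_type[resource_type] += 1
        if resource_type == "model":
            materialized = (node.get("config") or {}).get("materialized", "unknown")
            by_materialization[materialized] += 1
    return dict(by_type), dict(by_materialization)


if __name__ == "__main__":
    path = Path("target/manifest.json")
    if path.exists():
        with path.open() as f:
            by_type, by_mat = summarize_nodes(json.load(f))
        print("nodes by resource_type:", by_type)
        print("models by materialization:", by_mat)
```

A summary like this is a quick way to check a naming or materialization convention across the whole project without opening individual model files.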

Field (example)	Type / sample value	Use	Stability (rough)
nodes	dict	Node definitions for models, tests, etc.	High
sources	dict	source definitions and columns	High
child_map / parent_map	dict[str, list[str]]	Speeds up project-wide lineage traversal	Medium-High
metadata.generated_at	ISO 8601	Tracks generation time	High
metadata.adapter_type	str	Type of execution backend (e.g., snowflake, bigquery)	Medium

Direction of dependencies (upstream → downstream)

Enumerate the blast radius from manifest.json (Python)

import json
from collections import deque

with open('target/manifest.json') as f:
    mf = json.load(f)

child_map = mf.get('child_map', {})
start = 'model.my_project.fct_orders'

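visited, q = {start}, deque([start])

# Breadth-first walk over child_map: each node is visited once
while q:
    n = q.popleft()
    for c in child_map.get(n, []):
        if c not in visited:
            visited.add(c)
            q.append(c)

# Exclude the start node so only downstream nodes are listed
print('\n'.join(sorted(visited - {start})))  # Downstream impacted nodes of fct_orders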

run_results.json: The Single Source of Truth for Run Status and Performance

run_results.json stores results for the most recent command (run/test/build, etc.). Each result includes status (success, error, or skipped for models; pass, fail, or warn for tests), execution_time, timing (compile/execute breakdown), and failures (e.g., number of failing tests). CI uses this for automated decisions, and SLA monitoring uses it to detect slowdowns or flakiness.

Important caveat: run_results is a snapshot of the most recent run. To analyze history, you need to collect and archive it every run (copy it to object storage or a DWH).
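
One way to act on that caveat is to copy the file aside after every command, keyed by invocation_id so that runs never overwrite each other. The sketch below is an assumption-laden illustration: archive_run_results and the artifact_archive directory are hypothetical names, and it assumes invocation_id sits under metadata, as in recent run_results schemas.

```python
import json
import shutil
from pathlib import Path


def archive_run_results(target_dir="target", archive_dir="artifact_archive"):
    """Copy run_results.json into archive_dir, named by invocation_id.

    Returns the destination path, or None if there is nothing to archive.
    Hypothetical helper; directory names are illustrative.
    """
    src = Path(target_dir) / "run_results.json"
    if not src.exists():
        return None
    with src.open() as f:
        meta = json.load(f).get("metadata", {})
    # Fall back to a fixed label if the key is missing (assumption for older schemas).
    invocation_id = meta.get("invocation_id") or "unknown-invocation"
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"run_results_{invocation_id}.json"
    shutil.copy2(src, dest)  # copy2 also preserves the file's modification time
    return dest
```

In CI you would call this right after each dbt command and then sync the archive directory to object storage or load it into the warehouse.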

invocation_id is essential as a correlation key for each invocation
Aggregate execution_time per model/test to detect performance regressions
Results with status=error may include message and adapter_response fields
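
As a small illustration of the execution_time point, the sketch below ranks nodes by duration within a single run. The function name slowest_nodes is hypothetical; it relies only on the unique_id and execution_time keys in each result and treats a missing or null duration as 0.0.

```python
import json
from pathlib import Path


def slowest_nodes(run_results, top_n=5):
    """Return up to top_n (unique_id, execution_time) pairs, slowest first.

    slowest_nodes is a hypothetical helper, not part of dbt.
    """
    results = run_results.get("results", [])
    timed = [(r.get("unique_id"), r.get("execution_time") or 0.0) for r in results]
    return sorted(timed, key=lambda pair: pair[1], reverse=True)[:top_n]


if __name__ == "__main__":
    path = Path("target/run_results.json")
    if path.exists():
        with path.open() as f:
            for uid, seconds in slowest_nodes(json.load(f)):
                print(f"{seconds:8.2f}s  {uid}")
```

Comparing this ranking across archived runs, rather than looking at one run in isolation, is what turns it into regression detection.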

Field (example)	Type / sample value	Use	Notes
results[]	list	Per-node execution result	status, timing, execution_time
elapsed_time	float seconds	Overall wall-clock duration	Covers the whole invocation, not a single node
metadata.generated_at	ISO 8601	Generation time	Used for auditing
args	dict	Invocation arguments	Selectors, etc.
metadata	dict	dbt version, invocation_id, environment info	Supports reproducibility

Pass/fail decision flow in CI

Summarize run_results.json (Python)

import json, sys
from collections import Counter

with open('target/run_results.json') as f:
    rr = json.load(f)

status = Counter(r.get('status', 'unknown') for r in rr.get('results', []))
mean_time = sum(r.get('execution_time', 0.0) for r in rr.get('results', [])) / max(1, len(rr.get('results', [])))

print('status summary:', dict(status))
print('mean execution_time(sec):', round(mean_time, 2))

# Tests report 'fail' rather than 'error', so treat both as failures
failed = [r for r in rr.get('results', []) if r.get('status') in ('error', 'fail')]
if failed:
    print('ERROR DETAILS:')
    for e in failed:
        node = e.get('unique_id')
        msg = e.get('message') or ''  # message can be null
        print(f'- {node}: {msg[:200]}')
    sys.exit(1)

catalog.json: Strengthen Documentation with Types and Column Statistics

catalog.json is produced by dbt docs generate and contains per-column types and comments for the actual database entities (models, sources), and sometimes statistics such as row count estimates. Joining it with manifest lets you detect differences between logical definitions and physical implementations.

data_type names and the granularity of statistics vary by adapter. For the exam, lock in two facts: catalog is generated by docs generate, and it provides column information and types.

Column-level documentation maintenance and type-drift detection (e.g., expected INT but actually STRING)
Use it as the source for syncing into BI tools and data catalogs
Joinable on unique_id between manifest.nodes[unique_id] and catalog.nodes[unique_id]
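
For the BI and data-catalog sync case, a flat list of rows is often the easiest shape to load elsewhere. The sketch below flattens catalog.json into one row per column. The function name data_dictionary_rows is hypothetical, and it assumes each column entry carries name, type, index, and comment, which is how recent catalog schemas describe columns; verify this against your dbt version.

```python
import json
from pathlib import Path


def data_dictionary_rows(catalog):
    """Flatten catalog nodes and sources into one dict per column.

    data_dictionary_rows is a hypothetical helper, not part of dbt.
    """
    rows = []
    for section in ("nodes", "sources"):
        for uid, entry in catalog.get(section, {}).items():
            columns = (entry.get("columns") or {}).values()
            # Keep the physical column order reported by the warehouse.
            for col in sorted(columns, key=lambda c: c.get("index") or 0):
                rows.append({
                    "unique_id": uid,
                    "column": col.get("name"),
                    "type": col.get("type"),
                    "comment": col.get("comment"),
                })
    return rows


if __name__ == "__main__":
    path = Path("target/catalog.json")
    if path.exists():
        with path.open() as f:
            for row in data_dictionary_rows(json.load(f)):
                print(row)
```

Because every row keeps its unique_id, the result can still be joined back to manifest.json for descriptions and ownership.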

Element	Example	How to use it	Caveats
nodes[unique_id].columns	order_id: {type: NUMBER}	Document column types	Type names depend on the adapter
metadata.adapter_type (manifest.json)	snowflake, bigquery, redshift, etc.	Understand per-backend differences	Catalog metadata may not record the adapter, so read it from the manifest
metadata.generated_at	ISO 8601	Makes the snapshot timestamp explicit	Useful for detecting staleness

Conceptual join between manifest and catalog

manifest.nodes[model.fct_orders] ----join on unique_id---- catalog.nodes[model.fct_orders]
         | logical config                                 | physical columns/types

Join catalog and manifest to detect type drift (Python)

import json

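with open('target/manifest.json') as f:
    mf = json.load(f)
with open('target/catalog.json') as f:
    cg = json.load(f)

m_nodes = mf.get('nodes', {})
c_nodes = cg.get('nodes', {})

for uid, m in m_nodes.items():
    if not uid.startswith('model.'):  # models only
        continue
    c = c_nodes.get(uid)
    if not c:  # not built yet, or not visible to docs generate
        continue
    # Compare names case-insensitively: some adapters (e.g., Snowflake) report upper-case names
    m_cols = {k.lower(): v for k, v in (m.get('columns') or {}).items()}
    c_cols = {k.lower(): v for k, v in (c.get('columns') or {}).items()}
    missing = set(m_cols) - set(c_cols)
    if missing:
        print(f'[WARN] not in catalog: {uid} → {sorted(missing)}')
    # Type drift can only be checked where the YAML declares data_type
    for name in sorted(set(m_cols) & set(c_cols)):
        expected = m_cols[name].get('data_type')
        actual = c_cols[name].get('type')
        if expected and actual and expected.lower() != actual.lower():
            print(f'[WARN] type drift: {uid}.{name} expected {expected}, got {actual}')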

Real-World Patterns: CI/CD, Lineage Visualization, and Audit Integration

The basic principle for artifacts is "preserve and reuse." Collect run_results in CI, archive to S3/GCS, then load into a DWH to track trends in job stability and performance regressions. manifest lets you automate impact reviews by diffing per pull request. catalog is the natural trigger for updating your data dictionary.

The patterns are the same in dbt Cloud and OSS, but operational design for storage and lookup (naming, retention, metadata enrichment) makes or breaks the quality of the result.

CI: dbt build → collect run_results → on failure, comment with log excerpts
Lineage diff: compare child_map between old and new manifest
Audit: store generated_at, invocation_id, and adapter_type in an audit table
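
The lineage diff can be sketched as a plain comparison of two child_map dictionaries, for example one manifest from the main branch and one from the pull request. The function name diff_child_map and the shape of the returned dictionary are assumptions for illustration, not a dbt feature.

```python
def diff_child_map(old_manifest, new_manifest):
    """Compare child_map between two manifests.

    Returns nodes that appeared or disappeared, plus per-node changes in
    their direct children. Hypothetical helper, not part of dbt.
    """
    old = old_manifest.get("child_map", {})
    new = new_manifest.get("child_map", {})
    changed_edges = {}
    for uid in sorted(set(old) & set(new)):
        before, after = set(old[uid]), set(new[uid])
        if before != after:
            changed_edges[uid] = {
                "added": sorted(after - before),
                "removed": sorted(before - after),
            }
    return {
        "added_nodes": sorted(set(new) - set(old)),
        "removed_nodes": sorted(set(old) - set(new)),
        "changed_edges": changed_edges,
    }
```

Posting the non-empty parts of this result as a pull request comment gives reviewers the impact summary without opening the docs site. The diff is only meaningful while unique_id values stay stable, so a renamed model shows up as one removal plus one addition.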

Use case	Artifact(s) used	Key point	Failure mode to avoid
Pass/fail decision (CI)	run_results.json	Aggregate status with thresholds	Latest run only, so always archive
Impact review	manifest.json	Visualize diffs via child_map/parent_map	Keep unique_id stable
Automated data dictionary updates	catalog.json + manifest.json	Cross-check physical types with logical definitions	Normalize adapter differences

Artifact archival pipeline in CI

GitHub Actions example (YAML excerpt)

jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: { python-version: '3.11' }
      - run: pip install dbt-core dbt-bigquery
      - run: dbt deps && dbt build --fail-fast
      - name: Upload artifacts
        if: always()  # upload even when dbt build fails, so run_results.json is kept
        uses: actions/upload-artifact@v4
        with:
          name: dbt-target
          path: target/*.json

Exam Prep: Frequently Tested Angles and Pitfalls

On the Analytics Engineer exam, the generation timing and use-case separation of each artifact come up often. Be ready to answer instantly on three points: "catalog is from docs generate," "run_results covers the most recent run only," and "manifest is the source of lineage information."

Being able to articulate the stability of unique_id, the difference between depends_on and child_map, and the purpose of invocation_id will also earn extra points. It is safer to memorize the role separation between files than to memorize version-specific details.

manifest: the graph and node definitions (selectors, impact analysis)
run_results: snapshot of execution results (CI/SLA)
catalog: physical schema and statistics (documentation/audit)

Question pattern	Key to the correct answer	Common wrong answer	Quick tip
Generation timing	docs generate yields catalog	Mistaking run as generating catalog	manifest is also updated by parse/run
Use-case separation	run_results is for pass/fail and timing	Confusing it with manifest	Argue from status and execution_time
Lineage traversal	child_map/parent_map	Using only the one-way depends_on	Use both directions appropriately

Mini mnemonic map

manifest = the graph
run_results = success/failure & time
catalog = types/columns
→ Memorize by role, then confirm via generation timing

From unique_id to a human-readable name (Python)

import json

with open('target/manifest.json') as f:
    mf = json.load(f)

uid = 'model.my_project.fct_orders'
node = mf['nodes'][uid]  # raises KeyError if the unique_id is not in the manifest
print(node.get('name'), node.get('original_file_path'))

Check Yourself with a Question

Analytics Engineer

Question 1

In a dbt project, you want to automatically check whether table column types match expectations and update the documentation when they differ. Which combination of artifacts is most appropriate?

A. Cross-check manifest.json with catalog.json
B. Parse status and execution_time from run_results.json
C. Cross-check manifest.json with run_results.json
D. Refer to catalog.json only

Correct answer: A

Actual column types live in catalog.json, while logical definitions, naming, and descriptions live in manifest.json. Joining them on unique_id lets you detect type and column drift while feeding the result into documentation updates. run_results.json contains execution results and is not suitable for type comparison. catalog alone cannot determine differences against the logical definition.

Frequently Asked Questions

Is catalog.json updated without running docs generate?

No. catalog.json is generated and updated by dbt docs generate (which inspects the database internally). It is not updated by dbt run alone.

How long should I retain run_results.json?

It depends on your operational needs, but if you do SLA monitoring or regression detection, it is common to archive each run to external storage and retain at least several weeks to months of history. run_results is a snapshot of the most recent run and gets overwritten.

What is the difference between depends_on and child_map in manifest.json?

depends_on is a list inside each node definition of the upstream nodes that node depends on. child_map is a project-wide index of downstream nodes, letting you instantly look up the blast radius of a given node.
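
The relationship can be illustrated by rebuilding a downstream index from depends_on alone. This is a simplified sketch: children_from_depends_on is a hypothetical name, it only walks the nodes section, and the real child_map also covers resources stored under other top-level keys (such as exposures) and lists nodes that have no children, so the two will not match exactly.

```python
from collections import defaultdict


def children_from_depends_on(manifest):
    """Invert each node's depends_on.nodes into a parent -> children index.

    Simplified illustration of what child_map precomputes; not part of dbt.
    """
    children = defaultdict(set)
    for uid, node in manifest.get("nodes", {}).items():
        for parent in (node.get("depends_on") or {}).get("nodes", []):
            children[parent].add(uid)
    return {parent: sorted(kids) for parent, kids in children.items()}
```

Seen this way, depends_on answers "what does this node need?" while child_map answers "what breaks if this node changes?", and the second is just the first with the edges reversed.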


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
