dbt produces JSON artifacts under target/ on every run. Among them, manifest.json, run_results.json, and catalog.json are the heart of observability, documentation, and CI integration.
This article follows the dbt docs (https://docs.getdbt.com/) and consolidates the angles most often asked on the exam together with field-tested operational know-how. Where details depend heavily on the version, we add caveats and prioritize stable interpretations.
After a command runs, dbt writes multiple JSON files under target/. manifest.json holds the execution plan and lineage, run_results.json holds the execution results, and catalog.json holds table/column types and statistics. Together they serve as the base data for visualization (dbt docs), quality monitoring (CI, SLA validation), and data catalog integration.
Three core ideas to remember: 1) manifest is the graph (dependencies and node definitions), 2) run_results is each node's execution status and timing, 3) catalog is the schema information of the target environment. All three are JSON and carry a top-level metadata block that includes generated_at.
| Artifact | Example command that generates it | Main contents | Typical use |
|---|---|---|---|
| manifest.json | dbt parse / dbt run / dbt docs generate | Node definitions, dependencies, lineage | Impact analysis, documentation, selector validation |
| run_results.json | dbt run / dbt test / dbt build | Per-node execution result, duration, errors | CI pass/fail, SLA monitoring, flake detection |
| catalog.json | dbt docs generate | Types and metadata for tables/views/columns | Data dictionary, type auditing, BI linking |
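
As a rough illustration of the table above, the following sketch reports which of the three artifacts exist in a target directory and when each was generated. It assumes the current artifact layout, where generated_at and dbt_schema_version sit inside the top-level metadata block. The function name artifact_summary is hypothetical and not part of dbt.

```python
import json
from pathlib import Path

ARTIFACTS = ("manifest.json", "run_results.json", "catalog.json")


def artifact_summary(target_dir="target"):
    """Return {file name: metadata summary, or None if the file is missing}."""
    summary = {}
    for name in ARTIFACTS:
        path = Path(target_dir) / name
        if not path.exists():
            summary[name] = None  # e.g. catalog.json before `dbt docs generate`
            continue
        with path.open() as f:
            meta = json.load(f).get("metadata", {})
        summary[name] = {
            "generated_at": meta.get("generated_at"),
            "dbt_schema_version": meta.get("dbt_schema_version"),
        }
    return summary
```

In a fresh target directory, calling artifact_summary() after dbt run but before dbt docs generate would be expected to report catalog.json as None.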
Figure: Artifact generation and downstream usage flow (conceptual)

Minimal sanity check that artifacts are produced (shell)

    # Build one model; this writes manifest.json and run_results.json
    dbt run --select my_model
    # Refresh docs metadata; this writes catalog.json
    dbt docs generate
    # All three artifacts should now be listed
    ls -1 target | grep -E 'manifest|run_results|catalog'

manifest.json represents the project graph. It contains each node's metadata (for models, seeds, snapshots, tests, exposures, sources, macros, and so on), including file path, materialization, and depends_on, plus the project-wide child_map/parent_map indexes. On the exam, tying it to impact analysis and selector evaluation makes it a reliable scoring topic.
Fields change across versions, but nodes, sources, child_map/parent_map, and metadata (including metadata.generated_at) appear consistently. Adapter and package differences are usually captured in metadata, which is useful for portability assessments.
| Field (example) | Type / sample value | Use | Stability (rough) |
|---|---|---|---|
| nodes | dict | Node definitions for models, tests, etc. | High |
| sources | dict | source definitions and columns | High |
| child_map / parent_map | dict[str, list[str]] | Speeds up project-wide lineage traversal | Medium-High |
| metadata.generated_at | ISO 8601 | Tracks generation time | High |
| metadata.adapter_type | str | Type of execution backend (e.g., snowflake, bigquery) | Medium |
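
To make the nodes and sources rows above concrete, here is a minimal sketch that counts manifest entries per resource type. It relies on the resource_type field carried by each node; the sample dictionary is a hand-written stand-in for a real manifest, and count_by_resource_type is a hypothetical helper name.

```python
from collections import Counter


def count_by_resource_type(manifest):
    """Count manifest entries per resource_type (model, test, seed, ...)."""
    counts = Counter(
        node.get("resource_type", "unknown")
        for node in manifest.get("nodes", {}).values()
    )
    # Sources live under their own top-level key, not under nodes
    counts["source"] += len(manifest.get("sources", {}))
    return dict(counts)


# Tiny stand-in for the dict loaded from target/manifest.json
sample = {
    "nodes": {
        "model.my_project.fct_orders": {"resource_type": "model"},
        "test.my_project.not_null_fct_orders_order_id": {"resource_type": "test"},
    },
    "sources": {"source.my_project.raw.orders": {"resource_type": "source"}},
}
print(count_by_resource_type(sample))
```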
Figure: Direction of dependencies (upstream → downstream)

Enumerate the blast radius from manifest.json (Python)

    import json
    from collections import deque

    with open('target/manifest.json') as f:
        mf = json.load(f)

    # child_map: unique_id -> list of direct downstream unique_ids
    child_map = mf.get('child_map', {})
    start = 'model.my_project.fct_orders'

    # Breadth-first walk over downstream nodes
    visited, q = {start}, deque([start])
    while q:
        n = q.popleft()
        for c in child_map.get(n, []):
            if c not in visited:
                visited.add(c)
                q.append(c)

    # fct_orders itself plus every downstream node it impacts
    print('\n'.join(sorted(visited)))

run_results.json stores results for the most recent command (run/test/build, etc.). Each result includes status (success, error, or skipped for models; pass, fail, or warn for tests), execution_time, timing (compile/execute breakdown), and failures (e.g., number of failing tests). CI uses this for automated decisions, and SLA monitoring uses it to detect slowdowns or flakiness.
Important caveat: run_results is a snapshot of the most recent run. To analyze history, you need to collect and archive it every run (copy it to object storage or a DWH).
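
One way to act on this caveat is sketched below: copy run_results.json to an archive location under a name that is unique per invocation. The sketch assumes metadata.invocation_id and metadata.generated_at are present, and it uses a local directory as a stand-in for object storage; archive_run_results and the artifact_archive directory are hypothetical names.

```python
import json
import shutil
from pathlib import Path


def archive_run_results(target_dir="target", archive_dir="artifact_archive"):
    """Copy run_results.json to archive_dir under a per-invocation file name."""
    src = Path(target_dir) / "run_results.json"
    with src.open() as f:
        meta = json.load(f).get("metadata", {})
    # invocation_id identifies the dbt command; generated_at keeps files sortable
    stamp = (meta.get("generated_at") or "unknown-time").replace(":", "-")
    run_id = meta.get("invocation_id") or "unknown-invocation"
    dest = Path(archive_dir) / f"run_results_{stamp}_{run_id}.json"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    return dest
```

Running this as the last step of every job, including failed ones, is what turns the single overwritten snapshot into a history you can query later.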
| Field (example) | Type / sample value | Use | Notes |
|---|---|---|---|
| results[] | list | Per-node execution result | status, timing, execution_time |
| elapsed_time | float (seconds) | Overall duration of the invocation | Wall-clock time, not the sum of per-node times |
| metadata.generated_at | ISO 8601 | Generation time | Used for auditing |
| args | dict | Invocation arguments | Selectors, etc. |
| metadata | dict | Environment / adapter info | Supports reproducibility |
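
For the SLA-monitoring use of results[], a small sketch like the following can rank nodes by execution_time and flag those over a threshold. The sample data is hand-written, and the helper names slowest_nodes and sla_breaches, as well as the 30-second threshold, are illustrative assumptions.

```python
def slowest_nodes(run_results, top_n=5):
    """Return (unique_id, status, execution_time) tuples, slowest first."""
    rows = [
        (r.get("unique_id"), r.get("status"), r.get("execution_time") or 0.0)
        for r in run_results.get("results", [])
    ]
    # top_n=None keeps every row
    return sorted(rows, key=lambda row: row[2], reverse=True)[:top_n]


def sla_breaches(run_results, max_seconds):
    """Return unique_ids whose execution_time exceeds max_seconds."""
    return [
        uid
        for uid, _status, seconds in slowest_nodes(run_results, top_n=None)
        if seconds > max_seconds
    ]


# Tiny stand-in for the dict loaded from target/run_results.json
sample = {
    "results": [
        {"unique_id": "model.my_project.stg_orders", "status": "success", "execution_time": 1.2},
        {"unique_id": "model.my_project.fct_orders", "status": "success", "execution_time": 42.0},
        {"unique_id": "test.my_project.not_null_fct_orders_order_id", "status": "pass", "execution_time": 0.4},
    ]
}
print(slowest_nodes(sample, top_n=2))
print(sla_breaches(sample, max_seconds=30))
```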
Figure: Pass/fail decision flow in CI

Summarize run_results.json (Python)

    import json
    import sys
    from collections import Counter

    with open('target/run_results.json') as f:
        rr = json.load(f)

    results = rr.get('results', [])
    status = Counter(r.get('status', 'unknown') for r in results)
    mean_time = sum(r.get('execution_time', 0.0) for r in results) / max(1, len(results))
    print('status summary:', dict(status))
    print('mean execution_time(sec):', round(mean_time, 2))

    # Models report 'error'; failing tests report 'fail'
    errors = [r for r in results if r.get('status') in ('error', 'fail')]
    if errors:
        print('ERROR DETAILS:')
        for e in errors:
            node = e.get('unique_id')
            msg = e.get('message') or ''  # message can be null
            print(f'- {node}: {msg[:200]}')
        sys.exit(1)

catalog.json is produced by dbt docs generate and contains per-column types and comments for the actual database entities (models, sources), and sometimes statistics such as row count estimates. Joining it with manifest lets you detect differences between logical definitions and physical implementations.
Column type names and the granularity of statistics vary by adapter. For the exam, lock in two facts: catalog is generated by docs generate, and it provides column information and types.
| Element | Example | How to use it | Caveats |
|---|---|---|---|
| nodes[unique_id].columns | order_id: {type: NUMBER} | Document column types | Type names depend on the adapter |
| metadata.adapter_type | snowflake, bigquery, redshift, etc. | Understand per-backend differences | Useful for multi-cloud comparison |
| metadata.generated_at | ISO 8601 | Makes the snapshot timestamp explicit | Useful for detecting staleness |
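
As an illustration of the data-dictionary use, the sketch below flattens catalog entries into rows. It assumes each column entry carries name, type, index, and comment keys; the sample catalog is hand-written and data_dictionary_rows is a hypothetical helper name.

```python
def data_dictionary_rows(catalog):
    """Flatten catalog nodes and sources into (unique_id, column, type, comment) rows."""
    rows = []
    for section in ("nodes", "sources"):
        for uid, entry in catalog.get(section, {}).items():
            columns = entry.get("columns") or {}
            # Keep the database's column order via the index field when present
            for col in sorted(columns.values(), key=lambda c: c.get("index") or 0):
                rows.append((uid, col.get("name"), col.get("type"), col.get("comment")))
    return rows


# Tiny stand-in for the dict loaded from target/catalog.json
sample_catalog = {
    "nodes": {
        "model.my_project.fct_orders": {
            "columns": {
                "AMOUNT": {"type": "NUMBER", "index": 2, "name": "AMOUNT", "comment": None},
                "ORDER_ID": {"type": "NUMBER", "index": 1, "name": "ORDER_ID", "comment": "Primary key"},
            }
        }
    },
    "sources": {},
}
for row in data_dictionary_rows(sample_catalog):
    print(row)
```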
Conceptual join between manifest and catalog

    manifest.nodes[model.fct_orders] ----join on unique_id---- catalog.nodes[model.fct_orders]
    | logical config                                           | physical columns/types

Join catalog and manifest to detect column drift (Python)

    import json

    with open('target/manifest.json') as f:
        mf = json.load(f)
    with open('target/catalog.json') as f:
        cg = json.load(f)

    m_nodes = mf.get('nodes', {})
    c_nodes = cg.get('nodes', {})

    for uid, m in m_nodes.items():
        if not uid.startswith('model.'):  # models only
            continue
        c = c_nodes.get(uid)
        if not c:  # no catalog entry for this model
            continue
        # Compare case-insensitively: some adapters report upper-case column names
        m_cols = {name.lower() for name in (m.get('columns') or {})}
        c_cols = {name.lower() for name in (c.get('columns') or {})}
        missing = m_cols - c_cols
        if missing:
            print(f'[WARN] not in catalog: {uid} → {sorted(missing)}')

The basic principle for artifacts is "preserve and reuse." Collect run_results in CI, archive to S3/GCS, then load into a DWH to track trends in job stability and performance regressions. manifest lets you automate impact reviews by diffing per pull request. catalog is the natural trigger for updating your data dictionary.
The patterns are the same in dbt Cloud and OSS, but operational design for storage and lookup (naming, retention, metadata enrichment) makes or breaks the quality of the result.
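
The per-pull-request manifest diff mentioned above could look roughly like the following. This is only a sketch: it compares the checksum recorded on each node between a base manifest and a head manifest, which catches file-content changes but not every kind of change. dbt's own state:modified selector covers more cases and is usually the better tool in practice. The function name diff_manifests is hypothetical.

```python
def diff_manifests(base, head):
    """Classify node unique_ids as added, removed, or modified between two manifests."""
    base_nodes, head_nodes = base.get("nodes", {}), head.get("nodes", {})

    def file_hash(node):
        # checksum is a small dict; its "checksum" key holds the hash value
        return (node.get("checksum") or {}).get("checksum")

    added = sorted(set(head_nodes) - set(base_nodes))
    removed = sorted(set(base_nodes) - set(head_nodes))
    modified = sorted(
        uid
        for uid in set(base_nodes) & set(head_nodes)
        if file_hash(base_nodes[uid]) != file_hash(head_nodes[uid])
    )
    return {"added": added, "removed": removed, "modified": modified}
```

Feeding the modified list into a downstream walk over child_map then gives the set of nodes a reviewer should look at.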
| Use case | Artifact(s) used | Key point | Failure mode to avoid |
|---|---|---|---|
| Pass/fail decision (CI) | run_results.json | Aggregate status with thresholds | Keeping only the latest file; archive every run |
| Impact review | manifest.json | Visualize diffs via child_map/parent_map | Unstable unique_id values (for example, after renames) |
| Automated data dictionary updates | catalog.json + manifest.json | Cross-check physical types with logical definitions | Comparing adapter-specific type names without normalizing |
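
The cross-check in the last row can be sketched as follows. It assumes the logical type is declared as data_type on manifest columns (which is only populated when you declare it in YAML) and that the physical type appears as type in catalog columns. The TYPE_ALIASES table is a made-up, deliberately incomplete example of normalizing adapter differences; extend it for the warehouses you actually use.

```python
# Hypothetical alias table mapping adapter-specific names to a common label
TYPE_ALIASES = {
    "int": "integer",
    "int64": "integer",
    "number": "numeric",
    "varchar": "text",
    "string": "text",
}


def normalize_type(type_name):
    """Lower-case, drop precision such as (38,0), and map known aliases."""
    base = (type_name or "").lower().split("(")[0].strip()
    return TYPE_ALIASES.get(base, base)


def type_drift(manifest, catalog):
    """Yield (unique_id, column, declared, actual) where the two types differ."""
    catalog_nodes = catalog.get("nodes", {})
    for uid, node in manifest.get("nodes", {}).items():
        catalog_cols = (catalog_nodes.get(uid) or {}).get("columns") or {}
        # Match column names case-insensitively across the two files
        actual_types = {name.lower(): col.get("type") for name, col in catalog_cols.items()}
        for name, col in (node.get("columns") or {}).items():
            declared = col.get("data_type")
            actual = actual_types.get(name.lower())
            if declared and actual and normalize_type(declared) != normalize_type(actual):
                yield uid, name, declared, actual
```

Columns without a declared data_type are skipped rather than reported, since there is no logical definition to compare against.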
Figure: Artifact archival pipeline in CI

GitHub Actions example (YAML excerpt)

    jobs:
      dbt:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with: { python-version: '3.11' }
          - run: pip install dbt-core dbt-bigquery
          - run: dbt deps && dbt build --fail-fast
          - name: Upload artifacts
            if: always()  # archive run_results.json even when the build fails
            uses: actions/upload-artifact@v4
            with:
              name: dbt-target
              path: target/*.json

On the Analytics Engineer exam, the generation timing and use-case separation of each artifact come up often. Be ready to answer instantly on three points: "catalog is from docs generate," "run_results covers the most recent run only," and "manifest is the source of lineage information."
Being able to articulate the stability of unique_id, the difference between depends_on and child_map, and the purpose of invocation_id will also earn extra points. It is safer to memorize the role separation between files than to memorize version-specific details.
| Question pattern | Key to the correct answer | Common wrong answer | Quick tip |
|---|---|---|---|
| Generation timing | docs generate yields catalog | Assuming dbt run generates catalog | manifest is also updated by parse/run |
| Use-case separation | run_results is for pass/fail and timing | Confusing it with manifest | Argue from status and execution_time |
| Lineage traversal | child_map/parent_map | Using only the one-way depends_on | Use both directions appropriately |
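
To illustrate using both directions, the sketch below applies one generic walk to parent_map (upstream) and child_map (downstream). The maps are hand-written stand-ins for the ones in manifest.json, and walk is a hypothetical helper name.

```python
from collections import deque


def walk(graph_map, start):
    """Breadth-first walk over a unique_id -> [unique_id] map, excluding start."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph_map.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(seen - {start})


# Hand-written stand-ins for manifest['parent_map'] and manifest['child_map']
parent_map = {
    "model.my_project.stg_orders": ["source.my_project.raw.orders"],
    "model.my_project.fct_orders": ["model.my_project.stg_orders"],
    "model.my_project.orders_report": ["model.my_project.fct_orders"],
}
child_map = {
    "source.my_project.raw.orders": ["model.my_project.stg_orders"],
    "model.my_project.stg_orders": ["model.my_project.fct_orders"],
    "model.my_project.fct_orders": ["model.my_project.orders_report"],
    "model.my_project.orders_report": [],
}

print("upstream:", walk(parent_map, "model.my_project.fct_orders"))
print("downstream:", walk(child_map, "model.my_project.fct_orders"))
```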
Mini mnemonic map

    manifest    = the graph
    run_results = success/failure & time
    catalog     = types/columns
    → Memorize by role, then confirm via generation timing

From unique_id to a human-readable name (Python)

    import json

    with open('target/manifest.json') as f:
        mf = json.load(f)

    uid = 'model.my_project.fct_orders'
    node = mf['nodes'][uid]
    print(node.get('name'), node.get('original_file_path'))

Analytics Engineer
Question 1
In a dbt project, you want to automatically check whether table column types match expectations and update the documentation when they differ. Which combination of artifacts is most appropriate?
Correct answer: A (catalog.json combined with manifest.json)
Actual column types live in catalog.json, while logical definitions, naming, and descriptions live in manifest.json. Joining them on unique_id lets you detect type and column drift while feeding the result into documentation updates. run_results.json contains execution results and is not suitable for type comparison. catalog alone cannot determine differences against the logical definition.
Is catalog.json updated without running docs generate?
No. catalog.json is generated and updated by dbt docs generate (which inspects the database internally). It is not updated by dbt run alone.
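
Because catalog.json only changes when docs generate runs, it can silently lag behind manifest.json. A minimal staleness check, assuming both files expose metadata.generated_at as an ISO 8601 UTC timestamp, might look like this. The one-hour tolerance and the function names are arbitrary illustrations.

```python
from datetime import datetime, timedelta


def parse_generated_at(value):
    """Parse an ISO 8601 timestamp such as 2024-01-01T12:00:00.123456Z."""
    # Older Python versions do not accept a trailing Z, so spell out the offset
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def catalog_is_stale(manifest, catalog, tolerance=timedelta(hours=1)):
    """True when catalog.json is older than manifest.json by more than tolerance."""
    manifest_at = parse_generated_at(manifest["metadata"]["generated_at"])
    catalog_at = parse_generated_at(catalog["metadata"]["generated_at"])
    return manifest_at - catalog_at > tolerance
```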
How long should I retain run_results.json?
It depends on your operational needs, but if you do SLA monitoring or regression detection, it is common to archive each run to external storage and retain at least several weeks to months of history. run_results is a snapshot of the most recent run and gets overwritten.
What is the difference between depends_on and child_map in manifest.json?
depends_on is a list inside each node definition of the upstream nodes that node depends on. child_map is a project-wide index of downstream nodes, letting you instantly look up the blast radius of a given node.
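
The relationship between the two can be shown by inverting depends_on into a child index. This sketch assumes each node stores its upstream unique_ids under depends_on.nodes; the real child_map also covers entries stored outside nodes (exposures, for example), so treat the output as an approximation. build_child_index is a hypothetical helper name.

```python
from collections import defaultdict


def build_child_index(manifest):
    """Invert each node's depends_on.nodes list into parent -> sorted children."""
    children = defaultdict(set)
    for uid, node in manifest.get("nodes", {}).items():
        for parent in (node.get("depends_on") or {}).get("nodes", []):
            children[parent].add(uid)
    return {parent: sorted(kids) for parent, kids in children.items()}


# Tiny stand-in for the dict loaded from target/manifest.json
sample_manifest = {
    "nodes": {
        "model.my_project.stg_orders": {"depends_on": {"nodes": ["source.my_project.raw.orders"]}},
        "model.my_project.fct_orders": {"depends_on": {"nodes": ["model.my_project.stg_orders"]}},
        "model.my_project.orders_report": {"depends_on": {"nodes": ["model.my_project.stg_orders"]}},
    }
}
print(build_child_index(sample_manifest))
```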
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.