manifest.json is the artifact that bundles together every piece of "static metadata" in a dbt project. Models, sources, tests, macros, and their dependencies all live in a single file. It is generated at target/manifest.json and refreshed by most commands that parse the project, including dbt compile, run, test, and docs generate.
This article focuses on the keys that the official docs treat as stable. It covers inspection techniques you can use day to day, alongside the angles the Analytics Engineer exam tends to test.
manifest.json is the result of dbt's static analysis of your project. It contains the definitions of every node (models, seeds, snapshots, analyses, tests, etc.) along with sources, macros, and exposures, plus all of their dependencies. It captures the inputs to execution, not the execution results themselves.
Generation timing is predictable: any command that runs the dbt parser refreshes target/manifest.json. In CI you can persist this file as an artifact and use it for diff review and lineage health checks.
How dbt and manifest.json relate (static metadata → execution)
manifest.json (excerpt — top-level shape)
{
  "metadata": {
    "dbt_version": "1.x.y",
    "generated_at": "2026-04-18T10:00:00Z",
    "project_id": "..."
  },
  "nodes": { "model.my_pkg.my_model": { "resource_type": "model", "name": "my_model", "config": { "materialized": "view" } } },
  "sources": { "source.my_pkg.my_src.my_tbl": { "source_name": "my_src", "name": "my_tbl" } },
  "macros": { "macro.my_pkg.some_macro": { "path": "macros/some.sql" } },
  "exposures": { },
  "parent_map": { "model.my_pkg.my_model": ["source.my_pkg.my_src.my_tbl"] },
  "child_map": { "source.my_pkg.my_src.my_tbl": ["model.my_pkg.my_model"] }
}
The exam loves to test whether you can tell these top-level buckets apart. nodes holds most of the "things to build" (models, seeds, snapshots, analyses, tests). sources is dedicated to data sources. macros holds Jinja macros. exposures represents downstream consumers like BI dashboards. parent_map and child_map make it easy to walk the dependency graph forward or backward.
metadata carries context such as version and generation time, which you use to verify consistency between artifacts.
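As a minimal sketch of that consistency check, the Python below compares the metadata blocks of two artifacts. The helper names and the inline sample dictionaries are hypothetical. Comparing invocation_id is an assumption: recent dbt versions write it to each artifact's metadata, but it does not appear in the excerpt above, so confirm it exists in your files.

```python
import json
from pathlib import Path


def load_artifact(path):
    """Read a dbt artifact (manifest.json, run_results.json, ...) as a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def metadata_mismatches(a, b, keys=("dbt_version", "invocation_id")):
    """Return the metadata keys whose values differ between two artifacts.

    A key missing from both artifacts compares equal (None == None), so the
    check degrades gracefully when a field is not written.
    """
    meta_a, meta_b = a.get("metadata", {}), b.get("metadata", {})
    return [k for k in keys if meta_a.get(k) != meta_b.get(k)]


# Hypothetical in-memory artifacts standing in for real files.
manifest = {"metadata": {"dbt_version": "1.x.y", "invocation_id": "abc"}}
run_results = {"metadata": {"dbt_version": "1.x.y", "invocation_id": "xyz"}}

print(metadata_mismatches(manifest, run_results))
```

With real files, the same function could be called as metadata_mismatches(load_artifact("target/manifest.json"), load_artifact("target/run_results.json")). An empty list suggests both artifacts came from the same dbt version and invocation.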
| Key | Role | Typical use case |
|---|---|---|
| nodes | All "things to build" nodes such as models | Materialization audits, column metadata inventory, dependency checks |
| sources | External source definitions | Lineage tracing origin, source freshness checks |
| macros | Jinja macro definitions | Cross-project reuse tracking, impact analysis |
| exposures | External consumers (BI tools, APIs, etc.) | Showing dashboard upstream dependencies to stakeholders |
| parent_map | Node → parent nodes | Upstream tracing (where did this come from) |
| child_map | Node → child nodes | Downstream impact (where does this go) |
List top-level keys for a quick existence check (jq example)
jq -r 'keys[]' target/manifest.json
# Example output (abridged; jq's keys returns names in sorted order):
# child_map
# exposures
# macros
# metadata
# nodes
# parent_map
# sources
Each entry in nodes is keyed by a unique_id (e.g. model.pkg_name.model_name). resource_type takes values like model, seed, snapshot, test, and analysis. Most real-world checks and exam questions can be solved by understanding config, depends_on, relation_name (the post-compile physical name), original_file_path, fqn, columns, tags, and description.
config.materialized is by far the most frequently checked field. The distinction between view/table/incremental/ephemeral has a direct impact on cost and dependency resolution. The descriptions under columns and the tests bound to them are the foundation of documentation quality.
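A documentation-quality check along these lines could be sketched in Python as follows. The function name and the sample manifest are hypothetical, and the sketch only sees columns that are declared in the project's YAML, since those are the ones that appear under columns.

```python
def undocumented_columns(manifest):
    """Map each model unique_id to its declared columns lacking a description."""
    gaps = {}
    for uid, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue  # seeds, tests, snapshots, etc. are skipped here
        missing = [
            name
            for name, col in node.get("columns", {}).items()
            if not (col.get("description") or "").strip()
        ]
        if missing:
            gaps[uid] = missing
    return gaps


# Hypothetical manifest fragment; a real one comes from target/manifest.json.
manifest = {
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "config": {"materialized": "view"},
            "columns": {
                "order_id": {"name": "order_id", "description": "Primary key"},
                "customer_id": {"name": "customer_id", "description": ""},
            },
        },
        "seed.jaffle_shop.countries": {"resource_type": "seed", "columns": {}},
    }
}

print(undocumented_columns(manifest))
```

Treating an empty or whitespace-only description as missing is a design choice of this sketch; stricter teams might also require a minimum length.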
Model node excerpt (important fields)
{
  "nodes": {
    "model.jaffle_shop.orders": {
      "resource_type": "model",
      "name": "orders",
      "package_name": "jaffle_shop",
      "original_file_path": "models/orders.sql",
      "fqn": ["jaffle_shop", "orders"],
      "database": "ANALYTICS",
      "schema": "DBT_DEV",
      "alias": "orders",
      "relation_name": "ANALYTICS.DBT_DEV.ORDERS",
      "config": { "materialized": "view", "tags": ["core"] },
      "description": "Base orders fact model",
      "columns": {
        "order_id": {"name": "order_id", "description": "Primary key"},
        "customer_id": {"name": "customer_id", "description": "Foreign key"}
      },
      "depends_on": { "nodes": ["source.jaffle_shop.raw.orders"] },
      "checksum": {"name": "sha256", "checksum": "..."}
    }
  }
}
For lineage, start with depends_on.nodes to see a node's immediate upstreams. parent_map and child_map are best when you need to walk the full graph. Recursive upstream expansion is done by traversing parent_map; downstream impact analysis by traversing child_map.
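Selector behavior follows the official spec, but cross-checking against manifest.json removes a lot of guesswork. For example, the +model_name expansion (the model plus its ancestors) can be reproduced by recursively walking parent_map, and model_name+ (the model plus its descendants) by recursively walking child_map.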
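The recursive walk can be sketched in a few lines of Python. This is an approximation under stated assumptions: the walk function and the sample maps are hypothetical, and dbt's real selectors apply additional rules (for example around which tests are included) that this sketch does not model.

```python
from collections import deque


def walk(graph, start):
    """Breadth-first walk of parent_map or child_map starting at `start`.

    Returns every reachable unique_id, excluding `start` itself.
    """
    seen, queue = set(), deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph.get(current, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)  # defensive; a dbt DAG should contain no cycles
    return seen


# Hypothetical maps shaped like manifest["parent_map"] / manifest["child_map"].
parent_map = {
    "source.jaffle_shop.raw.orders": [],
    "model.jaffle_shop.stg_orders": ["source.jaffle_shop.raw.orders"],
    "model.jaffle_shop.orders": ["model.jaffle_shop.stg_orders"],
    "exposure.jaffle_shop.orders_dashboard": ["model.jaffle_shop.orders"],
}
child_map = {
    "source.jaffle_shop.raw.orders": ["model.jaffle_shop.stg_orders"],
    "model.jaffle_shop.stg_orders": ["model.jaffle_shop.orders"],
    "model.jaffle_shop.orders": ["exposure.jaffle_shop.orders_dashboard"],
    "exposure.jaffle_shop.orders_dashboard": [],
}

uid = "model.jaffle_shop.orders"
print(sorted(walk(parent_map, uid)))  # ancestors, roughly "+orders" minus the model
print(sorted(walk(child_map, uid)))   # descendants, roughly "orders+" minus the model
```

A breadth-first walk with a seen set visits each node once, which matters on wide graphs where many models share the same upstream sources.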
Quick jq to list upstreams and downstreams (starting from a unique_id)
# Upstream (parents)
uid="model.jaffle_shop.orders"
jq --arg uid "$uid" -r '.parent_map[$uid][]?' target/manifest.json
# Downstream (children)
jq --arg uid "$uid" -r '.child_map[$uid][]?' target/manifest.json
# Direct dependencies (inside the node)
jq --arg uid "$uid" -r '.nodes[$uid].depends_on.nodes[]?' target/manifest.json
sources is where dbt names and documents upstream database tables and views. In manifest.json they appear under unique_ids of the form source.<pkg>.<source_name>.<name>, exposing database/schema/name, loader, description, and so on.
Most tests are stored under nodes with resource_type: test. References to the target node live in their depends_on, while the tested column, if any, is recorded in column_name. exposures carries the dashboard's owner, url, maturity, depends_on, and other static info, making it easy to communicate upstream dependencies to stakeholders.
Excerpts for source, test, and exposure
{
  "sources": {
    "source.jaffle_shop.raw.orders": {
      "source_name": "raw",
      "name": "orders",
      "database": "RAW",
      "schema": "JAFFLE",
      "loader": "ingestion_tool",
      "description": "Raw orders table"
    }
  },
  "nodes": {
    "test.jaffle_shop.not_null_orders_order_id": {
      "resource_type": "test",
      "name": "not_null_orders_order_id",
      "column_name": "order_id",
      "test_metadata": {"name": "not_null"},
      "depends_on": {"nodes": ["model.jaffle_shop.orders"]}
    }
  },
  "exposures": {
    "exposure.jaffle_shop.orders_dashboard": {
      "type": "dashboard",
      "name": "orders_dashboard",
      "owner": {"name": "BI Team", "email": "[email protected]"},
      "url": "https://bi.example.com/dash/123",
      "maturity": "high",
      "depends_on": {"nodes": ["model.jaffle_shop.orders"]}
    }
  }
}
In production, it pays to collect manifest.json on a schedule and automate checks for materialization drift, undocumented columns, and unused models (zero downstreams). In CI, attaching the manifest diff to pull requests significantly speeds up review.
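Two of those checks, materialization drift and unused models, might look like the Python sketch below. All function names and sample manifests are hypothetical. Treating unique_ids that start with "test." or "unit_test." as tests, and not counting them as real downstreams, is an assumption of this sketch.

```python
# Assumption: test nodes use unique_ids prefixed "test." or "unit_test.".
TEST_PREFIXES = ("test.", "unit_test.")


def model_materializations(manifest):
    """Map each model unique_id to its config.materialized value."""
    return {
        uid: node.get("config", {}).get("materialized")
        for uid, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model"
    }


def materialization_drift(old, new):
    """Models present in both manifests whose materialization changed."""
    before, after = model_materializations(old), model_materializations(new)
    return {
        uid: (before[uid], after[uid])
        for uid in sorted(before.keys() & after.keys())
        if before[uid] != after[uid]
    }


def unused_models(manifest):
    """Models whose only children, if any, are tests."""
    child_map = manifest.get("child_map", {})
    return sorted(
        uid
        for uid in model_materializations(manifest)
        if not [c for c in child_map.get(uid, []) if not c.startswith(TEST_PREFIXES)]
    )


# Hypothetical manifests, e.g. one from the main branch and one from a pull request.
old = {
    "nodes": {
        "model.jaffle_shop.orders": {"resource_type": "model", "config": {"materialized": "view"}},
        "model.jaffle_shop.customers": {"resource_type": "model", "config": {"materialized": "table"}},
    },
}
new = {
    "nodes": {
        "model.jaffle_shop.orders": {"resource_type": "model", "config": {"materialized": "table"}},
        "model.jaffle_shop.customers": {"resource_type": "model", "config": {"materialized": "table"}},
    },
    "child_map": {
        "model.jaffle_shop.orders": ["exposure.jaffle_shop.orders_dashboard"],
        "model.jaffle_shop.customers": ["test.jaffle_shop.unique_customers_customer_id"],
    },
}

print(materialization_drift(old, new))
print(unused_models(new))
```

A model flagged as unused may still be queried directly in the warehouse, so the result is a list of candidates to review rather than a list of models to delete.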
On the exam, expect recurring questions about: the distinction that manifest.json is "static metadata" while execution results live in run_results.json; what each of nodes/sources/macros/exposures owns; the roles of depends_on vs parent_map/child_map; and the impact of config.materialized. Avoid memorizing version-sensitive fields and lean on stable keys like resource_type, depends_on, config, and relation_name.
Inventory materialized for every model (jq)
jq -r '
.nodes
| to_entries[]
| select(.value.resource_type=="model")
| [.key, (.value.config.materialized // "<unset>")]
| @tsv
' target/manifest.json
# Output: tab-separated pairs of unique_id and materialized
Analytics Engineer
Question 1
As an Analytics Engineer using manifest.json, you want to list every downstream node of a given model. Which combination of fields should you reference most directly?
Correct answer: A
Listing downstreams is most directly done by recursively walking child_map[unique_id]. depends_on points upstream, and metadata/macros are not suited for downstream enumeration.
What is the difference between manifest.json and run_results.json?
manifest.json holds static metadata (node attributes and dependencies). run_results.json holds the execution results of a single invocation (per-node status, execution time, and adapter responses such as rows affected). They serve different purposes, so do not conflate them.
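Because both files identify nodes by unique_id, they can be joined to put static config next to execution results. The Python below is a rough sketch with hypothetical sample data; it assumes run_results.json exposes a results list whose entries carry unique_id, status, and execution_time.

```python
def summarize_run(manifest, run_results):
    """Pair each run result with the static config of the node it refers to."""
    nodes = manifest.get("nodes", {})
    summary = []
    for result in run_results.get("results", []):
        uid = result.get("unique_id")
        materialized = nodes.get(uid, {}).get("config", {}).get("materialized")
        summary.append(
            (uid, materialized, result.get("status"), result.get("execution_time"))
        )
    return summary


# Hypothetical fragments of the two artifacts.
manifest = {
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "config": {"materialized": "view"},
        }
    }
}
run_results = {
    "results": [
        {"unique_id": "model.jaffle_shop.orders", "status": "success", "execution_time": 1.5}
    ]
}

for row in summarize_run(manifest, run_results):
    print(row)
```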
Does manifest.json contain credentials or data values?
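Not by design. manifest.json contains structure and metadata, not warehouse connection credentials or table rows. It does include raw SQL, descriptions, and config values, so anything sensitive that is hard-coded in the project will appear in it. Handle the file securely even though it is not a primary vector for sensitive data leakage.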
How should I handle differences between dbt versions?
Check metadata.dbt_version and build your tooling around stable fields such as resource_type, depends_on, config, and relation_name. If you depend on schema-volatile areas, add schema validation in CI so you catch breakage early.
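A lightweight version of that CI validation could be sketched as below. The function name, the required-key tuple, and the sample version string are all hypothetical choices for illustration; a fuller setup might validate against the published JSON schema for the manifest instead.

```python
# Keys this sketch treats as stable on every node; extend to suit your tooling.
REQUIRED_NODE_KEYS = ("resource_type", "depends_on", "config")


def validate_manifest(manifest, expected_version_prefix):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    version = manifest.get("metadata", {}).get("dbt_version", "")
    if not version.startswith(expected_version_prefix):
        problems.append(f"unexpected dbt_version: {version!r}")
    for uid, node in manifest.get("nodes", {}).items():
        missing = [key for key in REQUIRED_NODE_KEYS if key not in node]
        if missing:
            problems.append(f"{uid}: missing {', '.join(missing)}")
    return problems


# Hypothetical manifest fragment with one well-formed and one broken node.
manifest = {
    "metadata": {"dbt_version": "1.8.3"},
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
            "config": {"materialized": "view"},
        },
        "model.jaffle_shop.broken": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        },
    },
}

for problem in validate_manifest(manifest, "1.8."):
    print(problem)
```

In CI, a non-empty result would typically fail the job, so tooling that depends on these keys breaks loudly at upgrade time rather than silently in production.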
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.