dbt manifest.json: Project Graph Reference (2026)

manifest.json is the artifact that bundles together every piece of "static metadata" in a dbt project. Models, sources, tests, macros, and their dependencies all live in a single file. It is generated at target/manifest.json and refreshed by most commands that parse the project, including dbt compile, run, test, and docs generate.

This article focuses on the keys that the official docs treat as stable. It pairs inspection techniques you can use day to day with the angles the Analytics Engineer exam tends to test.

Where manifest.json fits and what it contains

manifest.json is the result of dbt's static analysis of your project. It contains the definitions of every node (models, seeds, snapshots, analyses, tests, etc.) along with sources, macros, and exposures, plus all of their dependencies. It captures the inputs to execution, not the execution results themselves.

Generation timing is predictable: any command that runs the dbt parser refreshes target/manifest.json. In CI you can persist this file as an artifact and use it for diff review and lineage health checks.

Location: target/manifest.json
Main top-level keys: nodes, sources, macros, exposures, parent_map, child_map, metadata
Static metadata: success/failure and row counts are NOT included (those live in run_results.json)
Stable keys you can rely on: resource_type, name, original_file_path, fqn, config, description, columns, depends_on

How dbt and manifest.json relate (static metadata → execution)

manifest.json (excerpt — top-level shape)

{
  "metadata": {
    "dbt_version": "1.x.y",
    "generated_at": "2026-04-18T10:00:00Z",
    "project_id": "..."
  },
  "nodes": { "model.my_pkg.my_model": { "resource_type": "model", "name": "my_model", "config": { "materialized": "view" } } },
  "sources": { "source.my_pkg.my_src.my_tbl": { "source_name": "my_src", "name": "my_tbl" } },
  "macros": { "macro.my_pkg.some_macro": { "path": "macros/some.sql" } },
  "exposures": { },
  "parent_map": { "model.my_pkg.my_model": ["source.my_pkg.my_src.my_tbl"] },
  "child_map": { "source.my_pkg.my_src.my_tbl": ["model.my_pkg.my_model"] }
}

Top-level structure: what each key is for

The exam loves to test whether you can tell these top-level buckets apart. nodes holds most of the "things to build" (models, seeds, snapshots, analyses, tests). sources is dedicated to data sources. macros holds Jinja macros. exposures represents downstream consumers like BI dashboards. parent_map and child_map make it easy to walk the dependency graph forward or backward.

metadata carries context such as version and generation time, which you use to verify consistency between artifacts.
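As a minimal sketch of that consistency check, the snippet below compares the metadata blocks of two artifacts. The function names are hypothetical, the version string is made up, and it assumes both artifacts expose dbt_version and invocation_id under metadata; confirm those fields against your own dbt version before relying on them.

```python
import json
from pathlib import Path


def load_artifact(path):
    """Read a dbt artifact (manifest.json, run_results.json, ...) into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def metadata_mismatches(manifest, other, fields=("dbt_version", "invocation_id")):
    """Return {field: (manifest_value, other_value)} for metadata fields that differ."""
    left = manifest.get("metadata", {})
    right = other.get("metadata", {})
    return {
        field: (left.get(field), right.get(field))
        for field in fields
        if left.get(field) != right.get(field)
    }


# Inline stand-ins for target/manifest.json and target/run_results.json.
# In a real project you would call load_artifact("target/manifest.json") instead.
manifest = {"metadata": {"dbt_version": "1.8.0", "invocation_id": "abc"}}
run_results = {"metadata": {"dbt_version": "1.8.0", "invocation_id": "xyz"}}

print(metadata_mismatches(manifest, run_results))
```

A differing invocation_id is not necessarily an error; it only tells you the two files were written by different dbt invocations.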

nodes: a mix of resource_types (model/test/seed/snapshot/analysis)
sources: source tables (database/schema/name; freshness results are written to a separate artifact, sources.json)
macros: definitions of callable macros (no execution results)
exposures: links between upstream dbt nodes and external surfaces
parent_map / child_map: bidirectional dependency lookups (handy for verifying selector behavior)
metadata: dbt_version, generated_at, project_id, etc.

Key	Role	Typical use case
nodes	All "things to build" nodes such as models	Materialization audits, column metadata inventory, dependency checks
sources	External source definitions	Lineage tracing origin, source freshness checks
macros	Jinja macro definitions	Cross-project reuse tracking, impact analysis
exposures	External consumers (BI tools, APIs, etc.)	Showing dashboard upstream dependencies to stakeholders
parent_map	Node → parent nodes	Upstream tracing (where did this come from)
child_map	Node → child nodes	Downstream impact (where does this go)

List top-level keys for a quick existence check (jq example)

jq -r 'keys[]' target/manifest.json
# Example output (jq's keys sorts alphabetically; a real manifest lists additional keys):
# child_map
# exposures
# macros
# metadata
# nodes
# parent_map
# sources
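Because nodes mixes several resource types, a quick count per type is a useful first look. The following is a rough Python sketch; count_by_resource_type is a hypothetical helper and the inline manifest is a tiny stand-in for the real file.

```python
from collections import Counter


def count_by_resource_type(manifest):
    """Count entries under "nodes" grouped by their resource_type."""
    return Counter(
        node.get("resource_type", "<missing>")
        for node in manifest.get("nodes", {}).values()
    )


# Tiny inline stand-in for target/manifest.json.
sample_manifest = {
    "nodes": {
        "model.my_pkg.my_model": {"resource_type": "model"},
        "model.my_pkg.other_model": {"resource_type": "model"},
        "test.my_pkg.some_test": {"resource_type": "test"},
    }
}

for resource_type, count in sorted(count_by_resource_type(sample_manifest).items()):
    print(resource_type, count)
```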

Deep dive into nodes: model attributes and stable keys

Each entry in nodes is keyed by a unique_id (e.g. model.pkg_name.model_name). resource_type takes values like model, seed, snapshot, test, and analysis. Most real-world checks and exam questions can be solved by understanding config, depends_on, relation_name (the post-compile physical name), original_file_path, fqn, columns, tags, and description.

config.materialized is by far the most frequently checked field. The distinction between view/table/incremental/ephemeral has a direct impact on cost and dependency resolution. The descriptions under columns and the tests bound to them are the foundation of documentation quality.

resource_type: node kind (model, seed, snapshot, test, etc.)
name / alias / relation_name: logical name, alias, physical name
database / schema: the database and schema the relation resolves to for the active target
config: materialized, tags, partitioning, etc. (be careful with adapter-specific fields)
depends_on.nodes: array of direct dependency nodes
original_file_path / package_name / path: provenance tracing

Model node excerpt (important fields)

{
  "nodes": {
    "model.jaffle_shop.orders": {
      "resource_type": "model",
      "name": "orders",
      "package_name": "jaffle_shop",
      "original_file_path": "models/orders.sql",
      "fqn": ["jaffle_shop", "orders"],
      "database": "ANALYTICS",
      "schema": "DBT_DEV",
      "alias": "orders",
      "relation_name": "ANALYTICS.DBT_DEV.ORDERS",
      "config": { "materialized": "view", "tags": ["core"] },
      "description": "Base orders fact model",
      "columns": {
        "order_id": {"name": "order_id", "description": "Primary key"},
        "customer_id": {"name": "customer_id", "description": "Foreign key"}
      },
      "depends_on": { "nodes": ["source.jaffle_shop.raw.orders"] },
      "checksum": {"name": "sha256", "checksum": "..."}
    }
  }
}
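To show how these fields are typically consumed, here is a small sketch that flattens one model node into a record built only from the stable keys discussed above. model_summary is a hypothetical helper, and the inline manifest is a trimmed copy of the excerpt.

```python
def model_summary(manifest, unique_id):
    """Flatten one node into a small record built from stable keys."""
    node = manifest["nodes"][unique_id]
    columns = node.get("columns", {})
    return {
        "unique_id": unique_id,
        "resource_type": node.get("resource_type"),
        "materialized": node.get("config", {}).get("materialized"),
        "relation_name": node.get("relation_name"),
        "original_file_path": node.get("original_file_path"),
        "parents": list(node.get("depends_on", {}).get("nodes", [])),
        # Only columns with a non-empty description count as documented.
        "documented_columns": sorted(
            name for name, col in columns.items() if col.get("description")
        ),
    }


orders_manifest = {
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "original_file_path": "models/orders.sql",
            "relation_name": "ANALYTICS.DBT_DEV.ORDERS",
            "config": {"materialized": "view", "tags": ["core"]},
            "columns": {
                "order_id": {"name": "order_id", "description": "Primary key"},
                "customer_id": {"name": "customer_id", "description": ""},
            },
            "depends_on": {"nodes": ["source.jaffle_shop.raw.orders"]},
        }
    }
}

summary = model_summary(orders_manifest, "model.jaffle_shop.orders")
print(summary["materialized"], summary["parents"], summary["documented_columns"])
```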

Dependency graphs in practice: depends_on / parent_map / child_map

Lineage starts with depends_on.nodes for understanding immediate upstreams. parent_map and child_map are best when you need to walk the full graph. Recursive upstream expansion is done by traversing parent_map; downstream impact analysis by traversing child_map.

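Selector behavior follows the official spec, but cross-checking against manifest.json removes a lot of guesswork. For example, +model_name (the model plus all of its ancestors) can be reproduced by recursively walking parent_map, and model_name+ (the model plus all of its descendants) by recursively walking child_map.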

Upstream expansion: recursively walk parent_map[target]
Downstream expansion: recursively walk child_map[target]
Boundary handling: source.* is terminal (normally no further upstream)
test nodes: appear as children of the model they test; when they run depends on the command (dbt test, dbt build), not on their position in the maps
Stable graph key: treat unique_id (e.g. model.pkg.name) as canonical

Quick jq to list upstreams and downstreams (starting from a unique_id)

# Upstream (parents)
uid="model.jaffle_shop.orders"
jq --arg uid "$uid" -r '.parent_map[$uid][]?' target/manifest.json

# Downstream (children)
jq --arg uid "$uid" -r '.child_map[$uid][]?' target/manifest.json

# Direct dependencies (inside the node)
jq --arg uid "$uid" -r '.nodes[$uid].depends_on.nodes[]?' target/manifest.json
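The jq queries above return only direct parents and children. A recursive expansion can be sketched in Python as a breadth-first walk over parent_map or child_map. The helper names are hypothetical and model.jaffle_shop.stg_orders is an invented intermediate model used only to give the graph some depth.

```python
from collections import deque


def walk(graph_map, start):
    """Breadth-first walk over a parent_map or child_map.

    Returns every reachable unique_id, excluding the starting node.
    """
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in graph_map.get(current, []):
            if neighbor != start and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def ancestors(manifest, unique_id):
    """All upstream nodes (the graph part of the +model selector)."""
    return walk(manifest.get("parent_map", {}), unique_id)


def descendants(manifest, unique_id):
    """All downstream nodes (the graph part of the model+ selector)."""
    return walk(manifest.get("child_map", {}), unique_id)


graph_manifest = {
    "parent_map": {
        "source.jaffle_shop.raw.orders": [],
        "model.jaffle_shop.stg_orders": ["source.jaffle_shop.raw.orders"],
        "model.jaffle_shop.orders": ["model.jaffle_shop.stg_orders"],
        "test.jaffle_shop.not_null_orders_order_id": ["model.jaffle_shop.orders"],
    },
    "child_map": {
        "source.jaffle_shop.raw.orders": ["model.jaffle_shop.stg_orders"],
        "model.jaffle_shop.stg_orders": ["model.jaffle_shop.orders"],
        "model.jaffle_shop.orders": ["test.jaffle_shop.not_null_orders_order_id"],
        "test.jaffle_shop.not_null_orders_order_id": [],
    },
}

print(sorted(ancestors(graph_manifest, "model.jaffle_shop.orders")))
print(sorted(descendants(graph_manifest, "source.jaffle_shop.raw.orders")))
```

The selectors also include the model itself, so add the starting unique_id to the returned set when comparing against dbt ls output.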

How to read sources, tests, and exposures

sources is where dbt names and documents upstream database tables and views. In manifest.json they appear under unique_ids of the form source.<pkg>.<source_name>.<name>, exposing database/schema/name, loader, description, and so on.

Most tests are stored under nodes with resource_type: test. The node they target is listed in their depends_on.nodes; for generic tests, the target column is recorded separately (in column_name and the test's kwargs) rather than in depends_on. exposures carries the dashboard's owner, url, maturity, depends_on, and other static info, making it easy to communicate upstream dependencies to stakeholders.

sources: naming and documentation; freshness results must be read from a separate artifact (sources.json)
tests: unique_id starts with test.; the target model is identified via depends_on.nodes, and the target column via column_name
exposures: enumeration of upstream dbt nodes plus metadata (owner, type, url, maturity)

Excerpts for source, test, and exposure

{
  "sources": {
    "source.jaffle_shop.raw.orders": {
      "source_name": "raw",
      "name": "orders",
      "database": "RAW",
      "schema": "JAFFLE",
      "loader": "ingestion_tool",
      "description": "Raw orders table"
    }
  },
  "nodes": {
    "test.jaffle_shop.not_null_orders_order_id": {
      "resource_type": "test",
      "name": "not_null_orders_order_id",
      "column_name": "order_id",
      "test_metadata": {"name": "not_null", "kwargs": {"column_name": "order_id"}},
      "depends_on": {"nodes": ["model.jaffle_shop.orders"]}
    }
  },
  "exposures": {
    "exposure.jaffle_shop.orders_dashboard": {
      "type": "dashboard",
      "name": "orders_dashboard",
      "owner": {"name": "BI Team", "email": "[email protected]"},
      "url": "https://bi.example.com/dash/123",
      "maturity": "high",
      "depends_on": {"nodes": ["model.jaffle_shop.orders"]}
    }
  }
}
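For stakeholder communication, the exposure entries can be reduced to an owner plus a list of upstream nodes. This is a sketch only: exposure_upstreams is a hypothetical helper, and it tolerates depends_on being either an object with a nodes list or a bare list, in case your manifest version differs.

```python
def exposure_upstreams(manifest):
    """Map each exposure unique_id to its owner, url, and upstream nodes."""
    result = {}
    for unique_id, exposure in manifest.get("exposures", {}).items():
        deps = exposure.get("depends_on", {})
        # Accept the object form {"nodes": [...]} as well as a bare list.
        nodes = deps.get("nodes", []) if isinstance(deps, dict) else list(deps)
        result[unique_id] = {
            "owner": exposure.get("owner", {}).get("name"),
            "url": exposure.get("url"),
            "upstream": sorted(nodes),
        }
    return result


exposure_manifest = {
    "exposures": {
        "exposure.jaffle_shop.orders_dashboard": {
            "type": "dashboard",
            "owner": {"name": "BI Team"},
            "url": "https://bi.example.com/dash/123",
            "depends_on": {"nodes": ["model.jaffle_shop.orders"]},
        }
    }
}

for exposure_id, info in exposure_upstreams(exposure_manifest).items():
    print(exposure_id, info["owner"], info["upstream"])
```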

Operational checklist and exam tips

In production, it pays to collect manifest.json on a schedule and automate checks for materialization drift, undocumented columns, and unused models (zero downstreams). In CI, attaching the manifest diff to pull requests significantly speeds up review.
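One way to produce such a diff is to compare the nodes of two manifests, for example one from the main branch and one from the pull request. The sketch below is an assumption-laden illustration: diff_manifests is a hypothetical helper, and it treats a node as changed when its file checksum or config.materialized differs, which is a deliberately narrow definition.

```python
def _fingerprint(node):
    """Reduce a node to the fields this sketch uses to detect change."""
    return (
        node.get("checksum", {}).get("checksum"),
        node.get("config", {}).get("materialized"),
    )


def diff_manifests(old, new):
    """Return added, removed, and changed unique_ids between two manifests."""
    old_nodes = old.get("nodes", {})
    new_nodes = new.get("nodes", {})
    return {
        "added": sorted(new_nodes.keys() - old_nodes.keys()),
        "removed": sorted(old_nodes.keys() - new_nodes.keys()),
        "changed": sorted(
            uid
            for uid in old_nodes.keys() & new_nodes.keys()
            if _fingerprint(old_nodes[uid]) != _fingerprint(new_nodes[uid])
        ),
    }


# Inline stand-ins for the manifest of the base branch and of the pull request.
base_manifest = {
    "nodes": {
        "model.jaffle_shop.orders": {"checksum": {"checksum": "aaa"}, "config": {"materialized": "view"}},
        "model.jaffle_shop.customers": {"checksum": {"checksum": "ccc"}, "config": {"materialized": "table"}},
    }
}
pr_manifest = {
    "nodes": {
        "model.jaffle_shop.orders": {"checksum": {"checksum": "bbb"}, "config": {"materialized": "view"}},
        "model.jaffle_shop.payments": {"checksum": {"checksum": "ddd"}, "config": {"materialized": "view"}},
    }
}

print(diff_manifests(base_manifest, pr_manifest))
```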

On the exam, expect recurring questions about: the distinction that manifest.json is "static metadata" while execution results live in run_results.json; what each of nodes/sources/macros/exposures owns; the roles of depends_on vs parent_map/child_map; and the impact of config.materialized. Avoid memorizing version-sensitive fields and lean on stable keys like resource_type, depends_on, config, and relation_name.

Bulk inspect materialized: catch missing view conversions and runaway table creation
Models with zero downstreams: cleanup candidates (child_map[id] is empty; tests attached to a model also count as children, so filter them out first)
Undocumented columns: track the share where columns[].description is empty
Execution results live in run_results.json; source freshness numbers come from a separate artifact (sources.json)
Manage version differences explicitly via metadata.dbt_version and metadata.dbt_schema_version
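The zero-downstream check from the list above can be sketched as follows. leaf_models is a hypothetical helper. It ignores children whose resource_type is test, since a model that only has tests attached is still a cleanup candidate, and it keeps exposures as real consumers.

```python
def leaf_models(manifest, ignore_types=("test",)):
    """Return models whose child_map entry has no children outside ignore_types."""
    nodes = manifest.get("nodes", {})
    child_map = manifest.get("child_map", {})
    leaves = []
    for unique_id, node in nodes.items():
        if node.get("resource_type") != "model":
            continue
        # Children missing from "nodes" (e.g. exposures) are kept as consumers.
        consumers = [
            child
            for child in child_map.get(unique_id, [])
            if nodes.get(child, {}).get("resource_type") not in ignore_types
        ]
        if not consumers:
            leaves.append(unique_id)
    return sorted(leaves)


leaf_manifest = {
    "nodes": {
        "model.jaffle_shop.stg_orders": {"resource_type": "model"},
        "model.jaffle_shop.orders": {"resource_type": "model"},
        "model.jaffle_shop.scratch": {"resource_type": "model"},
        "test.jaffle_shop.not_null_scratch_id": {"resource_type": "test"},
    },
    "child_map": {
        "model.jaffle_shop.stg_orders": ["model.jaffle_shop.orders"],
        "model.jaffle_shop.orders": ["exposure.jaffle_shop.orders_dashboard"],
        "model.jaffle_shop.scratch": ["test.jaffle_shop.not_null_scratch_id"],
    },
}

print(leaf_models(leaf_manifest))
```

A leaf model is not automatically unused: it may be queried directly by tools that are not declared as exposures.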

Inventory materialized for every model (jq)

jq -r '
  .nodes
  | to_entries[]
  | select(.value.resource_type=="model")
  | [.key, (.value.config.materialized // "<unset>")]
  | @tsv
' target/manifest.json
# Output: pairs of unique_id and materialized
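The undocumented-column check from the checklist can be approximated the same way. column_doc_coverage is a hypothetical helper. It only sees columns that are declared in YAML, because columns that were never declared do not appear under columns at all.

```python
def column_doc_coverage(manifest):
    """Return (documented, total) counts for columns declared on models."""
    documented = 0
    total = 0
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        for column in node.get("columns", {}).values():
            total += 1
            # Treat missing, None, and whitespace-only descriptions as undocumented.
            if (column.get("description") or "").strip():
                documented += 1
    return documented, total


coverage_manifest = {
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "columns": {
                "order_id": {"name": "order_id", "description": "Primary key"},
                "customer_id": {"name": "customer_id", "description": ""},
            },
        },
        "seed.jaffle_shop.countries": {
            "resource_type": "seed",
            "columns": {"code": {"name": "code", "description": "ISO code"}},
        },
    }
}

documented, total = column_doc_coverage(coverage_manifest)
print(f"{documented}/{total} declared model columns have a description")
```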

Check your understanding

Analytics Engineer

Question 1

As an Analytics Engineer using manifest.json, you want to list every downstream node of a given model. Which combination of fields should you reference most directly?

A. child_map and the target model's unique_id
B. nodes[].depends_on.macros and exposures
C. nodes[].columns and sources
D. metadata and macros' path

Answer: A

Listing downstreams is most directly done by recursively walking child_map[unique_id]. depends_on points upstream, and metadata/macros are not suited for downstream enumeration.

Frequently Asked Questions

What is the difference between manifest.json and run_results.json?

manifest.json holds static metadata (node attributes and dependencies). run_results.json holds execution results (success/failure, run time, row counts, etc.). They serve different purposes, so do not conflate them.
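The two artifacts are usually joined on unique_id. The sketch below shows one way to do that, assuming each entry in run_results.json's results array carries unique_id, status, and execution_time. status_by_node is a hypothetical helper and the sample values are made up.

```python
def status_by_node(manifest, run_results):
    """Join run results to manifest nodes on unique_id.

    Returns (unique_id, materialized, status, execution_time) tuples.
    """
    nodes = manifest.get("nodes", {})
    rows = []
    for result in run_results.get("results", []):
        unique_id = result.get("unique_id")
        node = nodes.get(unique_id)
        if node is None:
            # Result for a node this manifest does not know about; skip it.
            continue
        rows.append(
            (
                unique_id,
                node.get("config", {}).get("materialized"),
                result.get("status"),
                result.get("execution_time"),
            )
        )
    return rows


static_manifest = {
    "nodes": {
        "model.jaffle_shop.orders": {"resource_type": "model", "config": {"materialized": "view"}}
    }
}
sample_run_results = {
    "results": [
        {"unique_id": "model.jaffle_shop.orders", "status": "success", "execution_time": 1.5},
        {"unique_id": "model.jaffle_shop.unknown", "status": "error", "execution_time": 0.1},
    ]
}

for row in status_by_node(static_manifest, sample_run_results):
    print(row)
```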

Does manifest.json contain credentials or data values?

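Connection credentials from profiles.yml and actual table data are not written to manifest.json. It does, however, include your models' SQL, database and schema names, and any values rendered into configs, so a secret hardcoded in the project can end up in the file. Treat it as internal metadata and store it accordingly.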

How should I handle differences between dbt versions?

Check metadata.dbt_version and build your tooling around stable fields such as resource_type, depends_on, config, and relation_name. If you depend on schema-volatile areas, add schema validation in CI so you catch breakage early.
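A lightweight form of that CI validation is to assert that every model node still carries the keys your tooling reads. This is only a sketch: the key list and the missing_stable_keys helper are assumptions for illustration, not an official schema check.

```python
# Keys this hypothetical tooling depends on; adjust to what you actually read.
REQUIRED_MODEL_KEYS = (
    "resource_type",
    "name",
    "original_file_path",
    "fqn",
    "config",
    "depends_on",
)


def missing_stable_keys(manifest, required=REQUIRED_MODEL_KEYS):
    """Return {unique_id: [missing keys]} for model nodes lacking required keys."""
    problems = {}
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        missing = [key for key in required if key not in node]
        if missing:
            problems[unique_id] = missing
    return problems


validation_manifest = {
    "metadata": {"dbt_version": "1.x.y"},
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "original_file_path": "models/orders.sql",
            "fqn": ["jaffle_shop", "orders"],
            "config": {"materialized": "view"},
            "depends_on": {"nodes": []},
        },
        "model.jaffle_shop.broken": {"resource_type": "model", "name": "broken"},
    },
}

problems = missing_stable_keys(validation_manifest)
print(problems)
```

In CI you would fail the job when the returned dict is non-empty, and log metadata.dbt_version alongside it to make version-related breakage easy to spot.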


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
