Reading dbt manifest.json: Master the Static Metadata Behind Your Project

2026-04-19
NicheeLab Editorial Team

manifest.json is the artifact that bundles together every piece of "static metadata" in a dbt project. Models, sources, tests, macros, and their dependencies all live in a single file. It is generated at target/manifest.json and refreshed by most commands that parse the project, including dbt compile, run, test, and docs generate.

This article focuses on the keys that the official docs describe consistently across versions. It pairs inspection techniques you can use day to day with the angles the Analytics Engineer exam likes to test.

Where manifest.json fits and what it contains

manifest.json is the result of dbt's static analysis of your project. It contains the definitions of every node (models, seeds, snapshots, analyses, tests, etc.) along with sources, macros, and exposures, plus all of their dependencies. It captures the inputs to execution, not the execution results themselves.

Generation timing is predictable: any command that runs the dbt parser refreshes target/manifest.json. In CI you can persist this file as an artifact and use it for diff review and lineage health checks.

  • Location: target/manifest.json
  • Main top-level keys: nodes, sources, macros, exposures, parent_map, child_map, metadata
  • Static metadata: success/failure and row counts are NOT included (those live in run_results.json)
  • Stable keys you can rely on: resource_type, name, original_file_path, fqn, config, description, columns, depends_on

How dbt and manifest.json relate (static metadata → execution)

[Figure: dbt project (models, etc.) → manifest.json (static metadata: node attributes / dependencies; parent_map for dependency resolution, child_map for lineage expansion) → run_results.json (execution results: success/failure, time, row counts)]

manifest.json (excerpt — top-level shape)

{
  "metadata": {
    "dbt_version": "1.x.y",
    "generated_at": "2026-04-18T10:00:00Z",
    "project_id": "..."
  },
  "nodes": { "model.my_pkg.my_model": { "resource_type": "model", "name": "my_model", "config": { "materialized": "view" } } },
  "sources": { "source.my_pkg.my_src.my_tbl": { "source_name": "my_src", "name": "my_tbl" } },
  "macros": { "macro.my_pkg.some_macro": { "path": "macros/some.sql" } },
  "exposures": { },
  "parent_map": { "model.my_pkg.my_model": ["source.my_pkg.my_src.my_tbl"] },
  "child_map": { "source.my_pkg.my_src.my_tbl": ["model.my_pkg.my_model"] }
}
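To make the top-level shape concrete, here is a minimal Python sketch that counts what each bucket holds. It runs against a tiny inline stand-in rather than a real file; the test node `test.my_pkg.some_test` is a hypothetical addition for illustration, and in a real project you would load `target/manifest.json` with `json.load` instead.

```python
import json
from collections import Counter

# Tiny stand-in for target/manifest.json (illustrative data, not real output).
# "test.my_pkg.some_test" is a hypothetical node added for this example.
SAMPLE = json.loads("""
{
  "metadata": {"dbt_version": "1.x.y"},
  "nodes": {
    "model.my_pkg.my_model": {"resource_type": "model", "name": "my_model"},
    "test.my_pkg.some_test": {"resource_type": "test", "name": "some_test"}
  },
  "sources": {
    "source.my_pkg.my_src.my_tbl": {"source_name": "my_src", "name": "my_tbl"}
  },
  "macros": {},
  "exposures": {}
}
""")


def summarize(manifest: dict) -> dict:
    """Count entries per top-level bucket and per resource_type inside nodes."""
    by_type = Counter(
        node.get("resource_type", "<unknown>")
        for node in manifest.get("nodes", {}).values()
    )
    return {
        "nodes_by_type": dict(by_type),
        "sources": len(manifest.get("sources", {})),
        "macros": len(manifest.get("macros", {})),
        "exposures": len(manifest.get("exposures", {})),
    }


summary = summarize(SAMPLE)
print(summary)
```

Using `.get` with defaults keeps the sketch tolerant of manifests where a bucket is absent.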

Top-level structure: what each key is for

The exam loves to test whether you can tell these top-level buckets apart. nodes holds most of the "things to build" (models, seeds, snapshots, analyses, tests). sources is dedicated to data sources. macros holds Jinja macros. exposures represents downstream consumers like BI dashboards. parent_map and child_map make it easy to walk the dependency graph forward or backward.

metadata carries context such as version and generation time, which you use to verify consistency between artifacts.

  • nodes: a mix of resource_types (model/test/seed/snapshot/analysis)
  • sources: source tables (database/schema/name; freshness results are in a separate artifact, sources.json)
  • macros: definitions of callable macros (no execution results)
  • exposures: links between upstream dbt nodes and external surfaces
  • parent_map / child_map: bidirectional dependency lookups (handy for verifying selector behavior)
  • metadata: dbt_version, generated_at, project_id, etc.

Key         Role                                         Typical use case
nodes       All "things to build" nodes such as models   Materialization audits, column metadata inventory, dependency checks
sources     External source definitions                  Tracing lineage back to its origin, source freshness checks
macros      Jinja macro definitions                      Cross-project reuse tracking, impact analysis
exposures   External consumers (BI tools, APIs, etc.)    Showing dashboard upstream dependencies to stakeholders
parent_map  Node → parent nodes                          Upstream tracing (where did this come from)
child_map   Node → child nodes                           Downstream impact (where does this go)

List top-level keys for a quick existence check (jq example)

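jq -r 'keys[]' target/manifest.json
# Example output (jq's keys sorts alphabetically; real manifests also contain
# further keys such as docs, metrics, selectors, and disabled):
# child_map
# exposures
# macros
# metadata
# nodes
# parent_map
# sources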

Deep dive into nodes: model attributes and stable keys

Each entry in nodes is keyed by a unique_id (e.g. model.pkg_name.model_name). resource_type takes values like model, seed, snapshot, test, and analysis. Most real-world checks and exam questions can be solved by understanding config, depends_on, relation_name (the post-compile physical name), original_file_path, fqn, columns, tags, and description.

config.materialized is by far the most frequently checked field. The distinction between view/table/incremental/ephemeral has a direct impact on cost and dependency resolution. The descriptions under columns and the tests bound to them are the foundation of documentation quality.

  • resource_type: node kind (model, seed, snapshot, test, etc.)
  • name / alias / relation_name: logical name, alias, physical name
  • database / schema: the resolved database and schema in which the relation is built
  • config: materialized, tags, partitioning, etc. (be careful with adapter-specific fields)
  • depends_on.nodes: array of direct dependency nodes
  • original_file_path / package_name / path: provenance tracing

Model node excerpt (important fields)

{
  "nodes": {
    "model.jaffle_shop.orders": {
      "resource_type": "model",
      "name": "orders",
      "package_name": "jaffle_shop",
      "original_file_path": "models/orders.sql",
      "fqn": ["jaffle_shop", "orders"],
      "database": "ANALYTICS",
      "schema": "DBT_DEV",
      "alias": "orders",
      "relation_name": "ANALYTICS.DBT_DEV.ORDERS",
      "config": { "materialized": "view", "tags": ["core"] },
      "description": "Base orders fact model",
      "columns": {
        "order_id": {"name": "order_id", "description": "Primary key"},
        "customer_id": {"name": "customer_id", "description": "Foreign key"}
      },
      "depends_on": { "nodes": ["source.jaffle_shop.raw.orders"] },
      "checksum": {"name": "sha256", "checksum": "..."}
    }
  }
}
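As a rough illustration of how these fields combine in an audit, the sketch below flattens each model node into one row. The inline manifest is a hand-written stand-in modeled on the excerpt above, and the function name `model_rows` is hypothetical.

```python
# Hand-written stand-in modeled on the excerpt (illustrative data only).
MANIFEST = {
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "original_file_path": "models/orders.sql",
            "relation_name": "ANALYTICS.DBT_DEV.ORDERS",
            "config": {"materialized": "view", "tags": ["core"]},
            "depends_on": {"nodes": ["source.jaffle_shop.raw.orders"]},
        },
        "test.jaffle_shop.not_null_orders_order_id": {
            "resource_type": "test",
            "depends_on": {"nodes": ["model.jaffle_shop.orders"]},
        },
    }
}


def model_rows(manifest: dict) -> list:
    """Return one flat dict per model node, skipping tests, seeds, and so on."""
    rows = []
    for uid, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        config = node.get("config") or {}
        depends_on = node.get("depends_on") or {}
        rows.append({
            "unique_id": uid,
            "path": node.get("original_file_path"),
            "relation": node.get("relation_name"),
            "materialized": config.get("materialized", "<unset>"),
            "tags": sorted(config.get("tags") or []),
            "upstreams": list(depends_on.get("nodes", [])),
        })
    return rows


rows = model_rows(MANIFEST)
for row in rows:
    print(row)
```

Each row only touches the stable keys listed above, so the same shape should carry over to spreadsheets or CI reports with little change.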

Dependency graphs in practice: depends_on / parent_map / child_map

Lineage starts with depends_on.nodes for understanding immediate upstreams. parent_map and child_map are best when you need to walk the full graph. Recursive upstream expansion is done by traversing parent_map; downstream impact analysis by traversing child_map.

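Selector behavior follows the official spec, but cross-checking against manifest.json removes a lot of guesswork. For example, the +model_name expansion (the model plus all of its ancestors) can be reproduced by recursively walking parent_map, and model_name+ (the model plus all of its descendants) by recursively walking child_map.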

  • Upstream expansion: recursively walk parent_map[target]
  • Downstream expansion: recursively walk child_map[target]
  • Boundary handling: source.* is terminal (normally no further upstream)
  • test nodes: hang off the target model, but execution order is managed separately
  • Stable graph key: treat unique_id (e.g. model.pkg.name) as canonical
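The recursive walk described in this list can be sketched as a breadth-first traversal. The maps below are hand-written stand-ins; `stg_orders` and `order_report` are hypothetical models added so the graph has more than one hop.

```python
from collections import deque

# Hand-written stand-ins for manifest["parent_map"] and manifest["child_map"].
# stg_orders and order_report are hypothetical models for illustration.
PARENT_MAP = {
    "source.jaffle_shop.raw.orders": [],
    "model.jaffle_shop.stg_orders": ["source.jaffle_shop.raw.orders"],
    "model.jaffle_shop.orders": ["model.jaffle_shop.stg_orders"],
    "model.jaffle_shop.order_report": ["model.jaffle_shop.orders"],
}
CHILD_MAP = {
    "source.jaffle_shop.raw.orders": ["model.jaffle_shop.stg_orders"],
    "model.jaffle_shop.stg_orders": ["model.jaffle_shop.orders"],
    "model.jaffle_shop.orders": ["model.jaffle_shop.order_report"],
    "model.jaffle_shop.order_report": [],
}


def walk(graph: dict, start: str) -> set:
    """Collect every unique_id reachable from start, excluding start itself."""
    seen = set()
    queue = deque(graph.get(start, []))
    while queue:
        uid = queue.popleft()
        if uid in seen:
            continue  # already expanded via another path
        seen.add(uid)
        queue.extend(graph.get(uid, []))
    return seen


ancestors = walk(PARENT_MAP, "model.jaffle_shop.orders")
descendants = walk(CHILD_MAP, "model.jaffle_shop.orders")
print(sorted(ancestors))
print(sorted(descendants))
```

Passing `parent_map` gives the upstream closure and `child_map` the downstream closure; the source node ends the upstream walk because its parent list is empty.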

Quick jq to list upstreams and downstreams (starting from a unique_id)

# Upstream (parents)
uid="model.jaffle_shop.orders"
jq --arg uid "$uid" -r '.parent_map[$uid][]?' target/manifest.json

# Downstream (children)
jq --arg uid "$uid" -r '.child_map[$uid][]?' target/manifest.json

# Direct dependencies (inside the node)
jq --arg uid "$uid" -r '.nodes[$uid].depends_on.nodes[]?' target/manifest.json

How to read sources, tests, and exposures

sources is where dbt names and documents upstream database tables and views. In manifest.json they appear under unique_ids of the form source.<pkg>.<source_name>.<name>, exposing database/schema/name, loader, description, and so on.

Most tests are stored under nodes with resource_type: test. The node a test targets appears in its depends_on.nodes; for column-level generic tests the column is recorded separately in column_name. exposures carries the dashboard's owner, url, maturity, depends_on, and other static info, making it easy to communicate upstream dependencies to stakeholders.

  • sources: naming and documentation; freshness results must be read from a separate artifact (sources.json)
  • tests: unique_id starts with test.; the target model is identified via depends_on.nodes and the column via column_name
  • exposures: enumeration of upstream dbt nodes plus metadata (owner, type, url, maturity)

Excerpts for source, test, and exposure

{
  "sources": {
    "source.jaffle_shop.raw.orders": {
      "source_name": "raw",
      "name": "orders",
      "database": "RAW",
      "schema": "JAFFLE",
      "loader": "ingestion_tool",
      "description": "Raw orders table"
    }
  },
  "nodes": {
    "test.jaffle_shop.not_null_orders_order_id": {
      "resource_type": "test",
      "name": "not_null_orders_order_id",
      "column_name": "order_id",
      "test_metadata": {"name": "not_null"},
      "depends_on": {"nodes": ["model.jaffle_shop.orders"]}
    }
  },
  "exposures": {
    "exposure.jaffle_shop.orders_dashboard": {
      "type": "dashboard",
      "name": "orders_dashboard",
      "owner": {"name": "BI Team", "email": "[email protected]"},
      "url": "https://bi.example.com/dash/123",
      "maturity": "high",
      "depends_on": {"nodes": ["model.jaffle_shop.orders"]}
    }
  }
}

Operational checklist and exam tips

In production, it pays to collect manifest.json on a schedule and automate checks for materialization drift, undocumented columns, and unused models (zero downstreams). In CI, attaching the manifest diff to pull requests significantly speeds up review.
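A manifest diff for pull requests could look roughly like the sketch below, which compares two manifests by model unique_id. It treats a model as changed when its file checksum or config.materialized differs; that fingerprint is a simplifying assumption, and the helper names and sample data are hypothetical.

```python
def _models(manifest: dict) -> dict:
    """Keep only nodes whose resource_type is model."""
    return {
        uid: node
        for uid, node in manifest.get("nodes", {}).items()
        if node.get("resource_type") == "model"
    }


def _fingerprint(node: dict) -> tuple:
    """Simplified change signal: file checksum plus materialization."""
    checksum = (node.get("checksum") or {}).get("checksum")
    materialized = (node.get("config") or {}).get("materialized")
    return (checksum, materialized)


def diff_models(old: dict, new: dict) -> dict:
    """Report added, removed, and changed model unique_ids."""
    before, after = _models(old), _models(new)
    shared = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "changed": sorted(
            uid for uid in shared
            if _fingerprint(before[uid]) != _fingerprint(after[uid])
        ),
    }


def _node(checksum: str, materialized: str) -> dict:
    """Build a minimal model node for the illustrative data below."""
    return {
        "resource_type": "model",
        "checksum": {"name": "sha256", "checksum": checksum},
        "config": {"materialized": materialized},
    }


# Illustrative manifests: checksums and model names are made up.
OLD = {"nodes": {
    "model.jaffle_shop.orders": _node("aaa", "view"),
    "model.jaffle_shop.legacy_orders": _node("bbb", "table"),
}}
NEW = {"nodes": {
    "model.jaffle_shop.orders": _node("aaa", "table"),
    "model.jaffle_shop.customers": _node("ccc", "view"),
}}

report = diff_models(OLD, NEW)
print(report)
```

Including materialized in the fingerprint catches changes made in configuration files, which the model file's checksum alone would not reflect. dbt's own state:modified selection is the more thorough option when you can run dbt against a saved manifest.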

On the exam, expect recurring questions about: the distinction that manifest.json is "static metadata" while execution results live in run_results.json; what each of nodes/sources/macros/exposures owns; the roles of depends_on vs parent_map/child_map; and the impact of config.materialized. Avoid memorizing version-sensitive fields and lean on stable keys like resource_type, depends_on, config, and relation_name.

  • Bulk inspect materialized: catch missing view conversions and runaway table creation
  • Models with zero downstreams: cleanup candidates (child_map[id] is empty, or contains only test nodes, since tests attached to a model are listed as its children)
  • Undocumented columns: track the share where columns[].description is empty
  • Execution results live in run_results.json; source freshness numbers come from a separate artifact (sources.json)
  • Manage version differences explicitly via metadata.dbt_version
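Two of the checks in this list, unused models and undocumented columns, can be sketched in a few lines of Python. The manifest below is a hand-written stand-in with a hypothetical `legacy_orders` model. As a design choice, the sketch ignores children whose unique_id starts with `test.` or `unit_test.`, on the assumption that a model whose only children are tests is still unused.

```python
# Children with these prefixes are not counted as real downstream consumers.
IGNORED_CHILD_PREFIXES = ("test.", "unit_test.")

# Hand-written stand-in; legacy_orders is a hypothetical model.
MANIFEST = {
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "columns": {
                "order_id": {"name": "order_id", "description": "Primary key"},
                "customer_id": {"name": "customer_id", "description": ""},
            },
        },
        "model.jaffle_shop.legacy_orders": {
            "resource_type": "model",
            "columns": {"id": {"name": "id"}},
        },
    },
    "child_map": {
        "model.jaffle_shop.orders": [
            "test.jaffle_shop.not_null_orders_order_id",
            "exposure.jaffle_shop.orders_dashboard",
        ],
        "model.jaffle_shop.legacy_orders": [
            "test.jaffle_shop.not_null_legacy_orders_id",
        ],
    },
}


def unused_models(manifest: dict) -> list:
    """Return models with no children other than tests."""
    child_map = manifest.get("child_map", {})
    unused = []
    for uid, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") != "model":
            continue
        consumers = [
            child for child in child_map.get(uid, [])
            if not child.startswith(IGNORED_CHILD_PREFIXES)
        ]
        if not consumers:
            unused.append(uid)
    return sorted(unused)


def undocumented_share(manifest: dict) -> float:
    """Share of declared model columns whose description is missing or blank."""
    total = missing = 0
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue
        for column in (node.get("columns") or {}).values():
            total += 1
            if not (column.get("description") or "").strip():
                missing += 1
    return missing / total if total else 0.0


unused = unused_models(MANIFEST)
share = undocumented_share(MANIFEST)
print(unused, round(share, 2))
```

Note that the share only covers columns declared in YAML; columns that exist in the warehouse but are never declared do not appear in manifest.json at all.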

Inventory materialized for every model (jq)

jq -r '
  .nodes
  | to_entries[]
  | select(.value.resource_type=="model")
  | [.key, (.value.config.materialized // "<unset>")]
  | @tsv
' target/manifest.json
# Output: pairs of unique_id and materialized

Check your understanding

Analytics Engineer

Question 1

As an Analytics Engineer using manifest.json, you want to list every downstream node of a given model. Which combination of fields should you reference most directly?

  1. child_map and the target model's unique_id
  2. nodes[].depends_on.macros and exposures
  3. nodes[].columns and sources
  4. metadata and macros' path

Correct answer: 1

Listing downstreams is most directly done by recursively walking child_map[unique_id]. depends_on points upstream, and metadata/macros are not suited for downstream enumeration.

Frequently Asked Questions

What is the difference between manifest.json and run_results.json?

manifest.json holds static metadata (node attributes and dependencies). run_results.json holds execution results (success/failure, run time, row counts, etc.). They serve different purposes, so do not conflate them.

Does manifest.json contain credentials or data values?

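manifest.json does not contain the connection credentials from profiles.yml or any table rows. It does contain your raw SQL, descriptions, and rendered config values, so a secret or sensitive literal hard-coded in the project will show up in the file. Treat it as an internal artifact and handle it securely.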

How should I handle differences between dbt versions?

Check metadata.dbt_version and build your tooling around stable fields such as resource_type, depends_on, config, and relation_name. If you depend on schema-volatile areas, add schema validation in CI so you catch breakage early.
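A CI guard along these lines might look like the sketch below. It is not a full schema validation: it only checks that metadata.dbt_version starts with an expected prefix and that every node carries a few stable keys. The version numbers, the key list, and the function name are assumptions chosen for illustration.

```python
# Keys this hypothetical tooling relies on; adjust to what your scripts read.
REQUIRED_NODE_KEYS = ("resource_type", "depends_on", "config")


def check_manifest(manifest: dict, expected_prefix: str) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    version = str(manifest.get("metadata", {}).get("dbt_version", ""))
    if not version.startswith(expected_prefix):
        problems.append(
            f"dbt_version {version!r} does not start with {expected_prefix!r}"
        )
    for uid, node in manifest.get("nodes", {}).items():
        missing = [key for key in REQUIRED_NODE_KEYS if key not in node]
        if missing:
            problems.append(f"{uid} is missing {', '.join(missing)}")
    return problems


# Illustrative manifests; the version strings are made up for the example.
GOOD = {
    "metadata": {"dbt_version": "1.8.0"},
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
            "config": {"materialized": "view"},
        }
    },
}
BAD = {
    "metadata": {"dbt_version": "1.7.3"},
    "nodes": {
        "model.jaffle_shop.orders": {
            "resource_type": "model",
            "depends_on": {"nodes": []},
        }
    },
}

good_problems = check_manifest(GOOD, "1.8.")
bad_problems = check_manifest(BAD, "1.8.")
for problem in bad_problems:
    print(problem)
```

In CI you would fail the job when the returned list is non-empty, so a dbt upgrade that changes the fields your tooling reads is caught before it reaches production.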

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

