dbt run_results.json: Execution Telemetry (2026)

dbt generates a run_results.json on every invocation, recording each node's status, execution time, and adapter response. It is not just a log — it is a reliable data source for quantifying pipeline performance and quality.

This article walks through everything from understanding the structure of run_results.json to ingesting it into a data platform and monitoring SLOs, all grounded in the stable spec from the official documentation. Analytics Engineer exam angles are woven in throughout.

run_results.json: Structure and Role

run_results.json is the artifact that summarizes a dbt run. In dbt Core it is typically written to the target directory, and in dbt Cloud it is available as an artifact for each job run. Each entry represents the result for a single executed node, and the top-level metadata includes the dbt version, generation timestamp, invocation_id, and more.

Representative fields look like this: metadata (dbt_schema_version, dbt_version, generated_at, invocation_id, etc.); results (an array with unique_id, status, execution_time, timing, adapter_response, message, thread_id, etc.); and elapsed_time (the total elapsed time of the run). The content of adapter_response is adapter-dependent, but values such as rows_affected may be included.

Location: dbt Core writes to target/run_results.json; dbt Cloud exposes it via the Artifact API for each run
Per-node status values: success, error, skipped (tests may also produce fail)
invocation_id uniquely identifies a single run
execution_time is per-node seconds; elapsed_time is the seconds for the entire run

Artifact	Main contents	Update timing
run_results.json	Each node's result, duration, and adapter response	After each command run (run, test, seed, etc.)
manifest.json	Project dependencies, node definitions, resolved macros	At compile / run time
catalog.json	Object column profile (descriptions, types, etc.)	When the catalog command is run

Artifact generation flow (overview)

dbt CLI/Cloud
   |
   v
[Compile & Execute]
   |
   +--> target/manifest.json
   |
   +--> target/run_results.json  <-- run results
   |
   +--> target/logs/ ...

Minimal example of run_results.json (excerpt)

{
  "metadata": {
    "dbt_version": "1.x.x",
    "generated_at": "2024-04-01T12:34:56Z",
    "invocation_id": "e1c2..."
  },
  "results": [
    {
      "unique_id": "model.my_project.dim_customers",
      "status": "success",
      "execution_time": 12.34,
      "adapter_response": {"rows_affected": 1234},
      "timing": [{"name": "execute", "started_at": "...", "completed_at": "..."}],
      "message": null
    }
  ],
  "elapsed_time": 45.67
}

Reading the Execution Metrics: duration, status, rows_affected

results[].execution_time is the pure execution seconds for a node and is highly effective for identifying bottlenecks. status takes success, error, skipped (and fail for tests), letting you tell apart retries and selector-driven effects. adapter_response is adapter-dependent — Snowflake and Databricks, for example, may return rows_affected and execution codes.

In practice you group records by invocation_id to treat one run as a unit, then visualize per-node P95 execution time, success rate, and row-count change (rows_affected). Use that to surface nodes that are consistently slow or whose success rate is unstable, and prioritize your improvements accordingly.

Latency detection: continuously monitor P95/P99 of execution_time
Success rate: compute the share of success statuses per invocation_id
Volume signal: use rows_affected as an approximate volume metric (mind the adapter dependency)
Skip semantics: skipped reflects selector conditions and cache effects

Status	Meaning	Treatment in monitoring
success	Node finished normally	Counted in both numerator and denominator of success rate
error	Failed with a runtime error	Critical incident; alert required
skipped	Not run due to selector or dependencies	Typically excluded from the denominator (adjust by policy)

Execution timeline of a single node (conceptual)

[compile]----[execute================]----[post]
            ^ start                 ^ end
execution_time = end - start

Python: extract the top 5 slowest nodes (parsing run_results.json directly)

import json, pathlib
p = pathlib.Path('target/run_results.json')
with p.open() as f:
    data = json.load(f)
rows = []
for r in data.get('results', []):
    rows.append({
        'unique_id': r.get('unique_id'),
        'status': r.get('status'),
        'sec': float(r.get('execution_time', 0.0)),
        'rows_affected': (r.get('adapter_response') or {}).get('rows_affected')
    })
slow = sorted([x for x in rows if x['status']=='success'], key=lambda x: x['sec'], reverse=True)[:5]
for i, x in enumerate(slow, 1):
    print(f"{i}. {x['unique_id']}  {x['sec']:.2f}s  rows_affected={x['rows_affected']}")

Quality Monitoring: Handling and Visualizing Test Results

dbt test results are also recorded in run_results.json. Test unique_ids are generally in the form test.<package>...., and the status uses success / fail / error. fail indicates an assertion violation, while error indicates a runtime failure.

Counting failures precisely depends on the test type and the adapter implementation. run_results.json is sufficient for pass/fail per test, but details such as violation row counts are best confirmed by combining the test SQL and logs.

Test identification: extract entries whose unique_id contains test.
Failure type: separate fail (assertion violation) from error (execution failure)
Stable operations: isolate tests with unstable external dependencies (time functions, non-determinism) into separate monitoring

Test type	Execution unit	Identification in run_results
Generic tests	Assertions expanded at the model/column level	unique_id in the form test.<pkg>.<name>.<hash>
Singular tests	Assertions written as arbitrary SQL	unique_id in the form test.<pkg>.<file_name>...

Relationship between a model and its tests (conceptual)

model.my_project.dim_customers
  |
  +-- test.my_project.not_null_dim_customers_id (success)
  +-- test.my_project.unique_dim_customers_id    (fail)

Python: compute the test pass rate

import json
with open('target/run_results.json') as f:
    data = json.load(f)
results = [r for r in data.get('results', []) if 'test.' in (r.get('unique_id') or '')]
passed = sum(1 for r in results if r.get('status')=='success')
failed = sum(1 for r in results if r.get('status') in ('fail','error'))
rate = (passed / (passed + failed)) * 100 if (passed+failed)>0 else None
print({'tests': len(results), 'passed': passed, 'failed': failed, 'pass_rate_pct': rate})

Storage and Query Design: Ingest, Normalize, and Analyze in the Warehouse

Because run_results.json is one file per invocation, store it in object storage and then ingest it into the warehouse keyed by invocation_id and generated_at. A solid approach is to keep the full JSON in a single column at the RAW layer first, and then FLATTEN/EXPLODE the results array to build wide analysis tables.

For the primary key, the combination of invocation_id and unique_id is convenient in practice and protects against double-loading. Because row counts and types are adapter-dependent, design numeric columns to be nullable when you cast them.

Ingestion unit: 1 run = 1 JSON (identified by invocation_id)
Primary key: (invocation_id, unique_id)
Normalization: expand results and break out status, execution_time, rows_affected as columns
Partitioning: partition by generated_at (timestamp)

Layer	Purpose	Main columns / format
RAW (JSON)	Preserve original data; reprocessable	artifact VARIANT/STRING, load_ts
STG (expanded)	Row-expand the results array	invocation_id, unique_id, status, execution_time, adapter_response
DM (analysis)	Dashboards and SLO metrics	Per-node P95, success rate, rows_affected, etc.

Ingestion pipeline (Snowflake example)

[Object Storage]
      |
  COPY INTO (RAW VARIANT)
      |
   FLATTEN(results)
      |
  ANALYTICS TABLES (SLO / visualization)

Snowflake: JSON expansion and normalization (conceptual example)

create or replace table raw_run_results(artifact variant);
-- Ingest with COPY INTO from a stage into artifact (omitted)

-- Expand (one row per entry in the results array)
create or replace view stg_run_results as
select
  artifact:metadata:invocation_id::string as invocation_id,
  artifact:metadata:generated_at::timestamp_tz as generated_at,
  r.value:unique_id::string as unique_id,
  r.value:status::string as status,
  r.value:execution_time::float as execution_time,
  r.value:adapter_response as adapter_response
from raw_run_results,
     lateral flatten(input => artifact:results) r;

-- Analytics table (example: extract rows_affected, nullable)
create or replace view dm_run_results as
select
  invocation_id,
  unique_id,
  status,
  execution_time,
  try_to_number(adapter_response:rows_affected) as rows_affected,
  generated_at
from stg_run_results;

Observability and SLO Design: Success Rate, P95 Execution Time, Regression Detection

The foundation of your SLOs is the per-invocation_id success rate and the per-node P95 execution time. Because run_results.json reliably tells you how long each node took, it lets you quantify the health of scheduled runs. For workloads with seasonality, managing P95 separately by day of week and time of day helps you avoid false positives.

For regression detection, monitor moving medians and P95 over the last N runs, alerting when thresholds are exceeded. Display tests with frequent failures on a separate dashboard from the model success rate so you can cleanly distinguish root causes — SQL errors vs. data-quality failures.

Candidate SLOs: run success rate >= 99.x%; P95 execution time of key models <= target
Alerting: alert immediately on error; alert on P95 breaches sustained for n consecutive runs
Handling seasonality: maintain separate baselines by day of week and time of day

Metric	Example definition	Caveats
Success rate	success / (success + error)	skipped is usually excluded from the denominator
P95 execution time	95th percentile of execution_time	Measure DB execution time, not scheduler latency
Row count change	Trend of rows_affected	Adapter-dependent; tolerate NULLs

SLO monitoring flow (conceptual)

run_results.json -> RAW -> STG -> DM
        |                 |
        v                 v
   Success rate        P95 execution time
        \______________/
                |
             Alerting

Databricks (PySpark): compute P95 execution time

from pyspark.sql import functions as F
from pyspark.sql import types as T

# Load JSON (adjust file placement to your environment)
df_raw = spark.read.json('/mnt/artifacts/run_results/*.json')

# Explode results
results = df_raw.select(
    F.col('metadata.invocation_id').alias('invocation_id'),
    F.explode_outer('results').alias('r')
)

fact = results.select(
    'invocation_id',
    F.col('r.unique_id').alias('unique_id'),
    F.col('r.status').alias('status'),
    F.col('r.execution_time').cast('double').alias('execution_time')
)

p95 = (fact
  .where(F.col('status')=='success')
  .groupBy('unique_id')
  .agg(F.expr('percentile_approx(execution_time, 0.95)').alias('p95_sec')))

p95.orderBy(F.desc('p95_sec')).show(20, truncate=False)

Exam Hot Spots and Pitfalls

The Analytics Engineer exam tends to test your understanding of how the different artifacts differ in role and which metrics to use for performance diagnosis. To identify slow models the correct answer is to use execution_time from run_results.json — manifest.json is primarily for understanding structure and dependencies.

A common pitfall: ephemeral models are not executed standalone — they are inlined into parent models, so they do not appear as independent results in run_results.json. seed entries only appear when dbt seed is run. And remember that rows_affected is adapter-dependent and is not always populated.

Use execution_time (run_results.json) to identify slow models
Use manifest.json to visualize dependencies or evaluate selectors
ephemeral does not appear as an independent result
seed results are stored in run_results.json when dbt seed is run

Common misconception	Correct understanding	How to address it
You can see execution time in manifest.json	Execution time is recorded in run_results.json	Reference run_results.json for latency analysis
ephemeral also produces a result row every time	It is inlined into the parent and has no independent result	Watch for source SQL bloat and monitor the parent model instead
rows_affected is always available	It is adapter-dependent and can be NULL	Design schemas and visuals to be NULL-safe

Node types and how they appear in run_results

model (materialized)  -> appears
seed                  -> appears when seed is run
snapshot              -> appears
test                  -> appears (success/fail/error)
ephemeral             -> does not appear independently

Bash: save and ship run_results.json after a run (conceptual)

set -euo pipefail
run_ts=$(date -u +%Y%m%dT%H%M%SZ)
dbt run --select state:modified || true
# Save target/run_results.json
mkdir -p artifacts/$run_ts
cp target/run_results.json artifacts/$run_ts/
# Example: upload to AWS S3 (adjust to your environment)
# aws s3 cp artifacts/$run_ts/run_results.json s3://my-bucket/dbt/run_results/$run_ts.json

Check Your Knowledge

Analytics Engineer

問題 1

You want to identify the model that took the longest during a dbt run. Which data source and field combination is most appropriate?

Compare execution_time in results[] of run_results.json per unique_id
Compare the length of nodes[].compiled_sql in manifest.json
Grep the log files and extract the longest line
Pick the model with the most columns in catalog.json

正解: A

Execution time is recorded as execution_time in each result entry of run_results.json. manifest.json and catalog.json describe structure and metadata, so they are not appropriate for comparing execution time.

Frequently Asked Questions

Where can I find run_results.json?

In dbt Core it is generated at target/run_results.json under your project. In dbt Cloud you can download it from the Artifacts of each job run.

Are ephemeral models recorded in run_results.json?

Ephemeral models are not executed independently; they are inlined into their parent models, so they do not appear as independent result entries.

Is rows_affected always available?

No. The content of adapter_response is adapter-dependent. Many adapters populate rows_affected, but it can be NULL, so handle it as nullable when you analyze the data.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Putting dbt run_results.json to Work: Logs, Metrics, and Exam-Ready Analysis