Vault Performance Standby Nodes: Read Scale-Out (2026)

In production Vault deployments, reads typically dominate the workload. Vault Enterprise's Performance Standby feature targets exactly this case: standby nodes serve eligible read requests themselves, which takes that load off the active node.

This article walks through production-grade design points and the comparison, health-check, and forwarding gotchas that frequently appear on Ops-oriented certification exams.

Foundations: HA and Where Performance Standby Fits

In Vault HA, one node is active and the rest are standby. Standby nodes normally just forward requests to the active node, but in Enterprise the Performance Standby feature lets standby nodes serve most reads themselves.

Secret lookups and token operations that do not touch storage are distributed across the standbys, which keeps the active node from becoming a CPU/IO hot spot. Writes (secret creation and updates, mount enable/disable, policy changes, and so on) are still processed centrally on the active node.

Scope: read-scale within a single Vault Enterprise cluster
Prerequisites: HA-capable storage (Consul or Raft / Integrated Storage) with correct cluster_addr settings
Goal: serve reads locally, reduce active-node load and latency, improve throughput

Request Flow and Consistency Model

Performance Standby serves eligible read endpoints on the local node. Ineligible requests, or those that cannot be completed locally for consistency reasons, are forwarded to the active node via request forwarding. With cluster_addr and TLS configured correctly, clients can simply target a single VIP without worrying about redirects.

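Standbys stay consistent with the active node by following its writes, through the shared storage backend with Consul or the replicated Raft log with Integrated Storage, and by invalidating cached state as changes arrive. That propagation is not instantaneous, so a read that reaches a standby immediately after a write can lag behind. In those cases the request can be forwarded to the active node instead of being served locally, which trades a little latency for a consistent view of the data.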

Typical eligible reads: secret reads/lists, certain token/lease lookups and updates that do not modify storage
Typical ineligible operations: system config changes, policy/engine enable or rotation, explicit writes
Clients only need to hit a single LB endpoint — the standby internally decides whether to serve locally or forward
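
For the read-immediately-after-write case described above, a client can ask a performance standby to forward the request when its local state is behind. The sketch below only builds the request headers and is a minimal illustration, not a client implementation. It assumes the X-Vault-Index and X-Vault-Inconsistent headers described in the Vault Enterprise consistency documentation (verify them against your Vault version); the function name consistency_headers and its parameters are hypothetical.

```python
from typing import Dict, Optional


def consistency_headers(
    token: str,
    last_index: Optional[str] = None,
    forward_if_stale: bool = True,
) -> Dict[str, str]:
    """Build headers for a read that should observe an earlier write.

    last_index is assumed to be the X-Vault-Index value that Vault
    returned on the response to the earlier write.
    """
    headers = {"X-Vault-Token": token}
    if last_index:
        # Tell the node which state the client has already seen.
        headers["X-Vault-Index"] = last_index
        if forward_if_stale:
            # Ask a standby that is behind to forward to the active node.
            headers["X-Vault-Inconsistent"] = "forward-active-node"
    return headers
```

A read with no prior write simply omits last_index and is served wherever the LB sends it; only reads that must see a specific write carry the extra headers.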

Figure: Performance Standby request flow (single cluster)

Picking the Right Feature: Standard Standby / Performance Standby / Performance Replication

There are a lot of similar-sounding terms, so let's clarify how to choose from an Ops perspective. For read-scale inside a single cluster, use Performance Standby. To optimize cross-region latency and throughput across multiple clusters, use Performance Replication Secondary (a separate cluster). Standard standbys are basically just forwarders.

DR Replication exists for recovery purposes and is not intended for routine read distribution. The focus of this article is read-scale within a cluster (Performance Standby).

Read-scale in a single cluster → Performance Standby
Regional distribution / near-side reads → Performance Replication Secondary
Simple HA / small scale → Standard Standby is fine (reads mostly forwarded)

Aspect	Standard Standby	Performance Standby (Ent)	Performance Replication Secondary (Ent)
Scope	Single cluster (standby)	Single cluster (standby + local reads)	Separate cluster (per region/site)
Read handling	Mostly forwarded to active	Eligible reads served by standby	Served on secondary (replicated data)
Write handling	Performed on active	Performed on active (forwarded)	Performed on primary (replicated)
Latency / bandwidth goal	Reduce redirects / forwards	Reduce active-node load and latency	Faster reads via geographic proximity
Primary use case	Small to medium HA	Single cluster with very read-heavy traffic	Multi-region read optimization

Setup and Configuration: cluster_addr, LB, and Health Checks

The most important thing is to get the foundation for request forwarding right. Align cluster_addr and TLS on every node so that mutual reachability and certificate validation work between nodes. Without this, request forwarding fails and standbys fall back to redirecting clients to the active node's api_addr.

Operationally, the easiest design is a single VIP fronting all nodes (active + standby). Point the LB health check at /sys/health with standbyok=true and perfstandbyok=true. By default a standby answers 429 and a performance standby answers 473; with these parameters both return the active node's status code (200), so the LB treats them as healthy. Writes that land on a standby are forwarded internally, so client-side path splitting is not required.

Set cluster_addr and api_addr correctly on every node (consistent TLS, including SANs)
LB health check example: /v1/sys/health?standbyok=true&perfstandbyok=true
Performance Standby is enabled by default in Enterprise. Disable it explicitly in the server config only if you need to
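
The effect of the two health-check parameters can be modeled in a few lines. This is a sketch for reasoning about LB behavior, not a probe implementation: the status codes are the documented defaults of /v1/sys/health, while the role names and the functions health_url and expected_status are hypothetical helpers introduced here.

```python
from urllib.parse import urlencode

# Default /v1/sys/health status codes per node state.
DEFAULT_STATUS = {
    "active": 200,
    "standby": 429,
    "performance-standby": 473,
    "uninitialized": 501,
    "sealed": 503,
}


def health_url(base: str, standby_ok: bool = True, perf_standby_ok: bool = True) -> str:
    """Build the health-check URL an LB would probe."""
    params = {}
    if standby_ok:
        params["standbyok"] = "true"
    if perf_standby_ok:
        params["perfstandbyok"] = "true"
    query = urlencode(params)
    url = f"{base.rstrip('/')}/v1/sys/health"
    return f"{url}?{query}" if query else url


def expected_status(role: str, standby_ok: bool = False, perf_standby_ok: bool = False) -> int:
    """Status code a node in the given role returns for the given parameters."""
    if role == "standby" and standby_ok:
        return DEFAULT_STATUS["active"]
    if role == "performance-standby" and perf_standby_ok:
        return DEFAULT_STATUS["active"]
    return DEFAULT_STATUS[role]
```

With both parameters set, every unsealed node answers 200 and stays in the pool. With neither set, only the active node does, which is one way to steer all traffic to the active node temporarily.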

Example: minimal config using Integrated Storage (excerpt)

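# /etc/vault.d/server.hcl
# api_addr: address advertised to clients
# cluster_addr: this node's address for node-to-node traffic (request forwarding)
api_addr      = "https://vault.example.com:8200"
cluster_addr  = "https://10.0.0.1:8201"
disable_mlock = true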

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-1"
}

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/server.crt"
  tls_key_file  = "/etc/vault/tls/server.key"
}

# Performance Standby is enabled by default in Enterprise.
# Only if you want to disable it:
# disable_performance_standby = true

Operations: Monitoring, Testing, and Capacity Planning

For monitoring, track node role (active vs. performance standby) and the ratio of locally-served vs. forwarded requests. Capturing per-endpoint response-time distributions and forward ratios via telemetry (e.g. Prometheus) makes bottleneck analysis and effect measurement much easier.

Run load tests at realistic ratios such as 9 reads to 1 write, and measure latency / throughput through the actual LB-wired topology. For active-node failover tests, also measure the transient latency spike during warm-up (cache invalidation and re-convergence) after a standby is promoted.
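
A fixed read/write mix like the 9:1 ratio above is easy to generate reproducibly before wiring it into a load tool. The following is a minimal sketch; build_mix and its parameters are hypothetical names, and the actual Vault requests behind "read" and "write" are left to the test harness.

```python
import random
from typing import List


def build_mix(total: int, read_fraction: float = 0.9, seed: int = 0) -> List[str]:
    """Return a shuffled list of 'read'/'write' operations at the given ratio."""
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    reads = round(total * read_fraction)
    ops = ["read"] * reads + ["write"] * (total - reads)
    # A seeded shuffle keeps runs comparable between test iterations.
    random.Random(seed).shuffle(ops)
    return ops
```

Using the same seed for the before and after runs keeps the operation order identical, so differences in latency come from the topology rather than from the workload.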

Observability: local-vs-forwarded ratio, p95/p99 latency, error rate, sys/health state transitions
Testing: through the real LB, with read-heavy workloads; measure the impact of active-node switchover
Capacity: size node count and vCPU/memory for read-heavy traffic; validate incrementally via scale-out
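
The observability items above reduce to a few simple calculations once the raw samples are collected. This sketch assumes you already have per-request latencies and counts of locally served and forwarded requests from your own telemetry; percentile (nearest-rank method) and forward_ratio are hypothetical helper names, not Vault metrics.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of the samples (pct in the range 0 to 100)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Nearest rank: smallest value with at least pct percent of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def forward_ratio(local: int, forwarded: int) -> float:
    """Fraction of requests a standby forwarded to the active node."""
    total = local + forwarded
    return forwarded / total if total else 0.0
```

A rising forward ratio under a read-heavy workload suggests that fewer requests are eligible for local service than expected, which is worth checking before adding nodes.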

Exam Prep Checklist (Ops)

Remember Performance Standby with a single phrase: 'read-scale within a single cluster.' Knowing where it sits relative to adjacent features and the basics of LB, health checks, and forwarding translates directly into exam points.

Performance Standby: most reads served by standbys; writes are handled centrally on the active node
Request forwarding requires cluster_addr and correct TLS
LB health checks should use /v1/sys/health?standbyok=true&perfstandbyok=true
Standard Standby is basically a forwarder; Performance Replication is for read optimization across separate clusters
Note that the feature is not for DR (DR Replication serves a different purpose)
If the active node fails, a standby is promoted; clients keep using the same VIP and only need to tolerate a brief interruption while the new active node takes over
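
On the client side, riding out a failover usually comes down to retrying against the same VIP. The sketch below is a generic retry helper under stated assumptions: the wrapped call raises ConnectionError while no active node is reachable, and the name retry_with_backoff and its defaults are hypothetical, not part of any Vault client library.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry call() with exponential backoff while it raises ConnectionError."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the helper can be exercised in tests without real delays.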

Check Your Understanding

Ops

Question 1

In Vault Enterprise, read traffic has surged and the active node's CPU is maxed out. You want to raise read throughput with the smallest possible architectural change. Which option is most appropriate?

A. Register all nodes (active + standby) under the LB's VIP and enable the /sys/health check with standbyok=true&perfstandbyok=true
B. Enable DR Replication and have the secondary serve reads
C. Add more ACL policies and queue read requests so they take turns
D. Point all clients directly at the active node's IP

Correct answer: A

Read-scale within a single cluster is the domain of Performance Standby. Distributing across all nodes via the LB and treating standbys as healthy lets eligible reads be served locally on each standby. DR Replication is for recovery, not read distribution. The other options either do not help throughput or just lock in the bottleneck.

Frequently Asked Questions

Is Performance Standby available in Vault OSS?

No. Performance Standby is a Vault Enterprise feature. OSS still supports standby request forwarding, but standby nodes cannot serve reads locally.

Which storage backends support it?

HA-capable storage (Consul or Raft / Integrated Storage). In either case, correct cluster_addr settings and TLS reachability between nodes are critical.

What if I temporarily want everything handled by the active node?

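As a stop-gap, set disable_performance_standby = true in the server config, or point the LB at the active node only (for example, by dropping standbyok and perfstandbyok from the health check so that only the active node returns 200). For steady-state operation, re-enable Performance Standby afterwards.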


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
