Vault Performance Replication: Read-Local (2026)

If Vault throughput is hitting a ceiling per region, or cross-region latency has become a bottleneck, Performance Replication is the strongest option to consider.

This article covers design decisions, setup, operations, monitoring, and troubleshooting. It is meant to be useful both for exam preparation (aligned with the HashiCorp Security Automation exam scope) and for real-world practice.

Goals and Prerequisites of Performance Replication

Performance Replication is a scale-out feature in Vault Enterprise that processes reads locally on secondaries in each region while applying writes centrally on the primary. This shortens latency and stabilizes throughput.

There are three key prerequisites: 1) Replication is asynchronous, so a short replication lag can occur. 2) Tokens and leases issued by a secondary are local to that cluster, so clients authenticate against the cluster they use and there is no global session sharing. 3) Failover during incidents is handled by DR Replication, while Performance Replication targets scale and latency optimization; the two roles are intentionally separate.

For storage, both Consul and Integrated Storage (Raft) are available, but Integrated Storage is increasingly recommended for operational simplicity and unified management.

Reads are processed locally on each Performance secondary
Writes are applied to the primary as a rule (write-forwarding from secondaries is supported)
Authentication, tokens, and leases are independent per cluster
Disaster recovery is the role of DR Replication (separate the use cases)

Topology and Data Flow (Read Distribution and Write Consistency)

A typical layout is one Performance primary plus one Performance secondary in each additional region. Each cluster is internally HA, and external access goes through a per-region load balancer.

Client reads land on the nearest secondary, and writes are forwarded via the secondary to the primary, returning after the primary commits. Data is then replicated asynchronously to each secondary. Because of the short replication lag, strict local read-after-write consistency is not guaranteed (in most cases this is practically fine, but stricter requirements call for mitigation in your design).
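
One simple mitigation for the replication lag described above is to poll the local secondary until the expected value shows up. The sketch below is a generic retry helper, not part of the Vault CLI; the function name retry_until_match and the KV path in the usage comment are assumptions made for illustration.

```shell
#!/bin/sh
# retry_until_match: hypothetical helper, not part of the Vault CLI.
# Runs a command repeatedly until its output equals the expected value,
# or gives up after MAX_ATTEMPTS tries.
# usage: retry_until_match EXPECTED MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until_match() (
  expected=$1
  max_attempts=$2
  delay=$3
  shift 3
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # A failed read (for example, the key has not replicated yet) counts as a miss.
    actual=$("$@" 2>/dev/null) || actual=""
    if [ "$actual" = "$expected" ]; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Only sleep when another attempt will follow.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
)

# Example (hypothetical path and value): after writing "v2" through the secondary,
# wait up to about 5 seconds for the local read to catch up.
#   retry_until_match "v2" 5 1 vault kv get -field=value secret/app/config
```

Polling keeps the client logic simple, at the cost of extra reads. Workloads that cannot tolerate any stale read may be better served by sending those specific reads to the primary.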

For networking, replication traffic between clusters goes to Vault's cluster address (TCP 8201/TLS by default), so that port must be reachable between clusters. API access (8200/TLS) is exposed to clients and load balancers.

Each cluster is configured for HA (Active + Standby)
LB health checks use /v1/sys/health (add perfstandbyok=true to also allow performance standbys)
Inter-cluster traffic assumes mutual TLS (distribute and verify CAs thoroughly)
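
When wiring up load balancer health checks, it helps to know what each response code from /v1/sys/health means. The sketch below maps Vault's default status codes to node states; the helper names describe_health_code and check_node are hypothetical, and the URL you pass in is your own.

```shell
#!/bin/sh
# describe_health_code: hypothetical helper that maps the HTTP status returned
# by /v1/sys/health (Vault's default codes) to a short node state.
describe_health_code() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# check_node BASE_URL: queries one node and prints its state.
# Falls back to code 000 when the request itself fails.
check_node() {
  code=$(curl -sS -o /dev/null -w '%{http_code}' "$1/v1/sys/health") || code=000
  describe_health_code "$code"
}

# Example (placeholder hostname):
#   check_node https://vault-eu.example.com:8200
```

Query parameters such as standbyok=true and perfstandbyok=true make standbys and performance standbys answer with the active code instead, which is usually what a load balancer pool wants.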

Conceptual diagram of global Performance Replication

Mode Comparison and Design Decisions (Performance / DR / Single Cluster)

Balancing scale, availability, and operational cost starts with correctly understanding each replication mode's role. Remembering that Performance is for latency reduction and read throughput, while DR is for disaster recovery, will keep you on track in the exam.

The comparison table below is useful during early requirement gathering.

If you have strict RTO/RPO requirements, combine with DR Replication
Global read optimization is centered on Performance Replication
Tokens and leases are independent per cluster (not shared globally)

Perspective	Performance Replication	DR Replication	Single Cluster
Primary purpose	Read distribution, latency reduction, horizontal scale	Disaster recovery and primary replacement	Simple operations
Reads	Processed locally on each secondary	DR side is normally idle (activated upon failover)	Single site only
Writes	Applied centrally on the primary (forwarding from secondaries supported)	Not allowed on DR side (allowed after promotion)	Handled at a single site
Consistency	Asynchronous replication (short lag)	Asynchronous replication; data is not served until promotion	Local consistency only
Failover	Out of scope; assumes DR is also used	Explicitly supported (via promotion)	None across sites; a site-level failure means an outage
Auth / tokens	Independent per cluster	After promotion, valid on the new primary	Single management plane

Minimal Setup Procedure (CLI-Focused)

We assume both primary and secondary are already running Vault Enterprise, initialized, unsealed, and TLS-configured. The example uses Integrated Storage (Raft) for storage.

Secondary registration uses a one-time activation token, after which a full sync runs in the background.

Enable Performance Replication on the primary
Issue a token for the secondary (managed by identifier)
Enable and join from the secondary
Verify sync and status
Update LB health checks (decide whether to allow standbys)

CLI example: from enabling to status check

# 1) Enable Performance Replication on the primary
vault write -f sys/replication/performance/primary/enable

# 2) Generate a one-time token for the secondary to join (with an identifier)
vault write sys/replication/performance/primary/secondary-token id="sec-eu-001"
# Note the wrapping_token field from the output (single-use, short-lived)

# 3) Enable Performance Replication on the secondary (using the primary-issued token)
#    Caution: this discards any data already stored on the secondary cluster
vault write sys/replication/performance/secondary/enable token="s.xxxxx..."

# 4) Check status (verifying on both sides is reassuring)
vault read sys/replication/status
vault read sys/replication/performance/status

# 5) (Optional) Take an Integrated Storage snapshot (for offload / verification)
#   Operationally, regular snapshots and secure storage are recommended
vault operator raft snapshot save /tmp/vault_raft.snap

Operational Essentials (LB, Auth, Write Forwarding, Backups)

LB design: each region's LB prefers the active node and, when needed, uses a health check with perfstandbyok=true to also permit performance standbys. The standard pattern for redirecting to another region during a cluster failure is to combine Performance with DR, not to rely on Performance alone.

Authentication and tokens: operated independently per cluster. Users and apps log in at the nearest secondary, and the resulting token is only valid within that cluster. Avoid designs that assume a globally common token.
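
Because a token is only valid on the cluster that issued it, client tooling should keep the cluster address and the token paired. The sketch below shows one way to do that with the standard VAULT_ADDR and VAULT_TOKEN environment variables. The region labels, the per-region token file naming, and the helper names are assumptions for illustration, and the hostnames are placeholders.

```shell
#!/bin/sh
# vault_addr_for_region: hypothetical mapping from a region label to the
# API address of that region's cluster. Hostnames are placeholders.
vault_addr_for_region() {
  case "$1" in
    us) echo "https://vault-primary.example.com:8200" ;;
    eu) echo "https://vault-eu.example.com:8200" ;;
    *)  echo "unknown region: $1" >&2; return 1 ;;
  esac
}

# token_file_for_region: one token file per cluster, so a token issued by
# one cluster is never sent to another. The file naming is an assumption.
token_file_for_region() {
  echo "${HOME:-.}/.vault-token-$1"
}

# use_region REGION: points the vault CLI at one cluster and loads that
# cluster's token if one has been saved; otherwise clears VAULT_TOKEN.
use_region() {
  addr=$(vault_addr_for_region "$1") || return 1
  token_file=$(token_file_for_region "$1")
  export VAULT_ADDR="$addr"
  if [ -r "$token_file" ]; then
    VAULT_TOKEN=$(cat "$token_file")
    export VAULT_TOKEN
  else
    unset VAULT_TOKEN
  fi
}

# Example:
#   use_region eu && vault token lookup
```

Clearing VAULT_TOKEN when no file exists is deliberate: it avoids silently reusing a token from a different cluster, which would fail with a permission error anyway.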

Write forwarding: write requests to a secondary are forwarded to the primary and respond after the primary commits. Even after a successful response, there can be slight delay before the change appears on the secondary — bake this into your API client retries and consistency requirements.

LB health check: /v1/sys/health?perfstandbyok=true
Ports: API 8200/TLS, Cluster 8201/TLS (mutual connectivity)
Design tokens/leases to be scoped within each cluster
Backups are still required separately (regular snapshots)
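
For the regular snapshots mentioned in the last item, a small retention routine keeps the backup directory from growing without bound. This is a minimal sketch, assuming snapshots are named vault-YYYY-MM-DD.snap and contain no spaces in their names; the helper names and the retention count are hypothetical.

```shell
#!/bin/sh
# snapshot_name: builds a dated file name such as vault-2026-01-31.snap.
snapshot_name() {
  echo "vault-$(date +%F).snap"
}

# prune_snapshots DIR KEEP: deletes all but the KEEP newest vault-*.snap files.
# Relies on the ISO date in the name, so lexical order equals age order.
prune_snapshots() (
  dir=$1
  keep=$2
  cd "$dir" || exit 1
  total=$(ls vault-*.snap 2>/dev/null | wc -l)
  excess=$((total - keep))
  # Nothing to delete when the directory holds KEEP files or fewer.
  [ "$excess" -gt 0 ] || exit 0
  ls vault-*.snap | sort | head -n "$excess" | while read -r f; do
    rm -f -- "$f"
  done
)

# Example (hypothetical directory and retention of 7 snapshots):
#   vault operator raft snapshot save "/backups/$(snapshot_name)" \
#     && prune_snapshots /backups 7
```

Pruning runs only after a successful save, so a failed snapshot never causes an older good copy to be removed.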

Health check example (curl) and snapshot

# LB health check (allow standbys)
curl -sS "https://vault-eu.example.com:8200/v1/sys/health?standbycode=200&perfstandbyok=true" -o /dev/null -w "%{http_code}\n"

# Save a Raft snapshot (requires authentication)
vault login s.xxxxx...
vault operator raft snapshot save "/backups/vault-$(date +%F).snap"

Exam Points and Troubleshooting

Exam tips: Performance is for scale, DR is for failover. Tokens/leases are per cluster. Secondary writes are forwarded and applied at the primary. Lock in these three points first.

Issues like "initial sync isn't progressing" or "lag won't go away" are most often caused by networking, TLS, or time sync. Confirm basic connectivity, certificate chains, and NTP before digging deeper.

Inspection command: vault read sys/replication/status
Connectivity: does 8201/TLS reach bidirectionally between clusters?
Certificates: does the SAN include the cluster address?
Time sync: NTP drift affects TLS validation and TTLs
Replication lag: monitor the correlation between load (WAL/queue) and bandwidth
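
When working through the checklist above, the state field in the replication status output is a quick first signal. The sketch below turns that field into a coarse verdict. The helper names are hypothetical, the verdict labels are this sketch's own, and current_state assumes jq is installed and VAULT_ADDR points at the cluster being checked.

```shell
#!/bin/sh
# classify_replication_state: hypothetical helper that maps the "state" field
# of sys/replication/performance/status to a coarse verdict.
#   running      primary is operating normally
#   stream-wals  secondary is streaming changes normally
#   merkle-diff  secondary is checking whether a resync is needed
#   merkle-sync  secondary is resyncing
#   idle         secondary is not replicating; look at network and TLS first
classify_replication_state() {
  case "$1" in
    running|stream-wals)     echo "healthy" ;;
    merkle-diff|merkle-sync) echo "syncing" ;;
    idle)                    echo "investigate" ;;
    *)                       echo "unknown" ;;
  esac
}

# current_state: reads the state from the cluster VAULT_ADDR points to.
# Requires the vault CLI and jq.
current_state() {
  vault read -format=json sys/replication/performance/status | jq -r '.data.state'
}

# Example:
#   classify_replication_state "$(current_state)"
```

A secondary that stays in a syncing state for a long time, or keeps returning to it, is a hint to look at bandwidth and write load rather than connectivity.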

Checking basic status and health

# Replication state
vault read sys/replication/status
vault read sys/replication/performance/status

# Health checks (on primary and secondary)
curl -sS https://vault-primary.example.com:8200/v1/sys/health | jq .
curl -sS "https://vault-eu.example.com:8200/v1/sys/health?perfstandbyok=true" | jq .

Check with a Sample Question

Question 1

Which statement about Vault Enterprise Performance Replication is most accurate?

A. A token acquired on a secondary is valid on all clusters, enabling global authentication
B. Writes to a secondary are forwarded to the primary and respond after being committed on the primary
C. Performance Replication is primarily for disaster recovery (automatic failover)
D. Reads are always handled by the primary, and secondaries only stand by

Correct answer: B

Performance Replication processes reads locally on each secondary and forwards writes from a secondary to the primary, where they are applied. Tokens/leases are independent per cluster, and DR scenarios are the responsibility of DR Replication.

Frequently Asked Questions

Can I send write requests to a secondary?

Yes. The secondary forwards writes to the primary, and the response returns after the primary commits. Be aware that an immediate local read on the secondary may briefly return the old value because of replication lag.

Are tokens and leases replicated?

No. With Performance Replication, authentication, tokens, and leases are local to each cluster. Users and applications log in to the nearest cluster, and the issued token is only valid within that same cluster.

Can Performance Replication alone handle failover?

Not recommended. Business continuity should be designed with DR Replication. Performance Replication is primarily for scale and latency optimization, and separating the two roles is the best practice.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
