Vault Replication: Performance & DR (Enterprise) (2026)

Vault Enterprise Replication comes in two flavors: DR (Disaster Recovery) and Performance. DR exists for availability, while Performance is about cutting latency and scaling reads. Their roles are distinct, and running both together is a common production pattern.

This article distills the design and operational decisions you actually need, based on stable behavior documented in the official guides. It covers operational procedures, monitoring, failover steps, and the exam points you are most likely to be tested on.

Enterprise Replication: Big Picture and Terminology

Replication is an Enterprise feature that asynchronously replicates cluster state from one cluster to another. DR Replication is a standby system built for failover, while Performance Replication targets read scale-out and geographic distribution. A common enterprise topology pairs a single Primary with multiple Performance Secondaries and one or more DR Secondaries.

Storage backends are Integrated Storage (Raft) or Consul, but Integrated Storage is recommended for new builds. Either way, inter-cluster traffic is protected with mutual TLS, and each secondary joins using a one-time activation token that the Primary issues and the secondary consumes when it is enabled.

Primary: the source of replication (the authoritative writer)
Secondary: the replication target. DR Secondaries are standby; Performance Secondaries serve reads locally and hand writes on replicated data to the Primary.
Local mount: a mount created with -local whose data is excluded from Performance Replication (DR Secondaries still receive it); frequently used on Performance Secondaries.
Namespace: the Enterprise multi-tenancy unit. Namespaces and the mounts inside them are replicated along with the rest of the shared configuration.

Aspect	DR Replication	Performance Replication
Primary purpose	Disaster recovery (failover)	Low-latency reads and scale-out
Client requests	Generally not accepted (standby)	Accepted; reads are served locally
Writes	Not allowed (allowed after promotion)	Writes to replicated mounts are forwarded to the Primary. Local mounts are writable locally.
Scope	Entire cluster (comprehensive)	Selective (Local mounts can be excluded)
Failover	Promote a DR Secondary to make it the new Primary	Out of scope (availability is handled by DR)
RPO/RTO model	Asynchronous. RPO is typically seconds to tens of seconds; RTO depends on how quickly you promote.	Asynchronous propagation. Not a continuity mechanism, so RPO/RTO targets do not apply (it exists for read performance).

DR Replication: Design and Behavior

A DR Secondary normally does not handle client requests. It asynchronously receives the Primary's entire state — policies, secrets, tokens, leases, everything — so it can be promoted quickly during an incident. Once promoted, it becomes the new Primary and accepts client reads and writes.

DR is designed for whole-cluster protection — it is not a selective exclusion mechanism. RPO depends on network conditions and load, and you should plan for seconds to tens of seconds of asynchronous lag.

Normal operation: the DR Secondary stays on standby and responds only to health, replication-status, and DR operation endpoints (such as promotion).
During an incident: promote the DR Secondary with a DR operation token, then point clients at it.
Planned maintenance: demote the Primary, promote the DR Secondary, and once the original site is recovered, reattach it as a secondary, let it resync, and fail back.
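The planned-maintenance path in the list above can be sketched as a pair of shell functions. This is a minimal sketch, not an official runbook: it assumes the vault CLI is on PATH and that you already hold a DR operation token for each cluster involved. The function names and the id="old-primary" label are hypothetical; the endpoints used (demote, promote, secondary-token, update-primary) are the documented DR replication endpoints, but confirm their parameters for your Vault version.

```shell
#!/usr/bin/env bash
# Planned DR failover sketch. Assumptions: the vault CLI is installed, each
# DR operation token was generated beforehand, and function names are hypothetical.

# Run a vault command against a specific cluster address.
vault_at() {
  local addr="$1"
  shift
  VAULT_ADDR="$addr" vault "$@"
}

# Steps 1 and 2: demote the current Primary, then promote its DR Secondary.
planned_dr_failover() {
  local primary_addr="$1" dr_addr="$2" dr_op_token="$3"

  # Demote first so the old Primary stops accepting writes.
  vault_at "$primary_addr" write -f sys/replication/dr/primary/demote || return 1

  # Promotion needs a DR operation token; normal client tokens are not accepted.
  vault_at "$dr_addr" write sys/replication/dr/secondary/promote \
    dr_operation_token="$dr_op_token" || return 1

  echo "promoted ${dr_addr}; re-attach ${primary_addr} as its DR Secondary next"
}

# Step 3: once the old site is healthy, point it at the new Primary.
reattach_old_primary() {
  local new_primary_addr="$1" old_primary_addr="$2" old_dr_op_token="$3" activation

  # Issue a response-wrapped activation token on the new Primary.
  activation=$(vault_at "$new_primary_addr" write -field=wrapping_token \
    sys/replication/dr/primary/secondary-token id="old-primary") || return 1

  # The demoted cluster is already a DR Secondary; update-primary re-links it.
  vault_at "$old_primary_addr" write sys/replication/dr/secondary/update-primary \
    dr_operation_token="$old_dr_op_token" token="$activation"
}
```

For example (hostnames are placeholders): run planned_dr_failover https://vault-a.example:8200 https://vault-b.example:8200 "$DR_OP_TOKEN_B", then, once site A is healthy, reattach_old_primary https://vault-b.example:8200 https://vault-a.example:8200 "$DR_OP_TOKEN_A". Failing back repeats the same sequence in the opposite direction.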

A typical topology (Primary + Performance Secondaries + DR Secondary)

Performance Replication: Design and Behavior

A Performance Secondary serves client reads from a nearby region to cut latency. Replicated data is never written locally on the secondary: writes to replicated mounts are forwarded to the Primary, which commits them and replicates the result back.

Data you do not want replicated belongs on mounts created with the -local flag. Region-specific, short-lived data and metrics-related tokens, for example, can live on Local mounts so each secondary manages them independently.

Replicated data is read-only locally (writes are handled by the Primary).
Local mounts are not replicated to other Performance clusters; each secondary reads and writes its own.
Watch Namespace-level consistency: the design assumes every cluster shares the same logical Namespace structure.
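To confirm which side of the Local/replicated line a mount falls on, a small check like the following can help. It is a sketch that assumes Vault 1.10 or later, where reading sys/mounts/<path> returns the mount's configuration including a boolean local field; is_local_mount and report_mount_scope are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: report whether a mount is Local (excluded from Performance Replication).
# Assumes Vault 1.10+, where "vault read sys/mounts/<path>" returns the mount
# configuration with a boolean "local" field. Helper names are hypothetical.

# Exit status: 0 = Local mount, 1 = replicated mount, 2 = could not read the mount.
is_local_mount() {
  local mount_path="$1" flag
  flag=$(vault read -field=local "sys/mounts/${mount_path}") || return 2
  [ "$flag" = "true" ]
}

# Print the replication scope of each mount path given as an argument.
report_mount_scope() {
  local p
  for p in "$@"; do
    is_local_mount "$p"
    case $? in
      0) echo "${p}: local" ;;
      1) echo "${p}: replicated" ;;
      *) echo "${p}: unreadable" ;;
    esac
  done
}
```

For example, running report_mount_scope kv-local secret on a Performance Secondary should list kv-local as local if it was created with -local.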

Setup Steps and Key Commands (Within Stable Behavior)

Below is a minimal enable-flow example. It assumes TLS is correctly configured and the clusters can reach each other. In production, combine this with RBAC, network policies, and audit configuration.

The commands use stable APIs from the official documentation. Validate the flow in a staging environment before rolling it out to production.

DR and Performance can both run on the same Primary.
Each secondary is activated with a one-time activation token issued by the Primary.
Local mounts are created by passing -local to vault secrets enable.

Enabling DR/Performance and promotion (example)

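# 1) Enable the DR Primary (on the Primary cluster)
vault write -f sys/replication/dr/primary/enable

# 2) Issue an activation token for the DR Secondary (on the Primary)
#    id is a required label for the secondary ("dr-secondary" is an example);
#    the token is returned response-wrapped, hence -field=wrapping_token
DR_TOKEN=$(vault write -field=wrapping_token sys/replication/dr/primary/secondary-token id="dr-secondary")

# 3) Enable the DR Secondary (on the Secondary cluster)
#    Caution: enabling a secondary wipes the data already stored on it
vault write sys/replication/dr/secondary/enable token="${DR_TOKEN}"

# 4) Check DR status (on any cluster)
vault read sys/replication/dr/status

# 5) During an incident, promote the DR Secondary (run on the Secondary)
#    DR_OP_TOKEN is a placeholder for a DR operation token, e.g. generated with
#    vault operator generate-root -dr-token; normal client tokens do not work here
vault write sys/replication/dr/secondary/promote dr_operation_token="${DR_OP_TOKEN}"

# 6) Enable the Performance Primary (on the Primary cluster)
vault write -f sys/replication/performance/primary/enable

# 7) Issue an activation token for the Performance Secondary (on the Primary)
PERF_TOKEN=$(vault write -field=wrapping_token sys/replication/performance/primary/secondary-token id="perf-secondary")

# 8) Enable the Performance Secondary (on the Secondary cluster)
vault write sys/replication/performance/secondary/enable token="${PERF_TOKEN}"

# 9) Check Performance status
vault read sys/replication/performance/status

# 10) Local mount example (a mount created on a secondary that is not replicated)
#     e.g. a kv mount dedicated to each site
vault secrets enable -path=kv-local -version=2 -local kv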
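After running the enable steps above, a quick check of each cluster's reported mode helps catch a secondary that never attached. This is a minimal sketch that assumes the mode field of sys/replication/<type>/status reads primary or secondary once a cluster is attached; expect_mode is a hypothetical helper and the addresses below are placeholders.

```shell
#!/usr/bin/env bash
# Post-setup check sketch: compare each cluster's reported replication mode with
# what you expect. expect_mode is a hypothetical helper; addresses are placeholders.

# Usage: expect_mode <cluster_addr> <dr|performance> <primary|secondary>
# Exit status: 0 = match, 1 = mismatch, 2 = status could not be read.
expect_mode() {
  local addr="$1" type="$2" want="$3" got
  got=$(VAULT_ADDR="$addr" vault read -field=mode "sys/replication/${type}/status") || return 2
  if [ "$got" = "$want" ]; then
    echo "OK   ${addr} ${type} mode=${got}"
  else
    echo "FAIL ${addr} ${type} mode=${got} (expected ${want})"
    return 1
  fi
}
```

For example: expect_mode https://vault-primary.example:8200 dr primary, then expect_mode https://vault-dr.example:8200 dr secondary, then expect_mode https://vault-perf.example:8200 performance secondary.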

Operations, Monitoring, and Change Management

Focus monitoring on replication lag, link state, and whether a reseed is required. Use Vault's built-in telemetry (for example, the sys/metrics endpoint, which can emit Prometheus format) to visualize latency and failure rates. Combining the sys/health API with sys/replication/*/status on a single dashboard makes failover decisions much faster.
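The health-check half of that dashboard can start from the HTTP status code of sys/health, which is unauthenticated. The sketch below maps the documented default codes to a role name; health_role and check_health are hypothetical helpers. Note that the active node of a Performance Secondary answers 200 because it is active within its own cluster, so pair this with the replication status endpoints.

```shell
#!/usr/bin/env bash
# Sketch: classify a node from the HTTP status code of Vault's sys/health endpoint.
# Codes follow the documented defaults; helper names are hypothetical.

health_role() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "not-initialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# sys/health is unauthenticated, so no token is sent.
check_health() {
  local addr="$1" code
  # -w prints only the status code; the JSON body is discarded.
  code=$(curl -s -o /dev/null -w '%{http_code}' "${addr}/v1/sys/health") || return 2
  echo "${addr} $(health_role "$code") (HTTP ${code})"
}
```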

On the change-management side, the important pieces are the upgrade order between the Primary and its secondaries (HashiCorp's guidance is to upgrade secondaries before the Primary), keeping inter-cluster connectivity intact during certificate rotation, and running regular failover drills. Take snapshots (vault operator raft snapshot save for Integrated Storage) on a schedule so you have an independent recovery path.
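A scheduled snapshot with simple retention could look like the following. It is a sketch that assumes Integrated Storage and a VAULT_TOKEN permitted to take snapshots; the default directory, the vault-raft-*.snap naming, and the 14-file retention are hypothetical choices, and take_raft_snapshot is a hypothetical helper meant to be called from cron or a systemd timer.

```shell
#!/usr/bin/env bash
# Scheduled Raft snapshot sketch (Integrated Storage only). The default directory
# and retention count are hypothetical; call this from cron or a systemd timer.

# Usage: take_raft_snapshot [dir] [keep]; prints the path of the new snapshot.
take_raft_snapshot() {
  local dir="${1:-/var/backups/vault}" keep="${2:-14}" file
  mkdir -p "$dir" || return 1

  # UTC timestamps make file names sort in chronological order.
  file="${dir}/vault-raft-$(date -u +%Y%m%dT%H%M%SZ).snap"
  vault operator raft snapshot save "$file" || return 1

  # Keep the newest $keep snapshots and delete the rest.
  ls -1 "$dir"/vault-raft-*.snap 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while read -r old; do
      rm -f -- "$old"
    done

  echo "$file"
}
```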

Lag monitoring: track how far the Secondary's last_remote_wal trails the Primary's last_wal, and alert when the gap keeps growing.
Link state: watch mode, state, connection status, and last_heartbeat in the status API output.
Certificate rotation: standardize the update procedure for cluster_addr and mutual TLS.
Backup: run vault operator raft snapshot save on a schedule (Integrated Storage).
Failover drills: rehearse DR promotion and failback every quarter.
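The lag item above can be turned into a simple alert. This sketch assumes the Primary's status exposes last_wal and the Secondary's exposes last_remote_wal (confirm the field names against your version's status output); wal_lag and check_lag are hypothetical helpers, and the 1000-entry threshold is an arbitrary example. A single sample is noisy, so alerting on a gap that keeps growing across several checks is more useful than one reading.

```shell
#!/usr/bin/env bash
# Lag check sketch. Assumption: the Primary's status exposes last_wal and the
# Secondary's exposes last_remote_wal; confirm the field names for your version.

# Print how many WAL entries the Secondary trails the Primary by.
wal_lag() {
  local primary_addr="$1" secondary_addr="$2" type="${3:-dr}" p s
  p=$(VAULT_ADDR="$primary_addr" vault read -field=last_wal "sys/replication/${type}/status") || return 2
  s=$(VAULT_ADDR="$secondary_addr" vault read -field=last_remote_wal "sys/replication/${type}/status") || return 2
  echo $((p - s))
}

# Exit 1 (with a WARN line) when lag exceeds the threshold; 1000 is a hypothetical default.
check_lag() {
  local lag
  lag=$(wal_lag "$1" "$2" "$3") || return 2
  if [ "$lag" -gt "${4:-1000}" ]; then
    echo "WARN lag=${lag}"
    return 1
  fi
  echo "OK lag=${lag}"
}
```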

Exam Essentials and Real-World Pitfalls

Frequent exam topics: the purpose and behavioral differences between DR and Performance, which one accepts client requests (DR is standby; Performance is read-only), when to use Local mounts, and the basic promotion procedure. In real-world ops, watch for issues around mixing DR with Performance, drawing the line between Local and replicated mounts, and misjudging consistency expectations under network latency.

DR is whole-cluster protection; Performance is read scale-out. Different roles.
Performance Secondaries do not commit writes to replicated mounts locally; the Primary handles them. Local mounts are writable.
DR Secondaries do not accept normal requests; they only accept them after promotion.
Binding a secondary via an activation token is the standard pattern.
Integrated Storage is recommended for new builds. For Consul environments, validate compatibility and procedures ahead of time.
Evaluate RPO/RTO assuming asynchronous replication — do not assume zero.

Check Your Understanding

Ops

Question 1

You want globally distributed users to read secrets with low latency, while writes to replicated mounts must be funneled to a single region for consistency. Which design is most appropriate?

A. Configure a Primary with multiple Performance Secondaries: serve reads from each Secondary and route writes to the Primary
B. Deploy multiple DR Secondaries and send all reads and writes to each DR Secondary
C. Run every cluster active-active so they can all accept writes from each other
D. Use only Local mounts in each region and avoid globally shared secrets entirely

Correct answer: A

Performance Replication is designed exactly for serving reads from each region while writes are concentrated at the Primary. DR Secondaries are standby and do not handle normal requests. Active-active mutual writes are outside Vault's replication design. A Local-only setup cannot maintain consistency for globally shared secrets.

Frequently Asked Questions

Can DR and Performance Replication run together? Is there a priority order?

Running both together is the common pattern. From the Primary, Performance Secondaries provide read scale-out, while a separate DR Secondary handles disaster recovery. Their roles are different, so think of them as complementary rather than ranked by priority.

Does Replication work with both Integrated Storage and Consul?

Both backends are supported, but Integrated Storage (Raft) is recommended for new deployments. For existing Consul environments, validate version compatibility, TLS, and network requirements in advance.

Are tokens and leases replicated?

DR Replication covers the entire cluster, including tokens, leases, and policies. Performance Replication shares configuration, policies, and secrets, but each Performance Secondary keeps its own tokens and leases, so a token issued on a secondary is not valid on the Primary or on other secondaries. Writes to replicated mounts are handled by the Primary; keep secondary-specific data on Local mounts.
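To observe that token locality in a staging setup, you can mint a short-lived token on a Performance Secondary and look it up on both clusters. This is a sketch that assumes VAULT_TOKEN on the secondary is allowed to create child tokens; token_valid_on and demo_token_locality are hypothetical helpers, and the cluster addresses are whatever you pass in.

```shell
#!/usr/bin/env bash
# Sketch: show that a token issued on a Performance Secondary is unknown to the Primary.
# Assumes VAULT_TOKEN is set to a token that may create child tokens on the Secondary.

# Exit 0 if the token resolves on the cluster at addr (token looks itself up).
token_valid_on() {
  local addr="$1" token="$2"
  VAULT_ADDR="$addr" VAULT_TOKEN="$token" vault token lookup >/dev/null 2>&1
}

demo_token_locality() {
  local primary_addr="$1" secondary_addr="$2" t
  # Short TTL so the demo token expires on its own.
  t=$(VAULT_ADDR="$secondary_addr" vault token create -field=token -ttl=5m) || return 2
  if token_valid_on "$secondary_addr" "$t" && ! token_valid_on "$primary_addr" "$t"; then
    echo "token is local to ${secondary_addr}"
  else
    echo "unexpected: token was not local to ${secondary_addr}"
    return 1
  fi
}
```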


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
