Vault's DR Replication is an asynchronous, one-way replication feature designed to keep the business running during a disaster. Under normal conditions the secondary stays on standby, and during an outage it is promoted to operate as the primary.
Drawing on stable behavior described in the official documentation, this article covers design fundamentals, setup procedures, failover and failback, monitoring and drills, and a comparison of DR and Performance Replication, for both certification prep and real-world operations.
Vault DR Replication asynchronously replicates the same dataset to a different region or cluster. The secondary stays on standby in normal operation and is promoted during an outage to resume service. The secondary does not serve client requests until it is promoted.
From an Ops perspective, the key items are the roles (DR Primary / DR Secondary), promote/demote operations, RPO/RTO thinking, TLS and certificate prerequisites, and the difference from Performance Replication. The exam frequently tests these conceptual distinctions.
Minimal commands to check status (preparation phase)
export VAULT_ADDR=https://primary.example.com:8200
export VAULT_TOKEN=<admin_or_appropriate_token>
# Check DR status (on Primary)
vault read sys/replication/dr/status
# Check server state
vault status

In production, the DR secondary is placed in a different region or data center. The network must allow bidirectional TLS connectivity, and the API and cluster addresses must match the SANs in the certificates.
Using Integrated Storage (Raft) simplifies the setup because it does not depend on an external storage backend such as Consul. The key points are cluster-to-cluster traffic on port 8201 (cluster_addr) and a correctly configured client-facing address on port 8200 (api_addr).
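The SAN requirement above can be checked before enabling replication. The following is a minimal sketch, not an official procedure: `san_has_host` is a hypothetical helper name, and it assumes the SAN text has the comma-separated `DNS:` layout that `openssl x509 -noout -ext subjectAltName` prints (OpenSSL 1.1.1 or later). It only does exact matching and does not handle wildcard entries.

```shell
# san_has_host SAN_TEXT HOST
# Returns 0 if HOST appears as an exact "DNS:" entry in the SAN text.
# SAN_TEXT is expected in the layout printed by
#   openssl x509 -noout -ext subjectAltName
# Assumption: exact match only, wildcard SANs are not evaluated.
san_has_host() {
  # Put each comma-separated entry on its own line, trim surrounding
  # whitespace, then look for an exact whole-line match.
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -Fxq "DNS:$2"
}
```

A possible invocation against the certificate path used in this article: `san_has_host "$(openssl x509 -in /opt/vault/tls/tls.crt -noout -ext subjectAltName)" vault-a.example.com && echo "SAN ok"`.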
DR Replication (inter-region, Raft)
server.hcl (excerpt, example of stable parameters)
storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-node-1"
}
api_addr = "https://vault-a.example.com:8200"
cluster_addr = "https://vault-a.example.com:8201"
listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_disable   = 0
  tls_cert_file = "/opt/vault/tls/tls.crt"
  tls_key_file  = "/opt/vault/tls/tls.key"
}

Below is the minimal procedure. Run it after verifying TLS, policies, audit logging, and storage health (Raft/Consul). A dual-control admin approval workflow makes it safer.
Generate the secondary activation token on the primary and use it on the secondary for activation. Treat the token as short-lived and tightly scoped.
Example commands to enable DR Replication
# Primary side
export VAULT_ADDR=https://primary.example.com:8200
export VAULT_TOKEN=<admin_token>
# 1) Enable DR Primary
vault write -f sys/replication/dr/primary/enable
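# 2) Generate Secondary activation token
#    id is a required label for the secondary ("dr-secondary" is an example value);
#    the response is a wrapping token, so read the wrapping_token field
DR_TOKEN=$(vault write -field=wrapping_token sys/replication/dr/primary/secondary-token id="dr-secondary")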
# Secondary side
export VAULT_ADDR=https://secondary.example.com:8200
export VAULT_TOKEN=<admin_token_on_secondary>
# 3) Enable DR Secondary (set primary_api_addr per your environment)
vault write sys/replication/dr/secondary/enable \
  token="$DR_TOKEN" \
  primary_api_addr="https://primary.example.com:8200"
# 4) Check status (on both sides)
vault read sys/replication/dr/status

For a planned failover, confirm that the DR secondary is fully caught up, then promote the secondary and switch traffic. The basic flow is the same for an emergency failover, but any writes that had not yet been replicated when the primary failed are lost, so the effective RPO equals the replication lag at that moment.
For failback, the safe operational pattern is to rejoin the old primary as a secondary under the new primary. Rather than a simple “reset to original”, follow the demote and rejoin procedure.
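"Fully caught up" can be made concrete by comparing write-ahead-log (WAL) indexes from the two clusters. The sketch below assumes the primary's `sys/replication/dr/status` output exposes a `last_wal` value and the secondary's exposes `last_remote_wal`; treat those field names as assumptions to verify against your Vault version. `wal_lag` and `caught_up` are hypothetical helper names, and the functions only do the arithmetic on values you have already read.

```shell
# wal_lag PRIMARY_WAL SECONDARY_WAL
# Prints how many WAL entries the secondary is behind (never negative).
# Assumption: PRIMARY_WAL comes from "last_wal" on the primary and
# SECONDARY_WAL from "last_remote_wal" on the secondary.
wal_lag() {
  lag=$(( $1 - $2 ))
  if [ "$lag" -lt 0 ]; then
    lag=0
  fi
  printf '%s\n' "$lag"
}

# caught_up PRIMARY_WAL SECONDARY_WAL [MAX_LAG]
# Returns 0 when the lag is at most MAX_LAG (default 0).
caught_up() {
  [ "$(wal_lag "$1" "$2")" -le "${3:-0}" ]
}
```

For example, `caught_up 500 500 && echo "safe to promote"` succeeds, while `caught_up 500 490` fails unless a tolerance such as `caught_up 500 490 10` is given. On a busy primary the index keeps moving, so read both values as close together in time as possible.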
Representative commands for failover and failback
# 1) Planned failover (promote Secondary)
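export VAULT_ADDR=https://secondary.example.com:8200
# A DR secondary does not accept ordinary tokens; promotion requires a DR operation token
# (generated beforehand with `vault operator generate-root -dr-token`)
DR_OP_TOKEN=<dr_operation_token>
# Promote
vault write sys/replication/dr/secondary/promote dr_operation_token="$DR_OP_TOKEN"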
# 2) Demote the old primary to prepare for rejoin
export VAULT_ADDR=https://old-primary.example.com:8200
export VAULT_TOKEN=<admin_token_on_old_primary>
# Demote
vault write -f sys/replication/dr/primary/demote
# 3) Reissue secondary token on the new primary, update on the old primary
export VAULT_ADDR=https://new-primary.example.com:8200
export VAULT_TOKEN=<admin_token_on_new_primary>
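# id is a required label for the rejoining cluster ("old-primary" is an example value)
NEW_TOKEN=$(vault write -field=wrapping_token sys/replication/dr/primary/secondary-token id="old-primary")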
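export VAULT_ADDR=https://old-primary.example.com:8200
# The demoted cluster now acts as a DR secondary, so this call needs a DR operation token
# for that cluster (see `vault operator generate-root -dr-token`)
OLD_DR_OP_TOKEN=<dr_operation_token_on_old_primary>
# Update to point at the new primary (use update-primary, not enable)
vault write sys/replication/dr/secondary/update-primary \
  dr_operation_token="$OLD_DR_OP_TOKEN" \
  token="$NEW_TOKEN" \
  primary_api_addr="https://new-primary.example.com:8200"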
# 4) Check status on both sides
vault read sys/replication/dr/status

Monitoring centers on replication health (status, lag, connection errors) and the health of certificates and time synchronization. Collect Prometheus-format metrics and alert on thresholds.
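One way to turn the status output into an alert level is a small classifier over the `mode` and `state` fields. This is a sketch under assumptions: the state names used here (`running`, `stream-wals`, `merkle-diff`, `merkle-sync`) are the values commonly reported by the replication status endpoint, but confirm them against your Vault version. `classify_dr_state` is a hypothetical helper, and the OK/WARN/CRIT mapping is an illustrative policy rather than an official one.

```shell
# classify_dr_state MODE STATE
# Prints OK, WARN, or CRIT for a (mode, state) pair read from
# sys/replication/dr/status.
# Assumption: the state names below match your Vault version.
classify_dr_state() {
  case "$1:$2" in
    # Steady state: primary running, secondary streaming WAL entries.
    primary:running|secondary:stream-wals)
      echo OK ;;
    # Secondary is resynchronizing; usually transient, worth watching.
    secondary:merkle-diff|secondary:merkle-sync)
      echo WARN ;;
    # Anything else (idle, disabled, unknown) needs attention.
    *)
      echo CRIT ;;
  esac
}
```

With `jq` available, the inputs could be pulled with something like `vault read -format=json sys/replication/dr/status | jq -r '.data.mode, .data.state'`.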
For drills, regularly run planned failover → failback to measure actual RTO and keep runbooks fresh. Snapshots do not replace DR, but they are useful for incident analysis and worst-case recovery.
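Measuring RTO during a drill can be as simple as recording two timestamps. The helpers below are a minimal sketch: `rto_seconds` and `rto_within_target` are hypothetical names, and the timestamps are assumed to be Unix epoch seconds captured with `date +%s` at the start of the drill and at the moment clients are served by the promoted cluster.

```shell
# rto_seconds START_EPOCH END_EPOCH
# Prints the elapsed seconds between drill start and service restoration.
rto_seconds() {
  printf '%s\n' "$(( $2 - $1 ))"
}

# rto_within_target START_EPOCH END_EPOCH TARGET_SECONDS
# Returns 0 when the measured RTO is at most TARGET_SECONDS.
rto_within_target() {
  [ "$(rto_seconds "$1" "$2")" -le "$3" ]
}
```

In a drill, capture `start=$(date +%s)` before promotion and `end=$(date +%s)` after the routing cutover is verified, then record `rto_seconds "$start" "$end"` in the runbook and compare it with the target, for example `rto_within_target "$start" "$end" 300`.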
Practical monitoring and backup snippets
# DR status (JSON output)
vault read -format=json sys/replication/dr/status | jq .
# Prometheus metrics (set appropriate token/headers)
curl -s -H "X-Vault-Token: $VAULT_TOKEN" \
"$VAULT_ADDR/v1/sys/metrics?format=prometheus" | grep replication
# Raft snapshot (supplementary backup)
vault operator raft snapshot save /backup/vault-$(date +%Y%m%d%H%M%S).snap

DR is a standby system for disaster recovery; Performance is a distributed system for read scale. Because the design goals differ, client availability, write capability, and cutover operations are fundamentally different.
The exam often asks whether the secondary can serve requests during normal operation and which operation is the promotion. The table below clarifies these differences.
| Item | DR Replication | Performance Replication | Single Cluster (reference) |
|---|---|---|---|
| Purpose | Disaster recovery (regional-failure protection) | Read scale / geographic distribution | Availability (within a single site) |
| Client handling in normal operation | Not possible (standby, blocked until promotion) | Possible (primarily serves reads) | Possible (as usual) |
| Writes | Not possible (allowed after promotion) | Proxied to the primary | Possible |
| Cutover operation | secondary/promote and routing cutover | Not needed (topology preserved) | Not needed |
| RPO/RTO tendency | Asynchronous, so RPO>0, RTO = promotion + cutover | RPO>0 (asynchronous) | RPO/RTO depend on intra-site HA |
(Reference) Minimal example of enabling Performance Replication
# Enable Performance Primary on the primary cluster
vault write -f sys/replication/performance/primary/enable
# Generate a token for the Secondary
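# id is a required label for the secondary ("perf-secondary" is an example value)
P_TOKEN=$(vault write -field=wrapping_token sys/replication/performance/primary/secondary-token id="perf-secondary")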
# Enable Performance Secondary on a separate cluster
vault write sys/replication/performance/secondary/enable \
token="$P_TOKEN" \
primary_api_addr="https://primary.example.com:8200"
Question 1
In a Vault Enterprise DR Replication setup, which procedure is most appropriate for performing a planned failover with minimal data loss?
Correct answer: A
The correct path for a planned failover is to confirm that the DR secondary is caught up, promote it via secondary/promote, and switch routing. Force-stopping the primary with seal is not a recommended procedure and may worsen RPO/RTO. Disabling DR or newly enabling Performance does not match the goal.
Can a DR secondary serve read-only traffic during normal operation?
No. A DR secondary stays in standby until it is promoted and does not handle client requests in principle. If your goal is read scale, consider Performance Replication instead.
How do I perform a failback?
Issue a secondary token on the new primary, demote the old primary, then re-join it as a secondary under the new primary using sys/replication/dr/secondary/update-primary. Rather than directly “reverting”, follow the promote/demote procedure for safety.
Can backups (Raft snapshots) replace DR?
No. DR aims for low-RTO service continuity, while snapshots are a supplement for worst-case recovery, auditing, and verification. Use both: DR for business continuity, snapshots for an additional safety net.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.