This article walks through the concrete procedure for promoting a DR secondary to primary in a HashiCorp Vault Enterprise Disaster Recovery (DR) replication environment, balancing real-world ops practice with exam-relevant patterns.
All commands use the stable, officially documented endpoints. We also cover the decision criteria for avoiding split-brain during a network partition, along with the practical points of LB/DNS cutover.
Vault Enterprise offers Performance Replication and DR Replication. DR keeps a standby cluster for disaster recovery and, on failure, promotes the secondary into the primary role. During normal operation the DR secondary is a pure standby that does not even accept reads — it only activates at cutover.
DR promotion is invoked explicitly against the secondary via API/CLI. Around the cutover, update client endpoints (LB/DNS) and network-isolate the former primary before re-joining it — that is the safe pattern.
| Aspect | DR Replication | Performance Replication |
|---|---|---|
| Primary purpose | Disaster-time primary substitute (standby) | Read scale-out / geo-distribution (used in steady state) |
| Steady-state I/O | Essentially no client I/O (standby) | Reads served on the secondary (with some limits) |
| Cutover op | Explicitly run promote on the secondary | Promotion exists (sys/replication/performance/secondary/promote) but is not a routine operation; the secondary already serves clients |
| RPO profile | Near zero (follows WAL; lag during network interruptions) | Replicas serve in steady state; RPO depends on use case |
| Real-world RTO | On the order of a few minutes (including verification and LB cutover) | No cutover needed (steady-state operation) |
DR topology (before cutover)
In an unplanned failover, the most dangerous failure mode is a partial recovery of the old primary. Before promoting the DR side, always verify DR-side health, replication lag, and reachability of the old primary.
Automate the checks via JSON output so the decision does not depend on individual operators.
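The go/no-go decision can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not an official procedure: the function name `should_promote` and its return codes are made up for illustration, and the probe for old-primary reachability is left to your own tooling. The `mode` field comes from `sys/replication/dr/status` and `sealed` from `vault status -format=json`.

```shell
# Pre-promotion gate (illustrative sketch; names and return codes are assumptions).
# Inputs are plain strings so the logic can be exercised without a live cluster.
#
# should_promote SEALED MODE OLD_PRIMARY_REACHABLE
#   SEALED                 "true" or "false"   (vault status -format=json | jq -r '.sealed')
#   MODE                   e.g. "secondary"    (vault read -format=json sys/replication/dr/status | jq -r '.data.mode')
#   OLD_PRIMARY_REACHABLE  "yes" or "no"       (result of your own network probe)
# Returns 0 = go, 1 = abort, 2 = hold until the old primary is isolated.
should_promote() {
  local sealed="$1" mode="$2" old_primary_reachable="$3"
  if [ "$sealed" != "false" ]; then
    echo "abort: DR secondary is sealed or its seal state is unknown"
    return 1
  fi
  if [ "$mode" != "secondary" ]; then
    echo "abort: this cluster is not a DR secondary (mode=$mode)"
    return 1
  fi
  # Fail safe: anything other than an explicit "no" is treated as reachable.
  if [ "$old_primary_reachable" != "no" ]; then
    echo "hold: old primary may still be reachable; isolate it first"
    return 2
  fi
  echo "go: DR secondary is unsealed and the old primary is isolated"
  return 0
}
```

Because the function only sees strings, the same gate can run from a CI job, a cron check, or the emergency runbook, and every operator gets the same answer for the same inputs.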
This is the standard flow for safely cutting over with minimum downtime. Automating LB/DNS changes ahead of time stabilizes recovery times.
Run the promotion operation on the leader node of the DR secondary cluster.
Runbook (promotion on the DR secondary)
# Point VAULT_ADDR at the active node of the DR secondary first.
# A DR secondary does not accept ordinary client tokens, so export a DR operation token as DR_OPERATION_TOKEN.
set -euo pipefail
# 0) Remove the old primary from the LB (delegated to LB-side automation)
# ... (invoke the LB/DNS automation script)
# 1) Verify DR secondary health
vault status
vault operator raft list-peers -dr-token="$DR_OPERATION_TOKEN" || true # Integrated Storage only; not applicable with Consul storage
# 2) Check DR status (JSON for machine-readable health)
vault read -format=json sys/replication/dr/status | jq .
# 3) Run promotion (valid only on a secondary; requires the DR operation token)
vault write sys/replication/dr/secondary/promote dr_operation_token="$DR_OPERATION_TOKEN"
# 4) Wait for state to settle and re-check
sleep 3
vault read sys/replication/dr/status
vault status # Confirm HA Mode: active / Cluster Mode: primary, etc.
# 5) Cut LB/DNS over to the DR side (probe via /v1/sys/health)
# Example: active node returns 200, standby 429, sealed 503 by default (tunable per environment)
# 6) Smoke-test the key paths (e.g. auth, KV read, Transit sign)
# Run vault login, vault kv get, vault write transit/sign, etc.
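The fixed `sleep 3` in step 4 can be replaced by a bounded polling loop, so the runbook proceeds as soon as the state settles and fails loudly if it never does. The following is a sketch only: `wait_for` and `dr_mode_is_primary` are hypothetical helper names, and the 60-second budget in the example is an assumption to tune per environment.

```shell
# wait_for TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TIMEOUT_SECONDS have elapsed.
wait_for() {
  local timeout="$1" interval="$2" start now
  shift 2
  start=$(date +%s)
  until "$@"; do
    now=$(date +%s)
    if [ $(( now - start )) -ge "$timeout" ]; then
      echo "timed out after ${timeout}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Hypothetical check: succeeds once DR status reports mode "primary".
dr_mode_is_primary() {
  [ "$(vault read -format=json sys/replication/dr/status | jq -r '.data.mode')" = "primary" ]
}

# Example use in the runbook (commented so this block has no side effects):
# wait_for 60 2 dr_mode_is_primary
```

A non-zero return from `wait_for` is a natural point to stop the runbook before any LB/DNS change is made.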
Once the DR side has become primary, verify client reconnection and the continuity of secrets operations. Pay particular attention to token/lease expiry timing and the behavior of background renewals.
Use /sys/health for health checks. Assuming the default behavior of 200 for active and 429 for standby keeps the cutover logic simple.
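As a sketch of that cutover logic, the default `/sys/health` status codes can be mapped to node states in one place. The function names are illustrative assumptions; the codes are the documented defaults, including 472 for a DR secondary and 473 for a performance standby, and they can be overridden with query parameters in your environment.

```shell
# classify_health_code HTTP_STATUS
# Maps the default /v1/sys/health status codes to a node state label.
classify_health_code() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# lb_should_route HTTP_STATUS
# Succeeds only for the active node, which is the only target the LB should expose.
lb_should_route() {
  [ "$(classify_health_code "$1")" = "active" ]
}

# Example probe (commented; VAULT_ADDR must point at the node under test):
# code=$(curl -s -o /dev/null -w '%{http_code}' "$VAULT_ADDR/v1/sys/health")
# lb_should_route "$code" && echo "route traffic here"
```

Before promotion the DR side answers 472, so a probe that only accepts 200 keeps traffic away from it until the promote call has actually taken effect.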
When the old primary comes back, do not immediately return it to the network. First inspect its state in an isolated environment. The safe pattern is to start from the new primary (the promoted DR side), issue a fresh DR secondary token, and re-register the old primary as a secondary.
Because the exact re-join steps and API paths differ by version, follow the official documentation procedure for your specific Vault version strictly. If data divergence is suspected, prioritize a snapshot / re-sync plan over a quick re-join.
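For orientation only, the general shape of a re-join can be sketched as a dry-run plan that prints the commands instead of executing them. Everything here is an assumption to check against the documentation for your version: the addresses, the secondary id `old-primary-rejoin`, the helper names `run` and `rejoin_plan`, and the two token variables are placeholders, and the handling of the tokens each call needs is deliberately left out.

```shell
# Dry-run by default: commands are printed, not executed. Set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# rejoin_plan OLD_PRIMARY_ADDR NEW_PRIMARY_ADDR
# Addresses, the secondary id, and both token variables are placeholders (assumptions).
rejoin_plan() {
  local old_addr="$1" new_addr="$2"
  # 1) On the isolated old primary: give up the primary role.
  run env VAULT_ADDR="$old_addr" vault write -f sys/replication/dr/primary/demote
  # 2) On the new primary: issue a fresh DR secondary activation token.
  run env VAULT_ADDR="$new_addr" vault write sys/replication/dr/primary/secondary-token id="old-primary-rejoin"
  # 3) On the demoted cluster: point it at the new primary.
  run env VAULT_ADDR="$old_addr" vault write sys/replication/dr/secondary/update-primary \
    dr_operation_token="${DR_OPERATION_TOKEN:-<dr-operation-token>}" \
    token="${SECONDARY_TOKEN:-<wrapped-secondary-token>}"
}

rejoin_plan "https://old-primary.example:8200" "https://new-primary.example:8200"
```

Printing the plan first makes it easy to review the order of operations with a second operator before anything touches the recovered cluster.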
Ops-focused exams frequently test the correct endpoint, which node executes the operation, and split-brain countermeasures. Real operations also tend to fail at exactly those points.
Two facts in particular are worth committing to memory: DR promotion runs on the DR secondary side, and the old primary must be reliably isolated before cutover.
Ops
Question 1
In a Vault Enterprise DR replication environment, which operation correctly promotes the DR secondary to primary during a failover?
Correct answer: A
DR promotion is executed by running sys/replication/dr/secondary/promote on the DR secondary side. There is no raft promote, the path is not under performance replication, and unseal has no promote option.
Do existing tokens and leases expire after promotion?
Because DR replication mirrors cluster state, tokens and leases do not all expire at the instant of promotion. Existing tokens and leases continue to honor their remaining TTL. However, anything that expired during the outage will not be revived. Also verify that each secrets engine's backend remains reachable from the new primary.
How can we minimize downtime?
Automate LB/DNS switching, use /sys/health for unambiguous health detection, and rehearse smoke tests beforehand. Strictly follow the order: detach the old primary → promote the DR secondary → expose the new destination. Make sure smoke tests over key paths (auth, KV, Transit) can complete in 1-2 minutes.
Who can execute the promotion?
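Promotion is authorized by a DR operation token rather than an ordinary client token, because a DR secondary does not serve normal authenticated requests. You can generate one on the secondary with a quorum of unseal or recovery keys, or pre-issue a batch DR operation token on the primary whose policy allows update on sys/replication/dr/secondary/promote. In practice, define a dedicated ops policy and execute the promotion through an auditable emergency runbook. Avoid using root tokens as a routine.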
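A dedicated ops policy for this purpose might look like the sketch below, which assumes the batch DR operation token approach: the policy and token role are created on the primary ahead of any incident, and the resulting batch token is stored securely for the runbook. The names `dr-ops` and `dr-failover` and the 8-hour TTL are illustrative assumptions.

```shell
# render_dr_ops_policy: prints an HCL policy limited to the DR failover paths.
render_dr_ops_policy() {
  cat <<'EOF'
path "sys/replication/dr/secondary/promote" {
  capabilities = ["update"]
}
path "sys/replication/dr/secondary/update-primary" {
  capabilities = ["update"]
}
EOF
}

# On the DR primary, before any incident (commented so this block has no side effects):
# render_dr_ops_policy | vault policy write dr-ops -
# vault write auth/token/roles/dr-failover \
#   allowed_policies=dr-ops orphan=true renewable=false token_type=batch
# vault token create -role=dr-failover -ttl=8h
```

Keeping the policy this narrow means a leaked token can trigger a failover but cannot read secrets, which is a reasonable trade-off for an emergency credential.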
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.