Vault's Integrated Storage is a Raft-based built-in storage engine that delivers HA without any external key-value store. For small to mid-sized deployments with standard availability requirements, it is typically the first choice.
From an Ops perspective, the most frequently tested topics are odd-node placement, consistent address configuration (api_addr / cluster_addr), snapshot operations, and the quorum recovery flow during failures.
Integrated Storage (Raft) is a distributed log replication engine embedded directly in the Vault process. The leader serializes writes and replicates them to followers to guarantee consistency. Because it doesn't depend on external systems like Consul, it has fewer moving parts and is straightforward to deploy.
Raft is the right pick when you want to minimize external dependencies, need standard availability (3 or 5 nodes), and can rely on low-latency communication inside a data center. If you already run Consul as a general-purpose KV, the Consul backend is also a candidate, but for a greenfield Vault-only deployment, Raft is operationally simpler.
The baseline is 3 nodes spread across multiple AZs within the same region. Five nodes offer higher availability but raise both cost and latency. The control plane (cluster traffic on 8201/TCP) must be mutually reachable between nodes, while the data plane (API on 8200/TCP) must be reachable from clients and load balancers.
Quorum requires 2 nodes in a 3-node cluster and 3 nodes in a 5-node cluster. Run the same Vault version on every node and make NTP time sync and TLS mandatory. It is critical not to confuse api_addr (used by clients) with cluster_addr (used between Raft peers).
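The quorum figures above follow from the majority rule floor(N/2) + 1. A minimal shell sketch of that arithmetic (the function names `quorum` and `fault_tolerance` are illustrative, not part of Vault):

```shell
# Quorum for an N-node Raft cluster is a strict majority: floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the cluster still holds quorum.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare even and odd cluster sizes side by side.
for n in 2 3 4 5; do
  echo "nodes=$n quorum=$(quorum "$n") tolerated_failures=$(fault_tolerance "$n")"
done
```

A 4-node cluster tolerates one failure, the same as a 3-node cluster, and a 2-node cluster tolerates none, which is why odd node counts are the norm.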
3-node Raft cluster (3 AZs, accessed via LB)
Configuration in server.hcl defines api_addr / cluster_addr, the listener (8200/8201, TLS), and storage "raft" (path, node_id, retry_join). Initialize the first node with vault operator init, then have subsequent nodes join via vault operator raft join.
After joining, unseal the node and confirm that all peers are present. It is safest to take snapshots from the leader.
Vault server.hcl (Raft) and example initialization commands
```hcl
# /etc/vault.d/server.hcl
ui           = true
api_addr     = "https://vault-1.example.com:8200"
cluster_addr = "https://vault-1.example.com:8201"

listener "tcp" {
  address       = "0.0.0.0:8200"
  tls_disable   = false
  tls_cert_file = "/etc/vault.d/certs/vault.crt"
  tls_key_file  = "/etc/vault.d/certs/vault.key"
}

storage "raft" {
  path    = "/opt/vault/data"
  node_id = "vault-1"

  # Multiple entries are allowed. The node attempts to join automatically
  # as soon as it can reach any one of the listed addresses.
  retry_join {
    leader_api_addr = "https://vault-1.example.com:8200"
    # Set leader_tls_servername, leader_ca_cert_file, etc. as needed.
  }
  retry_join {
    leader_api_addr = "https://vault-2.example.com:8200"
  }
}
```

```shell
# Initialize (first node)
$ export VAULT_ADDR=https://vault-1.example.com:8200
$ export VAULT_CACERT=/etc/vault.d/certs/ca.crt
$ vault operator init -key-shares=5 -key-threshold=3 > init.txt
$ vault operator unseal # repeat until the key threshold is reached

# Second and subsequent nodes (e.g., vault-2)
$ export VAULT_ADDR=https://vault-2.example.com:8200
$ vault operator raft join https://vault-1.example.com:8200
$ vault operator unseal

# Check peers (from any node)
$ vault operator raft list-peers
```

Raft is a strongly consistent leader-based protocol. Writes are accepted by the leader, replicated to a majority of nodes, and then committed. Standbys forward requests to the leader, so the latency between the leader and each peer has a direct impact on throughput and response time.
Snapshots prevent log growth and shorten recovery time on restart or rejoin. Choose high-IOPS, low-latency persistent disks, and size CPU and memory to match the workload (token issuance and cryptographic operations).
Back up the cluster using Raft snapshots. Capture them from the leader as a rule and version them in secure storage. Restore into a running, unsealed cluster, typically one freshly initialized on new nodes or on cleaned data directories.
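As a rough sketch of a scheduled backup, the helper below wraps `vault operator raft snapshot save` and writes a timestamped file. The function name `backup_snapshot`, the file naming scheme, and the target directory are assumptions for illustration; `VAULT_ADDR` and a token with snapshot permission are expected to be set in the environment already.

```shell
# Save a timestamped Raft snapshot into the given directory and print its path.
# Assumes VAULT_ADDR and VAULT_TOKEN are already exported.
backup_snapshot() {
  dir=$1
  mkdir -p "$dir" || return 1
  file="$dir/vault-raft-$(date -u +%Y%m%dT%H%M%SZ).snap"
  # The umask inside the subshell keeps the snapshot readable by the owner only.
  # CLI messages go to stderr so that stdout carries just the file path.
  ( umask 077 && vault operator raft snapshot save "$file" >&2 ) || return 1
  echo "$file"
}
```

Calling `backup_snapshot /var/backups/vault` (a hypothetical path) from cron or a systemd timer would produce one file per run; shipping those files to encrypted, access-controlled storage is left to the surrounding tooling.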
Perform upgrades one node at a time in a rolling fashion, always preserving quorum. Stop the node, upgrade it, start it back up, confirm it has stabilized, and only then move on to the next. Follow the compatibility notes in the release notes.
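The "confirm it has stabilized" step can be partly automated. The sketch below polls `vault status`, which exits 0 when the node is unsealed, 2 when it is sealed, and 1 on error. The helper name `wait_until_unsealed` and the `POLL_INTERVAL` variable are hypothetical, and an unsealed node is only one signal of stability, so checking the peer list afterwards is still advisable.

```shell
# Poll a node until `vault status` reports it unsealed (exit code 0),
# giving up after a fixed number of attempts.
#   $1: API address of the node, $2: maximum attempts (default 30)
wait_until_unsealed() {
  addr=$1
  attempts=${2:-30}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if vault status -address="$addr" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # POLL_INTERVAL is an illustrative knob; it defaults to 2 seconds.
    sleep "${POLL_INTERVAL:-2}"
  done
  return 1
}
```

In a rolling upgrade, one might run `wait_until_unsealed https://vault-2.example.com:8200 && vault operator raft list-peers` after restarting and unsealing vault-2, and only then proceed to the next node.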
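If you need to remove a node after a hardware failure, review the peer list with vault operator raft list-peers, remove the affected node with vault operator raft remove-peer, and join a replacement. Removing a peer is itself a Raft write and needs quorum, so if quorum has already been lost, restore it first or, if necessary, rebuild the cluster from a snapshot.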
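A guarded eviction might look like the sketch below. It assumes the default tabular output of `vault operator raft list-peers` (two header lines, node ID in the first column); the function name `evict_peer` is illustrative.

```shell
# Evict a permanently failed node, but only if it is actually in the peer list.
# Run against the active cluster with a sufficiently privileged token.
evict_peer() {
  node_id=$1
  # Skip the two header lines and look for an exact match on the node ID column.
  if ! vault operator raft list-peers |
    awk -v id="$node_id" 'NR > 2 && $1 == id { found = 1 } END { exit !found }'; then
    echo "node $node_id is not in the peer list" >&2
    return 1
  fi
  vault operator raft remove-peer "$node_id"
}
```

After `evict_peer vault-3`, a replacement node with an empty data directory can be started, joined with `vault operator raft join`, and unsealed, as in the initial setup.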
Raft Integrated Storage delivers HA with no external dependencies, but if you want to leverage existing Consul operations or share a general-purpose KV, the Consul backend is also reasonable. The File backend is for single-node use and is unsuitable for HA.
The exam likes to test the odd-node principle, confusion between api_addr and cluster_addr, where to take snapshots, and how to handle quorum loss (peer removal and rejoin).
| Item | Raft Integrated Storage | Consul Backend | File Backend |
|---|---|---|---|
| HA support | Yes (built-in Raft for leader election and replication) | Yes (uses Consul sessions and locks) | No (single-node only) |
| External dependencies | None (no extra middleware) | Required (a Consul cluster) | None |
| Operational complexity | Low (few moving parts) | Medium to high (separate Consul operations) | Low (but no HA) |
| Backup | vault operator raft snapshot | consul snapshot, etc. | File copy (with consistency caveats) |
| Best-fit scenario | Vault-only deployments wanting simple, standard HA | Leveraging existing Consul / integrating with a general-purpose KV | Lightweight use for testing or single-node scenarios |
Enforce TLS on both Raft peer traffic and the API, and include the FQDNs or IPs of api_addr and cluster_addr in the certificate SANs. If TLS verification fails, join and replication become unstable.
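One way to verify SAN coverage before rollout is to inspect the certificate with `openssl x509`. The helper below is a sketch; the name `cert_has_san` is hypothetical, and it only checks DNS entries, not IP SANs.

```shell
# Succeed if the certificate lists the given DNS name among its SANs.
#   $1: path to a PEM certificate, $2: DNS name to look for
cert_has_san() {
  cert=$1
  name=$2
  # Split the text dump on commas, trim whitespace, and match the entry exactly.
  openssl x509 -in "$cert" -noout -text |
    tr ',' '\n' |
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |
    grep -Fxq "DNS:$name"
}
```

For example, `cert_has_san /etc/vault.d/certs/vault.crt vault-1.example.com` returns a zero exit status only when that FQDN is present.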
Stored data is encrypted by Vault's storage barrier. Snapshots also contain sensitive material, so restrict snapshot permissions tightly and ensure the storage location is encrypted and access-controlled. Combine the health endpoint with telemetry for continuous monitoring of peer count, leader transitions, and replication lag.
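For the health endpoint, the default HTTP status codes of `GET /v1/sys/health` already encode the node state, so a small probe can classify each node. The function names below are illustrative, and the CA path falls back to the one used earlier in this article.

```shell
# Map the default status codes of GET /v1/sys/health to a node state.
health_state() {
  case "$1" in
    200) echo "active" ;;
    429) echo "standby" ;;
    472) echo "dr-secondary" ;;
    473) echo "performance-standby" ;;
    501) echo "uninitialized" ;;
    503) echo "sealed" ;;
    *)   echo "unknown" ;;
  esac
}

# Probe one node's API address and print its state. Requires curl.
probe_node() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --cacert "${VAULT_CACERT:-/etc/vault.d/certs/ca.crt}" \
    "$1/v1/sys/health")
  health_state "$code"
}
```

Running `probe_node https://vault-1.example.com:8200` against each node and alerting when no node reports `active`, or when any reports `sealed`, covers the most basic failure cases; peer count and replication lag still need telemetry.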
Ops
Question 1
You want to achieve high availability (HA) on Vault OSS without adding any external middleware. Which configuration is the standard, recommended choice?
Answer: A
The recommended way to achieve HA simply is Integrated Storage (Raft) on an odd number of nodes (typically 3). The LB targets the API (8200), while peers must reach each other on 8201. File doesn't support HA, 2-node setups risk losing quorum, and closing 8201 breaks the cluster.
What is the difference between api_addr and cluster_addr?
api_addr is the endpoint that clients and load balancers connect to, while cluster_addr is the address used for inter-node (Raft peer) communication between Vault nodes. Both must be reachable and covered by the certificate SANs.
Where should snapshots be taken and restored?
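As a rule, take and restore snapshots through the leader node. Use vault operator raft snapshot save to capture snapshots. To restore, run vault operator raft snapshot restore against a running, unsealed cluster. When rebuilding from scratch, first initialize and unseal a fresh cluster on clean data directories, then restore with the -force flag, because the snapshot comes from a different cluster.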
How do you handle a node that won't rejoin after a failure?
Check peers with vault operator raft list-peers, and for permanent failures, evict the node from the cluster with remove-peer. Then bring up a fresh node with a clean data directory and rejoin via raft join. Always perform these steps in an order that preserves quorum.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.