Vault can expose internal counters, latencies, and other signals as telemetry. The two common consumers are Prometheus (pull) and StatsD (push). This article covers when to use each, common configuration pitfalls, and HA considerations.
For exam prep, the frequent topics are the main telemetry stanza options, the format selection for /v1/sys/metrics, typical Prometheus scrape configurations, and the port/protocol assumptions for StatsD.
Vault enables observability exports via the telemetry stanza in the server config. Prometheus-compatible output is available over HTTP at /v1/sys/metrics?format=prometheus, and StatsD pushes metrics over UDP.
This article assumes OSS Vault and focuses on stable options and common operational patterns. In environments with version differences, always verify the configurable keys against the documentation for the Vault version you are running.
Minimal connectivity check (fetching in Prometheus format)
# Needs a token with read access to sys/metrics unless the listener enables unauthenticated_metrics_access
curl -s --cacert /etc/ssl/certs/ca.pem -H "X-Vault-Token: ${VAULT_TOKEN}" \
  "https://vault.example.com:8200/v1/sys/metrics?format=prometheus" | head -50
Prometheus pulls while StatsD pushes, so the network direction, reachability requirements, label/tag representation, and deduplication strategy all differ. When monitoring a Vault cluster, decide up front how active and standby nodes are handled, together with the TLS and firewall design; this reduces rework later.
| Item | Prometheus (Pull) | StatsD (Push) | Operational impact |
|---|---|---|---|
| Collection direction | Monitoring side pulls | Vault pushes | The reachability requirement flips |
| Protocol | HTTP(S) | UDP 8125, etc. | Account for UDP loss and reordering |
| Metadata | Labels (key=value) | Tags only via extensions (DogStatsD, etc.) | Controlling high cardinality is critical |
| Duplication under HA | Suppressed by controlling scrape targets | Suppressed by controlling the sender | Requires explicit active/standby operational design |
| Dashboards | High affinity with Grafana, etc. | Depends on the aggregator (Graphite, Datadog, etc.) | Aligns with your visualization platform |
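To make the protocol and metadata rows concrete, here is a minimal Python sketch of what a push looks like on the wire. It formats a StatsD line (`name:value|type`), optionally with DogStatsD-style tags, and sends it over UDP without waiting for any acknowledgement. The metric name, the tag, and the `127.0.0.1:8125` receiver are illustrative assumptions, and this is not how Vault itself builds its packets.

```python
import socket


def format_statsd(name, value, metric_type="g", tags=None):
    """Build one StatsD line; tags use the DogStatsD '|#k:v' extension."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # Sort the tags so the output is deterministic.
        tag_text = ",".join(f"{k}:{v}" for k, v in sorted(tags.items()))
        line += f"|#{tag_text}"
    return line


def send_statsd(line, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send: no delivery guarantee and no reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(line.encode("ascii"), (host, port))


# Hypothetical metric name and tag, for illustration only.
plain = format_statsd("vault.example.requests", 1, "c")
tagged = format_statsd("vault.example.requests", 1, "c", {"cluster": "demo"})
```

Because nothing comes back from the receiver, the sender cannot tell whether a datagram arrived, which is why the table calls out loss and reordering.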
Data flow (conceptual diagram)
Typical HA target example (assuming Consul service discovery)
# Sketch of the Prometheus side (excerpt). Static targets are shown for brevity;
# with Consul service discovery, replace static_configs with consul_sd_configs.
scrape_configs:
  - job_name: 'vault'
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/ca.pem
    static_configs:
      - targets: ['vault-1.example.com:8200', 'vault-2.example.com:8200', 'vault-3.example.com:8200']
On the Vault side, the telemetry stanza controls which formats are emitted and where they go. For Prometheus-only integration, a safe starting point is to keep the HTTP endpoint output, set the retention window (prometheus_retention_time), and turn off hostname prefixing (disable_hostname = true) so that metric names are identical on every node.
If you also use StatsD, add statsd_address. If your observability platform (Datadog, Graphite, etc.) is already established, this keeps migration cost low.
vault.hcl excerpt: enabling Prometheus and StatsD simultaneously
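telemetry {
  # How long Prometheus-format metrics are retained in memory (example: 24h)
  prometheus_retention_time = "24h"
  # Do not prefix gauge names with the local hostname, so series names stay consistent across nodes
  disable_hostname = true
  # StatsD destination (enable only if needed)
  # The conventional default port is 8125/udp
  statsd_address = "127.0.0.1:8125"
}
# Apply (systemd). Telemetry settings are read at startup, so a reload may not apply them; restart if the change does not take effect
# sudo systemctl reload vault
From Prometheus, scrape /v1/sys/metrics with format=prometheus passed as a query parameter. Do not scrape over plain HTTP or with certificate verification disabled; verify at least the CA. In HA environments, the common approach is to target every node, either statically or by resolving them dynamically via service discovery.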
To dampen double-counting and failover jitter, stabilizing the stats via recording rules is a practical approach (e.g., widen the rate window slightly).
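As a rough illustration of why a wider window smooths failover jitter, the following Python sketch computes a per-second rate from counter samples and treats any decrease as a counter reset (as happens when a node restarts). It is a simplified stand-in for the kind of rate calculation a recording rule would hold, not a reimplementation of PromQL's `rate()`, and the sample data is made up.

```python
def counter_rate(samples, window_seconds):
    """Per-second increase over the trailing window.

    samples: list of (timestamp_seconds, counter_value), oldest first.
    A drop in value is treated as a counter reset, so the post-reset
    value is counted as the increase for that step.
    """
    if not samples:
        return 0.0
    end = samples[-1][0]
    window = [s for s in samples if s[0] >= end - window_seconds]
    if len(window) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(window, window[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = window[-1][0] - window[0][0]
    if elapsed <= 0:
        return 0.0
    return increase / elapsed


# Made-up samples every 15 s; the counter resets just before t=60.
samples = [(0, 100), (15, 130), (30, 160), (45, 190), (60, 5)]
short = counter_rate(samples, 15)  # window covers only the reset step
wide = counter_rate(samples, 60)   # window averages across the reset
```

With this data the 15-second window sees only the step that contains the reset and reports a sharp dip, while the 60-second window stays much closer to the steady rate of 2 per second.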
Prometheus scrape_configs (minimal example)
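scrape_configs:
  - job_name: 'vault'
    scheme: https
    metrics_path: /v1/sys/metrics
    params:
      format: ['prometheus']
    # Assumption: a Vault token stored at this hypothetical path; not needed if the listener enables unauthenticated_metrics_access
    authorization:
      credentials_file: /etc/prometheus/vault-token
    tls_config:
      ca_file: /etc/prometheus/ca.pem
      # Set server_name so SNI and the certificate CN/SAN are verified against it
      server_name: vault.example.com
    static_configs:
      - targets: ['vault.example.com:8200']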
When you set statsd_address in telemetry, Vault sends internal metrics over UDP. Receivers include Graphite, Datadog, and Telegraf. If you rely on tag extensions like DogStatsD, choose an agent that supports them.
Because UDP assumes some loss, account for network congestion and restart spikes by stabilizing alert thresholds with hysteresis or compound conditions.
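A minimal Python sketch of the hysteresis idea: the alert fires only when the value rises above an upper threshold and clears only once it falls below a lower one, so a single lost or delayed datagram near the boundary does not make the alert flap. The thresholds and the sample series are hypothetical.

```python
class HysteresisAlert:
    """Two-threshold alert state: fire above `high`, clear below `low`."""

    def __init__(self, low, high):
        if low >= high:
            raise ValueError("low must be below high")
        self.low = low
        self.high = high
        self.firing = False

    def update(self, value):
        """Feed one observation and return the current alert state."""
        if not self.firing and value > self.high:
            self.firing = True
        elif self.firing and value < self.low:
            self.firing = False
        return self.firing


alert = HysteresisAlert(low=200, high=500)
# Hypothetical latency samples in milliseconds.
states = [alert.update(v) for v in [100, 520, 480, 300, 510, 150, 100]]
```

Note that the dips to 480 and 300 do not clear the alert; it clears only when the value drops below the lower threshold.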
StatsD ingestion (e.g., Graphite via Telegraf)
# Example Telegraf statsd input (telegraf.conf excerpt)
[[inputs.statsd]]
  service_address = ":8125"
  delete_gauges = false
  delete_counters = false
  # Adjust metric_separator / templates as needed
# On the Vault side (vault.hcl above), set statsd_address to 127.0.0.1:8125
Operationally, monitor the telemetry pipeline itself for availability, accuracy, and latency. For Prometheus, watch the scrape success rate and drift in the number of metrics; for StatsD, watch the receive rate and signs of drops (receiver logs and internal metrics). This surfaces real problems early.
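One way to watch metric-count drift is to count the series in each scrape and compare the total against a baseline. The Python sketch below parses Prometheus text exposition output in a deliberately simplified way (it skips comments and does not handle label values that contain spaces or braces) and flags a change beyond a tolerance. The sample payload, the metric names, and the 20% tolerance are assumptions for illustration.

```python
def count_series(exposition_text):
    """Count sample lines per metric name in Prometheus text format."""
    counts = {}
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The series identifier is everything before the first space.
        series = line.split(" ", 1)[0]
        name = series.split("{", 1)[0]
        counts[name] = counts.get(name, 0) + 1
    return counts


def drifted(baseline_total, current_total, tolerance=0.2):
    """True when the series count moved by more than `tolerance` (a fraction)."""
    if baseline_total == 0:
        return current_total != 0
    return abs(current_total - baseline_total) / baseline_total > tolerance


sample = """# HELP example_requests Hypothetical counter
# TYPE example_requests counter
example_requests{path="a"} 3
example_requests{path="b"} 5
example_up 1
"""
counts = count_series(sample)
total = sum(counts.values())
```

In practice the baseline would come from a known-good scrape of the same node, and a sustained drift in either direction is worth investigating.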
On the exam, the format selection for /v1/sys/metrics, the main telemetry keys, and the design differences between pull and push are reliable scoring opportunities.
Troubleshooting tips
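# 1) Does the endpoint respond in Prometheus format? (needs a token unless unauthenticated_metrics_access is enabled on the listener)
curl -v --cacert /etc/ssl/certs/ca.pem -H "X-Vault-Token: ${VAULT_TOKEN}" \
  "https://vault.example.com:8200/v1/sys/metrics?format=prometheus" | head -20
# 2) Port reachability (StatsD/UDP): capture up to 5 datagrams on port 8125
sudo tcpdump -ni any udp port 8125 -vv -c 5
# 3) Series sprawl from hostname prefixes (re-check the config)
# After applying disable_hostname = true in the telemetry stanza, compare the time series on your dashboards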
Question 1
Which recommended setting most directly contributes to operational stability when collecting Vault telemetry with Prometheus?
Correct answer: A
disable_hostname=true stops Vault from prefixing gauge names with the local hostname, so metric names stay identical across nodes; that keeps the series count predictable, lowers TSDB load, and stabilizes dashboards. Setting prometheus_retention_time=0 disables Prometheus-format output. /v1/sys/health is for health checks, not metrics. StatsD and Prometheus can be used together and combined as requirements dictate.
Does Vault's /v1/sys/metrics always return data in Prometheus format?
No. The default response format is JSON; Prometheus text format is selected by the format=prometheus query parameter or by the Accept header. From Prometheus, the reliable approach is to set metrics_path to /v1/sys/metrics and pass format=prometheus via params.
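As a small Python sketch of that answer, a client can select the format explicitly through the query string instead of relying on the default. The host name is the article's placeholder, the token value is a dummy, and the helper name is hypothetical; the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_metrics_request(base_url, token, fmt="prometheus"):
    """Build a GET request for /v1/sys/metrics with an explicit format."""
    query = urlencode({"format": fmt})
    url = f"{base_url.rstrip('/')}/v1/sys/metrics?{query}"
    # Vault reads the client token from the X-Vault-Token header.
    return Request(url, headers={"X-Vault-Token": token}, method="GET")


req = build_metrics_request("https://vault.example.com:8200/", "dummy-token")
```

Passing the request to `urllib.request.urlopen` with a suitable TLS context would perform the actual fetch.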
Which nodes should be scraped in an HA configuration?
Generally, include every node as a target and design labels and aggregation on the Prometheus side to avoid duplicate series. The standard practice is to auto-register nodes via service discovery while ensuring TLS and network reachability.
Does enabling StatsD and Prometheus at the same time increase load?
There is some overhead because the export and emission paths both run, but with appropriate sampling, retention window tuning, and reliable networking, running both is practical in most environments. They can coexist depending on your observability requirements.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.