State management is the core of stateful operations in Kafka Streams: aggregations, joins, and windowing. By default, RocksDB is embedded locally, and failures are recovered from changelog topics.
This article walks through the fundamentals of RocksDB, the restart recovery flow, standby replicas, operational tuning, and the points most often tested on the CCDAK exam, all from a practical standpoint.
Kafka Streams stores intermediate computation results in a State Store. A materialized State Store is backed by a changelog topic of the same name, which can be replayed to recover after failure.
The default persistent store is RocksDB, embedded in the process. An on-heap memory store is also available, but its contents disappear when the process exits, so a restart requires replaying the entire changelog. Because RocksDB persists to local disk, recovery is fast when the checkpoint matches — only the delta needs to be applied.
| Store type | Durability | Recovery on restart | Performance / I-O |
|---|---|---|---|
| In-memory (on-heap) | None (lost when the process exits) | Full replay of changelog from the beginning | Low latency, but affected by GC |
| RocksDB (default) | Yes (local disk) | Can apply only the delta since the checkpoint | Disk I/O with caching delivers low latency |
| RocksDB + standby | Yes (warmed across multiple nodes) | On failover, the standby is promoted and the delta is minimal | Extra network and storage overhead |
When the process restarts, each task inspects the RocksDB and checkpoint file (the changelog position already applied) under the local state.dir. If local state exists and the checkpoint matches the task assignment, only the changelog delta is read to restore consistency before processing resumes.
If local state is lost or the checkpoint is inconsistent, recovery falls back to a full replay of the store's changelog from the beginning. With Exactly-once v2 enabled, recovery stops at transaction boundaries, preserving consistency between stores and output topics. In clusters with cooperative rebalancing (sticky placement of stateful tasks), tasks tend to stay on the same host, so the reuse rate of local state goes up.
Recovery path for a RocksDB store
Each State Store writes key-value updates to its changelog topic. Log compaction keeps the latest update per key and discards older versions. Deletions are written as tombstones (key only, value null), so deletes are correctly replayed during recovery.
For accurate recovery, it is critical to use the same serializer/deserializer (Serde) for the store and its changelog, and to keep the replication factor of internal topics high enough. RocksDB-side flush timing and caching do not affect recovery logic — recovery is always applied in changelog order.
Setting num.standby.replicas keeps each partition's State Store warm in the background on other nodes. On failover, the standby is promoted to active and recovery finishes by applying only the delta. The trade-off is extra network and storage cost.
Cooperative rebalancing uses sticky placement for stateful tasks, so tasks tend to stay on the same host even during scale-in/out events, and reusing the local state.dir keeps recovery cost down. In container deployments, putting state.dir on a persistent volume delivers the same benefit.
Put state.dir on a local persistent disk with enough I/O and capacity, and in containers mount it from a host volume. The file descriptor limit and I/O scheduler settings matter as well.
RocksDB lets you tune write buffers and the block cache through RocksDBConfigSetter. The cache (cache.max.bytes.buffering) and commit interval (commit.interval.ms) trade off latency against changelog write volume. Recovery throughput is dominated by network and disk I/O while consuming the changelog, so revisit consumer settings such as max.partition.fetch.bytes too.
Exactly-once v2 (processing.guarantee=exactly_once_v2) is recommended. Set the replication factor of internal topics (changelog/repartition) to 3 to prevent data loss during failures.
Configuration example (Java Properties + Streams DSL)
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "payments-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
// 永続ディスク配下に配置
props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/lib/kafka-streams/state");
// 復旧短縮(コストとトレードオフ)
props.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
// Exactly-once v2
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
// キャッシュとコミット間隔
props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 256 * 1024 * 1024L); // 256MB
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 500);
// 内部トピックのレプリケーション(本番は 3 推奨)
props.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 3);
// 復旧スループットに影響
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 4 * 1024 * 1024); // 4MB
StreamsBuilder builder = new StreamsBuilder();
KStream<String, Payment> payments = builder.stream("payments");
KTable<String, Long> counts = payments
.groupByKey()
.count(Materialized.as("payments-counts")); // デフォルトで RocksDB + changelog
KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();Accidentally deleting the local state.dir forces a full recovery (a complete changelog replay), which can take a long time. To shorten the recovery window, combine standby replicas, persistent volumes, and cooperative rebalancing.
When a checkpoint is broken, the logs record a recovery fallback and the system switches to replay from the beginning. Once you decide consistency cannot be restored, the safest move is to delete the affected task's local directory and let it regenerate.
Topology changes that rename a store or alter its Serde break compatibility with the existing changelog and cause recovery to fail. When a safe reset is required, use the application reset tool to clean up input offsets and internal topics before redeploying.
CCDAK
問題 1
A Kafka Streams app crashes and is restarted immediately on the same host. The local RocksDB state and checkpoint are healthy, processing.guarantee is exactly_once_v2, and num.standby.replicas=0. Which statement best describes the recovery at restart?
正解: A
When local RocksDB state and the checkpoint are both healthy, Streams applies only the changelog delta for an incremental recovery and resumes processing. The changelog is the source of truth for recovery; input topic offsets are not used. Recovery is possible without standbys, but if local state is missing, a full replay from the beginning is required.
How does switching to an in-memory store change recovery?
Because state is lost when the process exits, restart always means replaying the changelog from the beginning. Disk I/O drops, but recovery typically takes longer. For production workloads that prioritize resilience, RocksDB is recommended.
What happens if the checkpoint is corrupted or inconsistent?
Streams plays it safe and falls back to a full recovery from the beginning. If the issue keeps recurring, delete the affected task's state.dir contents to regenerate, then verify disk health and Serde compatibility.
How can you speed up recovery in container deployments?
Put state.dir on a persistent volume, use cooperative rebalancing to keep task stickiness, set num.standby.replicas to 1, set internal topic replication to 3, and ensure adequate I/O bandwidth at bootstrap time.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...