Kafka Streams Stateful Operations: State Stores (2026)

Kafka has grown beyond messaging into a full stream processing platform. Stateful workloads in particular only run reliably once you have a solid grasp of aggregations, joins, and state stores.

This article centers on Kafka Streams and walks through the terminology, architecture, and common pitfalls, grounded in the official documentation. We also touch on what the CCDAK (Confluent Certified Developer for Apache Kafka) exam tends to ask, giving you concrete decision criteria for real-world work.

Stateful Processing Overview and Terminology

Stateful processing in Kafka holds the state required by each operation in a local state store per task, and writes every change out to a Kafka changelog topic for fault tolerance. At recovery time, the state store is rebuilt by replaying that changelog.

The main stateful operations are aggregations (count/reduce/aggregate), windowed aggregations, and joins (KStream-KStream, KStream-KTable, and KTable-KTable). For windowed processing, the time boundary (window size) and lateness tolerance (grace) determine when results are finalized. Joins require co-partitioning (same key and same partition count), with an automatic repartition step inserted when needed.

State store: held locally in RocksDB (default) or in memory
Changelog topic: store updates are written asynchronously to Kafka and replayed during recovery or scale-out
Standby task: a backup task that follows the changelog, shortening failover time
Window grace: lateness tolerance — how long late events are still accepted into past windows
Co-partitioning: a hard prerequisite for correct distributed joins and grouping

Store type	Main use cases	Retention and key structure
KeyValueStore	KTable current values and non-windowed stream aggregation results	Stores key to latest value. Changelog uses compaction by default
WindowStore	Windowed aggregations and the window buffer for stream-stream joins	Segmented by key plus time boundary. Retention is window size + grace + buffer
SessionStore	Session window aggregations (gap-based closure)	Key plus variable session boundary; boundaries shift as sessions merge

Stateful processing flow in Kafka Streams

Aggregations and Windows: count/reduce/aggregate and Late Events

Non-windowed aggregations keep the latest value in a KeyValueStore and let compaction keep the changelog efficient. Windowed aggregations keep partial aggregates per time segment in a WindowStore, and you must set the retention period generously to cover the window size plus grace.

Grace defines how far late events are still accepted. Events that arrive after the grace period elapses no longer update the window, so you need to set grace based on your business requirements and the actual arrival-delay distribution.

count: counts records — lightweight and fast when no deduplication is needed
reduce: same-type aggregation (e.g., keep the latest timestamp) — watch out for ordering sensitivity
aggregate: an initial value plus a custom accumulator — flexible, but watch the state size
Choose between TimeWindows / SlidingWindows / SessionWindows based on the use case
Set retention to window size + grace + a safety margin for processing delay

Example: windowed aggregation and joins in Kafka Streams (Java)

StreamsBuilder builder = new StreamsBuilder();

Serde<String> stringSerde = Serdes.String();
Serde<Order> orderSerde = Serdes.serdeFrom(new OrderSerializer(), new OrderDeserializer());
Serde<Customer> customerSerde = Serdes.serdeFrom(new CustomerSerializer(), new CustomerDeserializer());

KStream<String, Order> orders = builder.stream("orders", Consumed.with(stringSerde, orderSerde));
KStream<String, Payment> payments = builder.stream("payments", Consumed.with(stringSerde, Serdes.serdeFrom(new PaymentSerializer(), new PaymentDeserializer())));

// 5分タンブリング + 1分grace
TimeWindows windows = TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1));
KTable<Windowed<String>, Long> orderCount5m = orders
    .groupByKey()
    .windowedBy(windows)
    .count(Materialized.<String, Long, WindowStore<Bytes, byte[]>>as("order-count-5m")
           .withKeySerde(stringSerde)
           .withValueSerde(Serdes.Long())) ;

// KTableは現在値のスナップショットを保持
KTable<String, Customer> customers = builder.table(
    "customers",
    Consumed.with(stringSerde, customerSerde),
    Materialized.<String, Customer, KeyValueStore<Bytes, byte[]>>as("customers-store")
        .withKeySerde(stringSerde).withValueSerde(customerSerde)
);

// KStream-KTable join(同一キー)。コーパーティションが必要。必要に応じてリパーティションが自動作成される
KStream<String, EnrichedOrder> enriched = orders.join(
    customers,
    (order, customer) -> EnrichedOrder.of(order, customer),
    Joined.with(stringSerde, orderSerde, customerSerde)
);

// KStream-KStreamの時間制約付きJoin
KStream<String, JoinedEvent> joined = orders.join(
    payments,
    (o, p) -> JoinedEvent.of(o, p),
    JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5)),
    StreamJoined.with(stringSerde, orderSerde, Serdes.serdeFrom(new PaymentSerializer(), new PaymentDeserializer()))
);

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

Join Patterns: Stream-Stream, Stream-Table, and Table-Table

KStream-KStream joins require a time constraint: records from both sides must meet within the window. Internally, a WindowStore acts as a temporary buffer for each side, and late-event handling is governed by grace.

KStream-KTable joins enrich each stream event with the KTable snapshot at the moment it arrives. The KTable side is read via a KeyValueStore for current values only — no window required. KTable-KTable joins propagate updates from both sides to the result table. Extensions like foreign-key joins depend on the version, so check the official documentation for support before you build.

Common to every join: matching keys and co-partitioning (same key, same partition count)
KStream-KStream: JoinWindows are mandatory; both sides are buffered in WindowStores
KStream-KTable: only one side reads from state; the stream side triggers updates
KTable-KTable: updates propagate in both directions; changelog compaction keeps it efficient
An automatic repartition (an intermediate topic) may be inserted when needed

State Stores and Recovery: RocksDB, Changelogs, and Standbys

Kafka Streams uses RocksDB as the local store by default. Writes are batched at commit boundaries, and updates are replicated asynchronously to the changelog topic. When tasks are reassigned or a node fails, RocksDB is rebuilt by replaying the changelog.

For high availability, you can enable standby tasks within the same application group. Standbys follow the changelog in parallel, enabling a warm start during failover. The optimal retention and compaction settings differ between table-style and window-style stores.

RocksDB: large-capacity, disk-resident; enable withCachingEnabled to improve hit rates
Changelog: compaction for table-style stores; retention-based deletion for window-style stores
More standby tasks shorten recovery time but increase network and storage load
Estimate recovery time from changelog size, latency targets, and available I/O bandwidth

Partitioning and Repartition: Conditions for Co-partitioning

When you change the key before grouping or joining (via map, flatMap, selectKey, and similar operators), the existing partition layout becomes inconsistent. Kafka Streams then inserts an intermediate topic to repartition automatically. The exam frequently asks about the prerequisites for co-partitioning and the typical cases that trigger automatic repartitioning.

In production, you keep intermediate topics from multiplying by setting keys explicitly at the source, matching partition counts when creating topics, and applying selectKey only where it is truly needed.

Co-partitioning prerequisites: same partition count, same key, same partitioner
Changing the key with selectKey or groupBy can trigger an intermediate topic for repartitioning
For KStream-KTable joins, the KTable's key must match the join key
Adding partitions after the fact can break co-partitioning for existing tables

Operations and Exam Essentials: EOS, Commits, and Window Finalization

Exactly-once processing is achieved by combining the transactional producer with processing.guarantee=exactly_once_v2. Atomic commits of state updates and outputs prevent double counting. The commit interval is a trade-off between latency and write load.

When a window result is finalized depends on stream-time advancement and the elapsed grace period. If you need to emit only finalized results downstream, you typically add suppression-equivalent control in ksqlDB or in your application. The exam frequently covers the meaning of grace, late-event handling, and EOS prerequisites.

Enable EOS by setting processing.guarantee=exactly_once_v2
commit.interval.ms affects RocksDB flushes and output batching
Window results may keep updating until the grace period elapses
The crux in production: set grace and retention based on the actual distribution of late events

Check Your Understanding

CCDAK

問題 1

In an application that joins a KStream with a KTable, you change the key of orders (KStream) with selectKey and immediately join it with customers (KTable). Which design is most appropriate?

After selectKey on orders, rely on the auto-generated repartition topic to establish co-partitioning, and ensure its key matches the customers key
After selectKey on orders, specify JoinWindows so the time-bounded join lets you join even KStream-KTable through a window
Even if the keys on the customers side do not match, KStream-KTable scans by arrival order so the join still works
The partition counts of orders and customers are irrelevant; Kafka Streams broadcasts internally to perform the join

正解: A

KStream-KTable joins require both matching join keys and co-partitioning. When you change the key with selectKey, an intermediate repartition topic is inserted as needed, and the join works as long as that topic's key matches the customers key. KStream-KTable joins do not use JoinWindows (no time bound), and Kafka Streams does not broadcast to perform the join.

Frequently Asked Questions

Why is compaction recommended for KTable changelog topics?

A KTable represents the current value for each key, so older updates carry no value. Compaction keeps only the latest value per key, which shrinks the replay volume needed at recovery and reduces storage consumption.

What changes when you use an in-memory store instead of RocksDB?

You get lower latency, but capacity limits prevent you from holding large state. Recovery still requires rebuilding from the changelog after a restart, and you need to account for warm-up time and GC impact.

What happens if you set the window grace period to 0?

Late events are dropped, and the window is essentially finalized the moment its boundary passes. For sources with frequent lateness, this raises the risk of missing results, so set grace based on measured arrival delays.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Stateful Processing in Kafka: Aggregations, Joins, State Stores, and CCDAK Prep