Kafka

Exactly-Once in Kafka Streams: processing.guarantee and Production Settings

2026-04-19
NicheeLab Editorial Team

Through its transaction and state store machinery, Kafka Streams can align input offsets, internal state, and outputs into a single atomic write unit. This is what Exactly-Once Semantics (EOS) actually delivers.

This article walks through the right way to pick processing.guarantee, plus the configuration concerns at the app, broker, and topic levels. It also calls out the pitfalls that show up repeatedly on the CCDAK (Confluent Certified Developer for Apache Kafka) exam.

What Exactly-Once Means in Kafka Streams

EOS in Kafka Streams bundles the following into a single transaction per processing cycle, eliminating duplicates and inconsistencies:

1) Consumer offset commits for the inputs, 2) internal state store changes (RocksDB ⇄ changelog), and 3) record writes to output topics. On failures, restarts, or rebalances, these are either all applied or all rolled back — never partially.

  • Scope is limited to Kafka-internal side effects (output topics, internal topics, offsets). External databases are out of scope.
  • EOS is built on the idempotent producer plus transactions. In Streams you toggle it via processing.guarantee.
  • On modern versions, exactly_once_v2 is recommended. The default is often at_least_once, so it is safer to set this explicitly.

processing.guarantee: Options and Differences

Each processing.guarantee mode changes whether duplicates can occur, how offsets are committed, and the state-store consistency model. The exam specifically targets candidates who mix up the values and their semantics.

In production, exactly_once_v2 is the default choice unless you are emitting side effects directly to external systems. Switch to at_least_once only when latency or throughput requirements demand it.

  • at_least_once delivers high throughput and low latency but can produce duplicates. The application has to deduplicate.
  • exactly_once_v2 atomically commits outputs, internal state, and input offsets in the same transaction. No duplicates.
  • The setting name has changed across versions, so be explicit and use exactly_once_v2.
ModeDuplicatesOffset commitState / output consistency
at_least_oncePossible (reprocessing can duplicate)Normal group commit (outside any transaction)Non-atomic (partial application possible)
exactly_once_v2None (no duplicates within Kafka)Sent inside the transaction via SendOffsetsToTransactionState updates, outputs, and offsets all committed atomically

EOS v2 Internals and Transaction Boundaries

Under EOS v2, Streams uses one producer per stream thread and pulls output topics, the internal changelog, and the input offset commits into a single transaction. On commit, everything becomes visible atomically; on abort, everything is discarded.

When a thread is replaced after a rebalance or crash, producer fencing blocks commits from any zombie producer, keeping the system consistent.

  • Input offsets are pulled into the transaction via SendOffsetsToTransaction.
  • State store (RocksDB) changes are recorded in the changelog. Recovery replays the changelog to reconstruct state exactly.
  • Because everything becomes visible in one transaction, downstream consumers never observe a partial application.

Transaction boundaries under EOS v2 (conceptual diagram)

pollstate updatesproduceTX commitoffsets→TXTopic Ainput partitionsChangelogstate store logOutput BRocksDBtask stateTX Produceratomic commitStreams Threadpoll/process + SendOffsetsToTransaction

Configuration: App, Broker, and Topic Essentials

Start by setting processing.guarantee=exactly_once_v2 in the app, and keep application.id stable (unchanged across redeploys). Specify replication.factor as well to give internal topics proper durability.

On the broker side, give the transaction log enough replication and a sufficient min ISR. Keep the transaction timeout within the broker's upper limit.

  • App-side essentials: processing.guarantee, application.id, replication.factor for internal topics, and a sensible commit.interval.ms (a short value is safe under EOS).
  • Broker-side essentials: size transaction.state.log.replication.factor and transaction.state.log.min.isr to match the cluster.
  • Keep the producer-side transaction.timeout.ms at or below the broker's transaction.max.timeout.ms.
  • External outputs are outside EOS. When you need them, use external-side transactions or an outbox / table-driven delivery pattern.

Kafka Streams (Java) configuration example: Exactly-Once v2

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
// Exactly-Once v2 を明示
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
// 内部トピックの耐障害性(StreamsConfig.TOPIC_PREFIX を利用)
props.put(StreamsConfig.TOPIC_PREFIX + "replication.factor", 3);
// トランザクション関連(producer.* で上書き可能)
props.put("producer.transaction.timeout.ms", 600000); // 10 分(broker 上限以下に)
// レイテンシ/スループット調整の一例
props.put("producer.linger.ms", 5);
props.put("producer.batch.size", 32768);
// コミット間隔(EOS では短めでも一貫性は保たれる)
props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 100);
// スレッド並列度
props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 2);

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> orders = builder.stream("orders");
KTable<String, Long> counts = orders.groupByKey().count();
counts.toStream().to("orders-agg");

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

Operational Tips and Common Pitfalls

On a graceful shutdown, you must commit or abort the current transaction before stopping. Use SIGTERM handling and close(Duration) to do this.

During a rebalance, the system has to wait for in-flight transactions to settle before tasks move. Tuning the thread count and input partitioning to keep transactions short keeps operations stable.

  • External DBs and REST calls live outside EOS. Prevent double application with external transactions, unique constraints, or the outbox pattern.
  • Long processing batches invite transaction timeouts. Control commit.interval.ms and the amount of work done per unit of record time.
  • If internal-topic replication.factor and min.insync.replicas are too low, you will see availability drops and commit failures.

CCDAK Exam Prep: Points to Remember

Choosing processing.guarantee=exactly_once_v2 commits input offsets, internal state updates, and outputs atomically in the same transaction. External systems are not included.

In Streams, you do not normally assign transactional.id yourself — it is generated and managed internally. The non-negotiable rule is to keep application.id stable.

  • at_least_once allows duplicates; exactly_once_v2 produces no duplicates within Kafka.
  • Be ready to explain that offsets are included in the transaction via SendOffsetsToTransaction.
  • Remember that the default can be at_least_once. Setting it explicitly is the safe move.

Check Your Understanding

CCDAK

問題 1

A Kafka Streams app aggregates orders and writes the results to a single output topic. You want internal state updates, outputs, and input offsets to commit atomically across restarts and rebalances, and to avoid any duplicates inside Kafka. Which combination of settings and prerequisites is correct?

  1. Set processing.guarantee=exactly_once_v2 and use a stable application.id. The broker must give the transaction log sufficient replication.
  2. Just set enable.idempotence=true (no Streams-side configuration needed).
  3. Chase low latency with acks=0 and commit.interval.ms=0 and you will not see duplicates.
  4. Compact the output topic and duplicates stop mattering, so at_least_once is fine.

正解: A

Enabling Exactly-Once requires setting processing.guarantee=exactly_once_v2 on the Streams side and keeping application.id stable. The broker must provide adequate replication and ISR for the transaction log (transaction.state.log.*). B ignores the Streams context, C has nothing to do with duplicate prevention, and D does not guarantee application-level consistency.

Frequently Asked Questions

Is exactly_once_v2 always the default?

No. In most versions the default is at_least_once. If you are designing around Exactly-Once, explicitly set processing.guarantee=exactly_once_v2.

Does EOS also make writes to external databases atomic?

No. EOS is limited to Kafka-internal effects: output topics, internal topics, and input offsets. For external databases, combine it with two-phase commit, the outbox pattern, or unique constraints / idempotent APIs on the external side.

What happens when a transaction times out?

The transaction is aborted, so the outputs, changelog updates, and offset commits in that batch never become visible. The processing is retried. Setting transaction.timeout.ms appropriately and keeping per-batch processing time short helps.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
Kafka

Kafka Topics & Partitions: Distribution Fundamentals (2026)

How Kafka topics and partitions enable scale — ordering guar...

Kafka

CCDAK Exam Guide: Confluent Certified Developer (2026)

Complete prep for the CCDAK exam — Producer/Consumer API, St...

Kafka

CCAAK Exam Guide: Confluent Certified Administrator (2026)

Pass the CCAAK exam — cluster management, partitions, securi...

Kafka

Kafka Replicas & ISR: Fault Tolerance Explained (2026)

Replica placement, in-sync replicas (ISR), leader election. ...

Kafka

Kafka Offsets: Commit Modes & Consumer Position (2026)

Offset semantics — auto vs. manual commit, __consumer_offset...

Browse all Kafka articles (101)
© 2026 NicheeLab All rights reserved.