Kafka

CCDAK Prep: Implementing Kafka Exactly-Once Semantics in Producer, Consumer, and Streams

2026-04-19
NicheeLab Editorial Team

Exactly-Once in Kafka is not simply about "no duplicates". It only holds when you combine producer idempotence and transactions, consumer read isolation (read_committed), and — where needed — Kafka Streams state management.

Based on the stable features in the official documentation, this article walks through how to configure EOS in each of the Producer, Consumer, and Streams layers, and where the pitfalls are. It includes a comparison table, diagrams, sample code, and practice questions for exam prep.

Defining Exactly-Once and the Underlying Assumptions

Exactly-Once Semantics (EOS) in Kafka keeps message deduplication and offset management consistent so that, even across retries and failures, one input record produces exactly one output record. Producer idempotence (enable.idempotence) alone is not enough — the key is transactions that bundle reads, writes, and offset commits into a single atomic unit.

EOS applies inside Kafka. End-to-end Exactly-Once to an external database or SaaS requires the downstream system's own transactions or idempotency-key design (e.g., the Outbox pattern). The foundation for EOS inside Kafka is a Transactional Producer plus a read_committed Consumer, or Kafka Streams with processing.guarantee=exactly_once_v2.

  • Idempotence: prevents duplicate writes caused by retries within the same partition
  • Transactions: atomically protect produced records together with offset commits
  • Read isolation: the Consumer uses read_committed to hide records from aborted transactions
  • Kafka Streams: delivers end-to-end EOS through internal topics and state stores

EOS on the Producer: Choosing Between Idempotence and Transactions

An idempotent producer (enable.idempotence=true) prevents duplicate production within a single partition. It is effective on retries and during transient broker failures, but it knows nothing about the consumer-side offset management. In other words, idempotence alone cannot make consume → transform → produce → offset-update atomic.

This is where the Transactional Producer comes in. You set transactional.id and, between beginTransaction and commitTransaction, call send (to write output) and sendOffsetsToTransaction (to include consumed offsets in the same transaction). If anything fails, the output and offsets are either committed together or aborted together, which is what makes Exactly-Once work. Transaction coordination is handled by the broker-side Transaction Coordinator, and fencing on a given transactional.id eliminates zombie producers.

  • Required/recommended settings: enable.idempotence=true, acks=all, retries>0, max.in.flight.requests.per.connection<=5, transactional.id=a stable fixed value
  • Broker durability: combine min.insync.replicas>=2 with acks=all to ensure fault tolerance
  • Transaction timeout: keep producer.transaction.timeout.ms at or below the broker's transaction.max.timeout.ms
  • On errors: when ProducerFencedException or OutOfOrderSequenceException is thrown, roll back via abortTransaction

Atomic Consume-Transform-Produce via a Transactional Producer

poll()beginTransactionsend(records)sendOffsetsToTransactioncommitTransactionConsumerread_committedTransaction CoordinatorProducertxn.id=X, idempotent=trueTopic Partitions__consumer_offsets

Java: Including offsets in the same transaction with a Transactional Producer

Properties p = new Properties();
p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
p.put(ProducerConfig.ACKS_CONFIG, "all");
p.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
p.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5");
p.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
p.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-tx-1");

KafkaProducer<String, String> producer = new KafkaProducer<>(p);
producer.initTransactions();

KafkaConsumer<String, String> consumer = /* read_committed で初期化 */ null;

try {
  ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
  if (!records.isEmpty()) {
    producer.beginTransaction();
    for (ConsumerRecord<String, String> r : records) {
      // 変換して出力
      ProducerRecord<String, String> out = new ProducerRecord<>("out-topic", r.key(), transform(r.value()));
      producer.send(out);
    }
    // 読み取ったオフセットを同一トランザクションでコミット
    Map<TopicPartition, OffsetAndMetadata> offsets = currentOffsets(records);
    producer.sendOffsetsToTransaction(offsets, new ConsumerGroupMetadata("cg-transform"));
    producer.commitTransaction();
  }
} catch (ProducerFencedException | OutOfOrderSequenceException e) {
  // フェンシングや順序違反: トランザクションを中断
  producer.abortTransaction();
  throw e;
} catch (Exception e) {
  producer.abortTransaction();
  throw e;
}

Consumer-Side Essentials: read_committed and Offset Management

For a Consumer to benefit from Exactly-Once, you must set isolation.level=read_committed so that records inside aborted transactions become invisible (the default is read_uncommitted). This prevents the Consumer from ingesting inconsistent records when a Transactional Producer aborts.

If you want EOS for a simple ingest pipeline (read and then write to an external system), Kafka-internal EOS is not enough unless the external system shares the same transaction. For consume-transform-produce pipelines that stay inside Kafka, the standard play is to combine a Transactional Producer with sendOffsetsToTransaction. Leaving enable.auto.commit=true is dangerous because offsets may advance even when the output fails, so avoid it in the EOS context.

  • Explicitly set isolation.level=read_committed
  • Set enable.auto.commit=false and submit offsets through the Producer transaction
  • Extending EOS to external sinks requires the downstream system's transactions or an idempotency-key design

Java: Basic Consumer configuration (read_committed)

Properties c = new Properties();
c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
c.put(ConsumerConfig.GROUP_ID_CONFIG, "cg-transform");
c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
c.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
c.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c);

EOS in Kafka Streams: processing.guarantee=exactly_once_v2

Kafka Streams backs application state stores via changelog topics and internally combines a Transactional Producer with a read_committed Consumer to deliver EOS. In production, the recommended setting is processing.guarantee=exactly_once_v2, which offers a better latency-availability balance than the legacy exactly_once.

Set the replication.factor and cleanup.policy=compact correctly on internal topics (state changelog and repartition), and understand fencing behavior on app restarts and rebalances — that makes failure-time behavior much easier to reason about. Transactions are scoped per sink in the topology, and any aborted records are hidden by the consumer-side read_committed setting.

  • Explicitly set processing.guarantee=exactly_once_v2
  • Set replication.factor and min.insync.replicas on internal topics appropriately
  • Use a stable application.id so state is restored correctly
  • Design record keys (correct partitioning) to preserve ordering guarantees

Java: Streams EOS configuration and a minimal topology

Properties s = new Properties();
s.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
s.put(StreamsConfig.APPLICATION_ID_CONFIG, "payments-app");
s.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

StreamsBuilder b = new StreamsBuilder();
KStream<String, String> in = b.stream("in");
KStream<String, String> out = in.mapValues(v -> enrich(v));
out.to("out");

KafkaStreams streams = new KafkaStreams(b.build(), s);
streams.start();

Delivery-Guarantee Comparison and What CCDAK Tends to Ask

Delivery guarantees fall into three categories: at-most-once, at-least-once, and exactly-once. CCDAK frequently asks about combinations of settings and the trade-offs of their side effects (duplicates, latency, visibility).

Be sure to remember that idempotence alone does not produce EOS, that a Consumer without read_committed will read records from aborted transactions, and that Streams requires exactly_once_v2 plus durable internal-topic settings.

  • Idempotence prevents duplicate writes (within the Producer); transactions are required to make offsets atomic
  • read_committed hides records from aborted transactions
  • Streams EOS depends on correct state-store and internal-topic design
GuaranteeProducer requirementsConsumer requirementsDuplicates and visibility
At-most-onceacks=0|1, no idempotence/transactions, commit offsets earlyNo special requirementsGenerally no duplicates, but records can be lost
At-least-onceacks=all recommended, idempotence optional, retries enabledApplication handles deduplicationDuplicates are expected (re-processing assumed); losses are largely avoided
Exactly-onceenable.idempotence=true, transactional.id set, acks=all, sendOffsetsToTransactionisolation.level=read_committed, auto-commit disabledAborts are invisible; deduplication is guaranteed inside Kafka

Operations & Troubleshooting: Fencing, Timeouts, Metrics

If two Transactional Producer instances share the same transactional.id, one will be fenced with ProducerFencedException and stop. This is the correct anti-zombie behavior. During rolling deploys, stay aware of instance counts and transaction boundaries.

When aborts grow because of transaction.timeout.ms expiries or broker-side issues, wait times and re-processing both increase. Observe topic min.insync.replicas, the network, storage IO, and GC together. Key metrics include transaction commit/abort rates, producer retries and batch sizes, and consumer-side latency gauges.

  • First response by exception: ProducerFenced → stop the process and let the scheduler reassign; OutOfOrderSequence → abort and re-initialize
  • Align with broker config: transaction.max.timeout.ms and producer.transaction.timeout.ms
  • Internal topics (Streams): monitor replication.factor and cleanup policy
  • Keep ordering on retries: enforce max.in.flight.requests.per.connection<=5

Java: Template for safe abort and re-initialization on exceptions

try {
  producer.beginTransaction();
  // ... send
  producer.commitTransaction();
} catch (ProducerFencedException e) {
  // 同一transactional.idの別インスタンスにフェンスされた
  closeQuietly(producer);
  System.exit(1); // 再スケジューリングへ
} catch (KafkaException e) {
  // それ以外はabortして再試行判断
  safeAbort(producer);
  // backoff + 再初期化
  producer.initTransactions();
}

Check Your Understanding

CCDAK

問題 1

For a Kafka application that reads an input topic, transforms records, and writes them back to an output topic, which is the minimum configuration that establishes Exactly-Once inside Kafka?

  1. An idempotent producer (enable.idempotence=true) alone; the Consumer can keep its default settings.
  2. A transactional Producer plus a Consumer with auto-commit enabled; read_uncommitted is fine.
  3. A transactional Producer using sendOffsetsToTransaction, with the Consumer set to read_committed and auto-commit disabled.
  4. Skip Kafka Streams, keep both Producer and Consumer at default settings, and deduplicate in application code.

正解: C

To establish Exactly-Once inside Kafka, a Transactional Producer must include output and offsets in the same transaction (sendOffsetsToTransaction), the Consumer must use read_committed to hide aborted records, and auto-commit must be disabled. Idempotence alone, or auto-commit enabled, cannot satisfy EOS.

Frequently Asked Questions

What is the difference between enable.idempotence and transactional.id?

enable.idempotence=true prevents duplicate writes caused by retries within the same partition. Setting transactional.id and using transactions lets you atomically bundle produced records together with consumer offset commits, which delivers EOS for the Consume-Transform-Produce pattern. Idempotence alone cannot give you atomicity with offsets.

What value should max.in.flight.requests.per.connection be set to?

For EOS, the requirement is at most 5 when idempotence is enabled, to avoid retry-induced reordering. Do not rely on client defaults or auto-tuning — remember 5 or lower as the recommended value for the exam as well.

What is the difference between exactly_once and exactly_once_v2 in Kafka Streams?

exactly_once_v2 has an improved internal implementation that delivers better latency and rebalance behavior than the original exactly_once. exactly_once_v2 is now the recommended setting. Both provide EOS, but on the exam the key points are choosing v2 and understanding internal topics and state management.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
Kafka

Kafka Topics & Partitions: Distribution Fundamentals (2026)

How Kafka topics and partitions enable scale — ordering guar...

Kafka

CCDAK Exam Guide: Confluent Certified Developer (2026)

Complete prep for the CCDAK exam — Producer/Consumer API, St...

Kafka

CCAAK Exam Guide: Confluent Certified Administrator (2026)

Pass the CCAAK exam — cluster management, partitions, securi...

Kafka

Kafka Replicas & ISR: Fault Tolerance Explained (2026)

Replica placement, in-sync replicas (ISR), leader election. ...

Kafka

Kafka Offsets: Commit Modes & Consumer Position (2026)

Offset semantics — auto vs. manual commit, __consumer_offset...

Browse all Kafka articles (101)
© 2026 NicheeLab All rights reserved.