Kafka Idempotent Producer Deep Dive: enable.idempotence, Duplicate Prevention, and CCDAK Prep

2026-04-19
NicheeLab Editorial Team

Kafka retries improve availability, but left as-is, producer retries can cause duplicate writes. Idempotent Producer is the canonical feature that achieves "no duplicates within the same partition" through producer configuration alone, with no deduplication logic in the application.

This article covers the internals of enable.idempotence, required/recommended settings, failure scenarios, operations, and exam prep. Wording is kept careful where behavior varies by version. For specifics, follow the official Kafka and Confluent documentation.

Why You Need enable.idempotence

When a producer detects a send failure due to a network outage or leader change, it retries. With plain retries alone, the same record can be written twice to the same partition. Filtering this out at the application layer is costly, and in queuing or payment systems it can be fatal.

Idempotent Producer prevents duplicate acceptance at the broker using a producer ID and sequence number. This keeps writes within a partition free of retry duplicates for the lifetime of a producer session. CCDAK frequently tests the "required combination of settings" and "scope of guarantees".

  • Scope is within a single partition. Uniqueness across a whole topic or multiple partitions is out of scope.
  • enable.idempotence is a producer-side guarantee. Consumer-side duplicates are a separate problem (handled via transactions or at the application layer).
  • Remember that acks=all, retries>0, and max.in.flight.requests.per.connection are part of the required constraints.

How It Works: Deduplication via ProducerId and Sequence Numbers

Idempotent Producer identifies each record batch with a ProducerId (PID) assigned by the broker and a monotonically increasing per-partition sequence number. The broker (leader) retains received PID and sequence numbers, discards known combinations as duplicates, and only accepts batches in the correct order.

Because acks=all waits until every in-sync replica (ISR) has the batch before acknowledging success, an acknowledged record survives a leader change, and the same record does not become visible twice across network outages or leader changes. Depending on client and broker version, the producer can recover from errors like OutOfOrderSequence or UnknownProducerId by bumping the epoch and re-initializing when needed.

  • ProducerId: an identifier assigned during the producer's startup handshake.
  • Sequence number: an incrementing per-partition number. Retries reuse the same number.
  • Deduplication state is maintained on the broker leader. Re-initialization may occur after log truncation or a leader switch following a long outage.
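
The duplicate check described above can be sketched as a toy model. This is a simplified illustration, not Kafka's broker code: the class name SequenceDedupSketch and its Result enum are hypothetical, it tracks one sequence number per record rather than per batch, and it ignores epochs.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of one partition leader's duplicate check (hypothetical, not broker code). */
final class SequenceDedupSketch {
    enum Result { APPENDED, DUPLICATE, OUT_OF_ORDER }

    // Last accepted sequence number per producer ID, for a single partition.
    private final Map<Long, Integer> lastSeqByPid = new HashMap<>();
    private int logSize = 0;

    Result handle(long pid, int seq) {
        Integer last = lastSeqByPid.get(pid);
        int expected = (last == null) ? 0 : last + 1;
        if (seq == expected) {
            lastSeqByPid.put(pid, seq);
            logSize++;
            return Result.APPENDED;
        }
        if (last != null && seq <= last) {
            // Already in the log: acknowledge without appending again.
            return Result.DUPLICATE;
        }
        // A gap means an earlier write never arrived.
        return Result.OUT_OF_ORDER;
    }

    int logSize() {
        return logSize;
    }

    public static void main(String[] args) {
        SequenceDedupSketch p0 = new SequenceDedupSketch();
        System.out.println(p0.handle(123L, 0)); // APPENDED
        System.out.println(p0.handle(123L, 0)); // DUPLICATE (retry of the same write)
        System.out.println(p0.handle(123L, 2)); // OUT_OF_ORDER (seq 1 has not arrived)
        System.out.println(p0.handle(123L, 1)); // APPENDED
        System.out.println(p0.logSize());       // 2
    }
}
```

The retry in the second call is acknowledged but not written again, which is why the log holds two entries after four requests.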

Idempotent Producer deduplication flow (conceptual)

[Figure: Producer (App) holding PID=123, seq(p0)=10, seq(p1)=27 -> Broker Leader p0 (dedupe check: PID=123, seq=10 seen?) -> Log(p0). New -> append & ack(all); duplicate -> drop & ack(success).]

Key Settings and Recommended Values

To enable Idempotent Producer, enable.idempotence=true is the core setting. To stay consistent, you also need acks=all, retries>0 (effectively a large value in most implementations), and max.in.flight.requests.per.connection within the constraint (generally 5 or less). Depending on the client, conflicting settings may raise an error or be auto-overridden/limited. The exam often tests these dependency relationships.

delivery.timeout.ms caps the total time across all retries. Once exceeded, the send is treated as a failure, but duplicates from intermediate retries are still prevented by Idempotent Producer. For throughput optimization, combine linger.ms and batch.size, but balance against your latency budget. Managed environments such as Confluent Cloud typically assume acks=all and support Idempotent Producer by default.
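
To get a feel for how delivery.timeout.ms bounds retries, here is a rough worst-case calculation. It is only a sketch under stated assumptions: every attempt is assumed to run for the full request.timeout.ms, the backoff is assumed to be a fixed retry.backoff.ms (some newer clients back off exponentially), and time spent batching under linger.ms is ignored. The class and method names are hypothetical.

```java
/** Rough worst-case retry budget (hypothetical helper, simplified timing model). */
final class RetryBudgetSketch {
    /** Counts the attempts that can start before deliveryTimeoutMs if every attempt times out. */
    static int attemptsStarted(long deliveryTimeoutMs, long requestTimeoutMs, long retryBackoffMs) {
        long step = requestTimeoutMs + retryBackoffMs;
        if (step <= 0) {
            throw new IllegalArgumentException("timeouts must add up to a positive value");
        }
        int attempts = 0;
        for (long now = 0; now < deliveryTimeoutMs; now += step) {
            attempts++;
        }
        return attempts;
    }

    public static void main(String[] args) {
        // Illustrative values: 120 s delivery budget, 30 s per request, 100 ms backoff.
        System.out.println(attemptsStarted(120_000, 30_000, 100)); // 4
    }
}
```

Under these assumptions a very large retries value is never reached; the time budget, not the retry count, ends the send.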

Note: defaults and auto-adjustment behavior can vary by client and version. Always check the latest official documentation.

  • enable.idempotence=true (required)
  • acks=all (effectively required; conflicting settings are rejected or forcibly adjusted)
  • retries greater than 0 (large in practice), controlled together with delivery.timeout.ms
  • max.in.flight.requests.per.connection recommended at 5 or less (preserves ordering and consistency)
  • Tune throughput with linger.ms and batch.size, adjusting for your latency requirements
  • Keep timeouts consistent: delivery.timeout.ms should be at least linger.ms + request.timeout.ms, and retry.backoff.ms should fit within that budget

Delivery Guarantee Mode | Key Settings | Duplicates | Ordering
At-most-once | acks=0 or acks=1, retries=0 | None, but high risk of loss | Weak guarantees
At-least-once | acks>=1, retries>0, enable.idempotence=false | Possible (retries can duplicate) | Can be tightened with max.in.flight=1
Idempotent | enable.idempotence=true, acks=all, retries>0, max.in.flight<=5 | Prevented within the same partition | Strong (per partition)
Transactional EOS | Idempotent plus transactional.id, with proper consumer isolation | Prevented end-to-end (can span multiple partitions) | Strong (transaction boundary)
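
The required combination can also be checked before a producer is created. The following is a minimal sketch of such a pre-flight check, assuming the settings are held in a java.util.Properties object with string values. IdempotenceConfigCheck is a hypothetical helper, not part of the Kafka client, and it is stricter than real clients because it treats missing keys as errors instead of applying defaults.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical pre-flight check for the idempotence settings; not a Kafka API. */
final class IdempotenceConfigCheck {
    static List<String> problems(Properties p) {
        List<String> out = new ArrayList<>();
        if (!"true".equalsIgnoreCase(p.getProperty("enable.idempotence"))) {
            out.add("enable.idempotence must be true");
        }
        String acks = p.getProperty("acks");
        if (!"all".equals(acks) && !"-1".equals(acks)) {
            out.add("acks must be all (or the equivalent -1)");
        }
        if (parseOr(p.getProperty("retries"), 0) <= 0) {
            out.add("retries must be greater than 0");
        }
        int inFlight = parseOr(
            p.getProperty("max.in.flight.requests.per.connection"), Integer.MAX_VALUE);
        if (inFlight < 1 || inFlight > 5) {
            out.add("max.in.flight.requests.per.connection must be between 1 and 5");
        }
        return out;
    }

    // Missing or non-numeric values fall back to a value that fails the check.
    private static int parseOr(String value, int fallback) {
        if (value == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }
}
```

An empty list means the four settings agree with the Idempotent row of the table; each string in a non-empty list names one setting to fix.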

Minimal Java Producer example (idempotence + throughput-aware)

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerExample {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

    // Core of idempotence
    props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
    props.put(ProducerConfig.ACKS_CONFIG, "all");
    props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
    props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5"); // ordering and consistency

    // Throughput and stability
    props.put(ProducerConfig.LINGER_MS_CONFIG, "20");
    props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");
    props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, "120000");

    // try-with-resources closes the producer even if send() throws
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      ProducerRecord<String, String> rec = new ProducerRecord<>("orders", "order-123", "payload");
      producer.send(rec, (md, ex) -> {
        if (ex != null) {
          // Supplementary logging on failure; duplicates are prevented by idempotence
          ex.printStackTrace();
        }
      });
      // Block until buffered records are sent before the producer is closed
      producer.flush();
    }
  }
}

Failure Scenarios and Error Handling

Network outages, leader changes, and log truncation can cause state between producer and broker to drift. Idempotent Producer attempts recovery via epoch management and sequence control, but application-side retry design and monitoring are still important.

Typical errors and their behavior are listed below. The exact trigger conditions depend on broker and client versions and configuration, so test against your own operating environment.

  • OutOfOrderSequenceException: the broker received a sequence number other than the one it expected next, which can mean an earlier batch was lost. Keep max.in.flight at 5 or less and review concurrent send patterns in your application.
  • UnknownProducerId: broker-side state was lost, or after log truncation. The producer bumps the epoch and re-initializes to continue.
  • NotEnoughReplicas / Timeout: insufficient ISR under acks=all. Failure is finalized once delivery.timeout.ms is exceeded. Duplicates are still prevented, but the application must handle failure.
  • ProducerFenced (transactional case): multiple producers using the same transactional.id. Single-active control kicks in.
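
As a rough illustration of how an application might triage these errors in a send callback, the sketch below maps an exception to a coarse action. The Action enum and the mapping are assumptions made for illustration. The code matches on the exception's simple class name only so that it compiles without the Kafka client on the classpath; real code would normally use instanceof checks against the classes in org.apache.kafka.common.errors.

```java
/** Hypothetical triage of send-callback exceptions; the actions are illustrative only. */
final class SendFailureTriage {
    enum Action { CLOSE_PRODUCER, HANDLE_FAILED_RECORD, INVESTIGATE }

    static Action triage(Throwable ex) {
        // Name-based matching is a stand-in for instanceof checks on Kafka's exception classes.
        switch (ex.getClass().getSimpleName()) {
            case "ProducerFencedException":
                // Another instance with the same transactional.id took over.
                return Action.CLOSE_PRODUCER;
            case "TimeoutException":
            case "NotEnoughReplicasException":
                // The send failed for good; the application decides what to do with the record.
                return Action.HANDLE_FAILED_RECORD;
            default:
                // Includes sequence and producer ID errors: log, alert, and check versions.
                return Action.INVESTIGATE;
        }
    }
}
```

A callback could call SendFailureTriage.triage(ex) when ex is non-null and route the record to a dead-letter store or an alert according to the result. Which action fits each error still depends on your client version and requirements.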

Operations and Monitoring: Metrics, Tuning, Cloud

Whether idempotence is on or off is clear from configuration, but in production you must continuously monitor retry rate, throttling, and latency trends. Put producer metrics (record-error-rate, record-retry-rate, request-latency-avg, produce-throttle-time-avg, batch-size-avg, records-per-request-avg, and so on) on a dashboard and review them alongside the count of delivery.timeout.ms errors.

The baseline for throughput optimization is to strengthen batching with linger.ms and batch.size while securing parallelism under the max.in.flight<=5 constraint. Key design matters: routing the same key to the same partition stably also makes application-level consistency easier to maintain.

In managed environments (such as Confluent Cloud), acks=all is assumed and recommended, and Idempotent Producer is available as standard. Client-side defaults and constraints can be updated over time, so check the docs for your specific environment when adopting it.

  • First monitoring steps: record-error-rate, record-retry-rate, request-latency-avg
  • Balancing retries and latency: trade off linger.ms against delivery.timeout.ms
  • Stabilize partition keys: route the same business key to the same partition
  • Detect backpressure: watch for produce-throttle-time-avg rising
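
The point about stable partition keys can be illustrated with a small hash-based routing sketch. It is not Kafka's partitioner: the default partitioner hashes the serialized key bytes with murmur2, while this stand-in uses Arrays.hashCode purely to show the principle. KeyRoutingSketch and partitionFor are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative key-to-partition routing; a stand-in, not Kafka's murmur2 partitioner. */
final class KeyRoutingSketch {
    static int partitionFor(String key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(Arrays.hashCode(bytes), numPartitions);
    }
}
```

Calling partitionFor("order-123", 6) repeatedly always yields the same partition, so every record for that business key lands in one partition where idempotence and ordering apply. The mapping can change if the partition count changes, which is one reason to plan partition counts early.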

CCDAK Prep: Key Points and Pitfalls

The exam tends to focus on Idempotent Producer's guarantee scope, the required combination of settings, and how it differs from transactions. Be clear that simply raising retries does not prevent duplicates, understand what acks=all means, and know the max.in.flight constraint.

Also remember that Idempotent Producer is "write-side duplicate prevention" and is separate from consumer-side duplicate reads. For end-to-end Exactly-Once, be ready to identify transactions or Kafka Streams EOS mode as the answer.

  • enable.idempotence=true prevents duplicates within a partition; across partitions is out of scope.
  • Required combination: enable.idempotence=true, acks=all, retries>0, max.in.flight<=5.
  • Consumer duplicates are separate; combine read_committed or transactions as needed.
  • max.in.flight=1 prioritizes ordering but reduces throughput. Under idempotence, allowing up to 5 is typical.
  • Treat exceeding delivery.timeout.ms as failure. Duplicates are still prevented, but a recovery strategy is still required.

Check Your Understanding

CCDAK

Question 1

You want to prevent retry duplicates within the same partition on a Kafka producer. Which combination of settings meets the requirement?

  A. enable.idempotence=true, acks=all, max.in.flight.requests.per.connection<=5
  B. enable.idempotence=false, acks=1, retries=Integer.MAX_VALUE
  C. enable.idempotence=true and acks=1
  D. Only set transactional.id (enable.idempotence=false)

Correct answer: A

Idempotent Producer requires enable.idempotence=true and must align with acks=all. For ordering and consistency, max.in.flight.requests.per.connection must stay within the constraint (generally 5 or less). B has no idempotence, so duplicates can occur. C conflicts because of acks=1. D is also wrong because transactions are built on idempotence internally, and transactional.id alone does not satisfy the requirement.

Frequently Asked Questions

Is enable.idempotence enabled by default?

It depends on the client and version. In the Apache Kafka Java client, the default is true from version 3.0 onward and false in earlier versions; other clients may still default to false. If duplicate prevention is a requirement, set it to true explicitly and align related settings such as acks=all. Check the official documentation for your client when adopting it.

Does Idempotent Producer also prevent duplicate reads on the consumer side?

No. Idempotent Producer prevents duplicate acceptance on the write side. Consumer-side duplicates must be handled via offset commit strategies or transactions (read_committed). For end-to-end Exactly-Once, consider transactions or Kafka Streams EOS.
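
One application-layer way to handle consumer-side duplicates is to remember the highest offset already processed per partition and skip anything at or below it. The sketch below shows the idea only. OffsetDedupSketch is a hypothetical class, and it keeps state in memory; a real system would need to store that state durably and update it atomically with the processing side effect.

```java
import java.util.HashMap;
import java.util.Map;

/** In-memory sketch of offset-based duplicate skipping on the consumer side (hypothetical). */
final class OffsetDedupSketch {
    // Highest processed offset per "topic-partition" string.
    private final Map<String, Long> lastProcessed = new HashMap<>();

    /** Returns true if the record should be processed, false if it was already handled. */
    boolean shouldProcess(String topic, int partition, long offset) {
        String tp = topic + "-" + partition;
        Long last = lastProcessed.get(tp);
        if (last != null && offset <= last) {
            return false; // redelivered after a rebalance or restart
        }
        lastProcessed.put(tp, offset);
        return true;
    }
}
```

A poll loop would call shouldProcess(record.topic(), record.partition(), record.offset()) before applying each record, so a redelivered record is skipped instead of applied twice.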

Should max.in.flight.requests.per.connection be pinned to 1?

Pinning to 1 is sometimes chosen to enforce strict ordering under At-least-once without idempotence, but with Idempotent Producer you can preserve ordering and consistency at up to 5. Since 1 sacrifices significant throughput, tune between 1 and 5 based on requirements.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


© 2026 NicheeLab All rights reserved.