The standard approach to Producer tuning is to clarify your goals first, lock down the reliability constraints, and only then push for higher throughput. In particular, acks, enable.idempotence, retries, and max.in.flight.requests.per.connection govern whether messages can be lost or duplicated, so decide those first.
This article assumes the stable concepts grounded in the official Apache Kafka documentation and Confluent's behavioral specifications, and walks through each setting from a practitioner's perspective aligned with CCDAK exam coverage.
The Producer increases throughput by grouping records destined for the same partition into a single send, reducing network round trips. The key knobs are batch.size (the max bytes per batch) and linger.ms (the max time to wait while filling a batch). Raising batch.size and setting linger.ms to a few to a few tens of milliseconds trades a little latency for a substantial throughput gain.
buffer.memory is the total in-flight buffer capacity and acts as a safety valve when production rate temporarily exceeds broker processing capacity. If you see BufferExhausted frequently, expand buffer.memory alongside revisiting batch.size and linger.ms.
Recommended Java Producer snippet (throughput-oriented batching example)
Properties p = new Properties();
p.put("bootstrap.servers", "broker1:9092,broker2:9092");
p.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
// バッチング
p.put("batch.size", 131072); // 128KB
p.put("linger.ms", 20); // 20ms 待ってバッチを埋める
p.put("buffer.memory", 67108864); // 64MB
// 圧縮は次セクション参照
KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p);compression.type accepts values such as none, gzip, snappy, lz4, and zstd. In general, lz4 and zstd compress and decompress quickly, making them well suited for combining network bandwidth savings with high throughput. gzip yields a higher compression ratio but at higher CPU cost. Pick based on which algorithms your cluster supports.
Even when average records are small, batch-level compression still saves bandwidth overall. For very large records, also account for max.request.size, the broker-side receive limit, and the overhead of record headers and schemas.
Java Producer compression configuration example
Properties p = new Properties();
// ... 基本設定は省略
p.put("compression.type", "lz4"); // もしくは "zstd"
// zstd 利用時はクラスターの対応状況を事前確認
// 必要に応じてレコードヘッダやシリアライザの最適化も検討Reliability is defined by the acks setting. acks=all waits for replication across the leader and the full ISR before acknowledging, minimizing the risk of data loss during broker failure. Even when chasing high throughput, start your acks decision from all.
Enabling retries lets you ride out transient network issues and leader changes, but introduces the possibility of reordering and duplicates. Combining it with enable.idempotence=true adds Producer-level deduplication, preventing duplicates under at-least-once delivery. With idempotence on, it is important not to set max.in.flight.requests.per.connection too high; 5 or below is the usual recommendation.
| Setting | Throughput impact | Reliability / ordering | Caveats |
|---|---|---|---|
| acks=0/1 | Light round trip, fast | Data loss risk high / medium | Not recommended outside testing or transient logs |
| acks=all | Slight drop | Robust: confirms ISR commit | Run paired with the broker's min.insync.replicas |
| enable.idempotence=true | Slight overhead | Avoids duplicates, improves ordering stability | max.in.flight should stay at 5 or below |
| Large retries | Higher latency under congestion | Robust against transient failures | Retries are bounded by delivery.timeout.ms |
| max.in.flight=1/5/10 | Faster as the value grows: small / medium / large | Ordering: stable / acceptable / prone to reordering | Without idempotence, high values risk duplicates and reordering |
acks=all and replication to the ISR
Example configuration combining reliability and throughput
Properties p = new Properties();
// ... 基本設定は省略
p.put("acks", "all");
p.put("enable.idempotence", true);
p.put("retries", Integer.toString(Integer.MAX_VALUE));
p.put("max.in.flight.requests.per.connection", 5); // idempotence と両立
p.put("linger.ms", 20);
p.put("batch.size", 131072);
p.put("compression.type", "lz4");delivery.timeout.ms sets the upper bound for completing a single record send, success or failure. It caps the total time including retries; once exceeded, the send returns failure to the callback. If you raise retries in pursuit of throughput, also extend delivery.timeout.ms accordingly.
request.timeout.ms is the response wait time for a single request. Too short and you get false-positive retries; too long and failure detection lags. Base it on network RTT and broker load, leaving a margin above the observed p95-p99 latency for stability. Names vary across language clients; non-Java implementations such as librdkafka use message.timeout.ms for related semantics, so check before tuning.
Example timeout and backoff configuration
Properties p = new Properties();
// ... 基本設定は省略
p.put("delivery.timeout.ms", 180000); // 約 3 分
p.put("request.timeout.ms", 30000); // 30 秒程度を起点に調整
p.put("retry.backoff.ms", 100); // ブローカー側負荷を考慮
p.put("reconnect.backoff.ms", 50);
p.put("reconnect.backoff.max.ms", 1000);Partition count sets the Producer's parallelism and throughput ceiling. On the Producer side, sends are serialized per partition, so spreading load across more partitions increases the total throughput for a topic. However, too many partitions raises the broker's metadata and file descriptor costs.
Design record keys around the unit where ordering must be preserved. Records sharing a key map to the same partition and retain order. Hot keys create skew that caps throughput, so diversify key encoding or hashing to soften the imbalance.
Custom partitioner skeleton (Java)
public class SaltedKeyPartitioner implements Partitioner {
@Override
public void configure(Map<String, ?> configs) {}
@Override
public int partition(String topic, Object key, byte[] keyBytes,
Object value, byte[] valueBytes, Cluster cluster) {
int partitions = cluster.partitionCountForTopic(topic);
// キーのホットスポット緩和のため簡易ソルト
int salt = ThreadLocalRandom.current().nextInt(4); // 0..3
return Utils.toPositive(Utils.murmur2(ByteBuffer.allocate(keyBytes.length+1)
.put(keyBytes).put((byte)salt).array())) % partitions;
}
@Override
public void close() {}
}Observe Producer metrics and quantify the bottleneck before moving any settings. Useful ones include record-error-rate, record-retry-rate, request-latency-avg, batch-size-avg, records-per-request-avg, compression-rate-avg, bufferpool-wait-ratio, and outgoing-byte-rate.
The safe sequence is to lock reliability first, then tune batching and compression, and finally finish with timeouts and backoff to stabilize behavior. Change one knob at a time, allow a sufficient observation window, and verify there is no regression in p95/p99.
Example JMX metric names (for filtering)
# 例: kafka.producer:type=producer-metrics の一部
kafka.producer:type=producer-metrics,client-id=*,metric=record-error-rate
kafka.producer:type=producer-metrics,client-id=*,metric=record-retry-rate
kafka.producer:type=producer-metrics,client-id=*,metric=request-latency-avg
kafka.producer:type=producer-node-metrics,client-id=*,node-id=*,metric=outgoing-byte-rateCCDAK
問題 1
You want high throughput, at-least-once delivery, and to avoid duplicates as much as possible. Which combination of settings is most appropriate?
正解: B
B satisfies at-least-once delivery through acks=all and retries, and enable.idempotence suppresses duplicates on retry. max.in.flight=5 is the sweet spot between throughput and ordering stability, while moderate linger.ms and batch.size raise batch efficiency. A is safe but needlessly sacrifices throughput with max.in.flight=1. C does not meet the reliability requirement. D leaves duplicate risk on the table.
What is the difference between enable.idempotence and transactions?
enable.idempotence performs producer-level deduplication within a single partition, preventing duplicate writes on retry. Transactions provide atomic writes across multiple partitions and topics, plus consistency for EOS (read-process-write) consumption. If your only goal is preventing duplicates, idempotence alone is often enough.
Can data loss still occur with acks=all?
Loss risk remains if the broker-side min.insync.replicas is too low, or if the ISR shrinks to a single replica and consecutive failures hit. Loss can also happen end-to-end if the application crashes after a successful send but before downstream persistence completes.
I cannot find delivery.timeout.ms in the Python or C/C++ clients.
In the Java Producer, delivery.timeout.ms is the overall upper bound for a send including retries. In librdkafka-based clients, message.timeout.ms plays the equivalent role. Check your client's official setting name and default value to ensure the semantics line up.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...