Kafka Producer Tuning: Throughput & Latency (2026)

The standard approach to Producer tuning is to clarify your goals first, lock down the reliability constraints, and only then push for higher throughput. In particular, acks, enable.idempotence, retries, and max.in.flight.requests.per.connection govern whether messages can be lost or duplicated, so decide those first.

This article assumes the stable concepts grounded in the official Apache Kafka documentation and Confluent's behavioral specifications, and walks through each setting from a practitioner's perspective aligned with CCDAK exam coverage.

Batching and Send Timing: batch.size and linger.ms

The Producer increases throughput by grouping records destined for the same partition into a single send, reducing network round trips. The key knobs are batch.size (the max bytes per batch) and linger.ms (the max time to wait while filling a batch). Raising batch.size and setting linger.ms to a few to a few tens of milliseconds trades a little latency for a substantial throughput gain.

buffer.memory is the total in-flight buffer capacity and acts as a safety valve when production rate temporarily exceeds broker processing capacity. If you see BufferExhausted frequently, expand buffer.memory alongside revisiting batch.size and linger.ms.

Latency-first: linger.ms 0-5ms, batch.size from default up to around 64KB
Throughput-first: linger.ms 10-50ms, batch.size enlarged to roughly 64-256KB
For very large records, also align max.request.size with the broker's message.max.bytes

Recommended Java Producer snippet (throughput-oriented batching example)

Properties p = new Properties();
p.put("bootstrap.servers", "broker1:9092,broker2:9092");
p.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
// バッチング
p.put("batch.size", 131072);           // 128KB
p.put("linger.ms", 20);                // 20ms 待ってバッチを埋める
p.put("buffer.memory", 67108864);      // 64MB
// 圧縮は次セクション参照
KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p);

Compression and Record Size: Balancing CPU and Network

compression.type accepts values such as none, gzip, snappy, lz4, and zstd. In general, lz4 and zstd compress and decompress quickly, making them well suited for combining network bandwidth savings with high throughput. gzip yields a higher compression ratio but at higher CPU cost. Pick based on which algorithms your cluster supports.

Even when average records are small, batch-level compression still saves bandwidth overall. For very large records, also account for max.request.size, the broker-side receive limit, and the overhead of record headers and schemas.

Overall, lz4 or zstd are the first picks for high throughput
Compression works per batch, so optimize it together with batch.size
When CPU-bound, revisit linger.ms and batch.size before falling back to none

Java Producer compression configuration example

Properties p = new Properties();
// ... 基本設定は省略
p.put("compression.type", "lz4");        // もしくは "zstd"
// zstd 利用時はクラスターの対応状況を事前確認
// 必要に応じてレコードヘッダやシリアライザの最適化も検討

Reliability Essentials: acks, retries, enable.idempotence, max.in.flight

Reliability is defined by the acks setting. acks=all waits for replication across the leader and the full ISR before acknowledging, minimizing the risk of data loss during broker failure. Even when chasing high throughput, start your acks decision from all.

Enabling retries lets you ride out transient network issues and leader changes, but introduces the possibility of reordering and duplicates. Combining it with enable.idempotence=true adds Producer-level deduplication, preventing duplicates under at-least-once delivery. With idempotence on, it is important not to set max.in.flight.requests.per.connection too high; 5 or below is the usual recommendation.

At-least-once delivery with high throughput: acks=all, enable.idempotence=true, large retries, max.in.flight around 5
If strict ordering is paramount, max.in.flight=1 works but throughput drops significantly
Pair acks=all with the broker-side min.insync.replicas setting in production

Setting	Throughput impact	Reliability / ordering	Caveats
acks=0/1	Light round trip, fast	Data loss risk high / medium	Not recommended outside testing or transient logs
acks=all	Slight drop	Robust: confirms ISR commit	Run paired with the broker's min.insync.replicas
enable.idempotence=true	Slight overhead	Avoids duplicates, improves ordering stability	max.in.flight should stay at 5 or below
Large retries	Higher latency under congestion	Robust against transient failures	Retries are bounded by delivery.timeout.ms
max.in.flight=1/5/10	Faster as the value grows: small / medium / large	Ordering: stable / acceptable / prone to reordering	Without idempotence, high values risk duplicates and reordering

acks=all and replication to the ISR

Example configuration combining reliability and throughput

Properties p = new Properties();
// ... 基本設定は省略
p.put("acks", "all");
p.put("enable.idempotence", true);
p.put("retries", Integer.toString(Integer.MAX_VALUE));
p.put("max.in.flight.requests.per.connection", 5); // idempotence と両立
p.put("linger.ms", 20);
p.put("batch.size", 131072);
p.put("compression.type", "lz4");

Timeout Design: delivery.timeout.ms and request.timeout.ms

delivery.timeout.ms sets the upper bound for completing a single record send, success or failure. It caps the total time including retries; once exceeded, the send returns failure to the callback. If you raise retries in pursuit of throughput, also extend delivery.timeout.ms accordingly.

request.timeout.ms is the response wait time for a single request. Too short and you get false-positive retries; too long and failure detection lags. Base it on network RTT and broker load, leaving a margin above the observed p95-p99 latency for stability. Names vary across language clients; non-Java implementations such as librdkafka use message.timeout.ms for related semantics, so check before tuning.

If you raise retries, raise delivery.timeout.ms with it
Tune request.timeout.ms based on measured network and broker behavior
Adjust backoff settings together to prevent retry storms

Example timeout and backoff configuration

Properties p = new Properties();
// ... 基本設定は省略
p.put("delivery.timeout.ms", 180000);   // 約 3 分
p.put("request.timeout.ms", 30000);     // 30 秒程度を起点に調整
p.put("retry.backoff.ms", 100);         // ブローカー側負荷を考慮
p.put("reconnect.backoff.ms", 50);
p.put("reconnect.backoff.max.ms", 1000);

Partition Design and Key Selection

Partition count sets the Producer's parallelism and throughput ceiling. On the Producer side, sends are serialized per partition, so spreading load across more partitions increases the total throughput for a topic. However, too many partitions raises the broker's metadata and file descriptor costs.

Design record keys around the unit where ordering must be preserved. Records sharing a key map to the same partition and retain order. Hot keys create skew that caps throughput, so diversify key encoding or hashing to soften the imbalance.

Define keys around the unit needing ordering, and randomize when ordering is not required to spread load
Size partition count alongside consumer group size and broker resources
Detect hot keys by watching latency skew and per-partition throughput

Custom partitioner skeleton (Java)

public class SaltedKeyPartitioner implements Partitioner {
  @Override
  public void configure(Map<String, ?> configs) {}
  @Override
  public int partition(String topic, Object key, byte[] keyBytes,
                       Object value, byte[] valueBytes, Cluster cluster) {
    int partitions = cluster.partitionCountForTopic(topic);
    // キーのホットスポット緩和のため簡易ソルト
    int salt = ThreadLocalRandom.current().nextInt(4); // 0..3
    return Utils.toPositive(Utils.murmur2(ByteBuffer.allocate(keyBytes.length+1)
                      .put(keyBytes).put((byte)salt).array())) % partitions;
  }
  @Override
  public void close() {}
}

Metrics Monitoring and a Stepwise Tuning Procedure

Observe Producer metrics and quantify the bottleneck before moving any settings. Useful ones include record-error-rate, record-retry-rate, request-latency-avg, batch-size-avg, records-per-request-avg, compression-rate-avg, bufferpool-wait-ratio, and outgoing-byte-rate.

The safe sequence is to lock reliability first, then tune batching and compression, and finally finish with timeouts and backoff to stabilize behavior. Change one knob at a time, allow a sufficient observation window, and verify there is no regression in p95/p99.

Decide acks and idempotence first, then optimize the rest
Tune batching and compression to balance CPU against bandwidth
Use timeouts and backoff to keep behavior stable during spikes
Save metric baselines so rollback stays easy

Example JMX metric names (for filtering)

# 例: kafka.producer:type=producer-metrics の一部
kafka.producer:type=producer-metrics,client-id=*,metric=record-error-rate
kafka.producer:type=producer-metrics,client-id=*,metric=record-retry-rate
kafka.producer:type=producer-metrics,client-id=*,metric=request-latency-avg
kafka.producer:type=producer-node-metrics,client-id=*,node-id=*,metric=outgoing-byte-rate

Check Your Understanding

CCDAK

問題 1

You want high throughput, at-least-once delivery, and to avoid duplicates as much as possible. Which combination of settings is most appropriate?

acks=all, enable.idempotence=true, retries=MAX, max.in.flight=1, linger.ms=50, batch.size=128KB
acks=all, enable.idempotence=true, retries=MAX, max.in.flight=5, linger.ms=20, batch.size=64KB
acks=1, retries=0, max.in.flight=10, linger.ms=0, batch.size=16KB
acks=all, enable.idempotence=false, retries=5, max.in.flight=10, linger.ms=10, batch.size=32KB

正解: B

B satisfies at-least-once delivery through acks=all and retries, and enable.idempotence suppresses duplicates on retry. max.in.flight=5 is the sweet spot between throughput and ordering stability, while moderate linger.ms and batch.size raise batch efficiency. A is safe but needlessly sacrifices throughput with max.in.flight=1. C does not meet the reliability requirement. D leaves duplicate risk on the table.

Frequently Asked Questions

What is the difference between enable.idempotence and transactions?

enable.idempotence performs producer-level deduplication within a single partition, preventing duplicate writes on retry. Transactions provide atomic writes across multiple partitions and topics, plus consistency for EOS (read-process-write) consumption. If your only goal is preventing duplicates, idempotence alone is often enough.

Can data loss still occur with acks=all?

Loss risk remains if the broker-side min.insync.replicas is too low, or if the ISR shrinks to a single replica and consecutive failures hit. Loss can also happen end-to-end if the application crashes after a successful send but before downstream persistence completes.

I cannot find delivery.timeout.ms in the Python or C/C++ clients.

In the Java Producer, delivery.timeout.ms is the overall upper bound for a send including retries. In librdkafka-based clients, message.timeout.ms plays the equivalent role. Check your client's official setting name and default value to ensure the semantics line up.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Kafka Producer Tuning: Balancing Throughput and Reliability