Producer Batching & Linger: Throughput Tuning (2026)

The Kafka Producer batches records per partition before sending. batch.size is the maximum byte size of a single batch, and linger.ms is the upper bound on how long it waits for additional records when a batch is not yet full.

This article gives concrete guidance for balancing throughput against a latency SLA, grounded in the stable producer behavior described in the official documentation. It also covers key points for the CCDAK exam and the pitfalls operators run into most often.

batch.size and linger.ms: Basics and When to Use Each

batch.size is the maximum size (in bytes) of a single send batch per partition. The actual size of a batch is however much data has accumulated for that partition by the time it is sent, with batch.size acting as the ceiling. Raise it when you want to improve network efficiency for many small records or boost compression efficiency.

linger.ms is the maximum time (in milliseconds) the producer waits for more records when a batch is not full. Once the batch fills up, it is sent immediately without waiting. This lets you bundle records even during low-traffic periods, at the cost of added wait latency. The default was 0 ms through Kafka 3.x, which sends as soon as possible with no extra waiting; Apache Kafka 4.0 raised the default to 5 ms, so check the documentation for your client version.

An important premise: batches are formed per partition. When you write evenly across many partitions, multiple pending batches can exist at once, so the batch.size setting also affects buffer.memory consumption.
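To make the memory relationship concrete, here is a rough back-of-the-envelope sketch. It is not the producer's real accounting: it assumes the worst case in which every partition holds one full, unsent batch, and it ignores batches already in flight. The class name, method names, and the partition count of 200 are hypothetical; 32 MB is used because it is the documented default for buffer.memory.

```java
// Hypothetical helper: a rough upper-bound estimate, not the producer's real accounting.
final class BatchMemoryEstimate {

    /** Worst case: every partition holds one full, unsent batch. */
    static long worstCasePendingBytes(int partitions, int batchSizeBytes) {
        return (long) partitions * batchSizeBytes;
    }

    /** True when the worst-case estimate fits inside buffer.memory. */
    static boolean fitsInBuffer(int partitions, int batchSizeBytes, long bufferMemoryBytes) {
        return worstCasePendingBytes(partitions, batchSizeBytes) <= bufferMemoryBytes;
    }

    public static void main(String[] args) {
        int partitions = 200;                   // assumed number of partitions written to
        long bufferMemory = 32L * 1024 * 1024;  // 32 MB, the documented buffer.memory default
        for (int batchKb : new int[] {16, 64, 256}) {
            int batchSize = batchKb * 1024;
            System.out.printf("batch.size=%d KB -> worst case %d bytes, fits=%b%n",
                    batchKb,
                    worstCasePendingBytes(partitions, batchSize),
                    fitsInBuffer(partitions, batchSize, bufferMemory));
        }
    }
}
```

Under these assumptions the 16 KB and 64 KB settings fit comfortably, while 256 KB batches across 200 partitions would need more than the 32 MB pool, which is the situation where sends start to block.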

Low-latency focus: start with linger.ms at 0-2 ms and batch.size around 16-32 KB
Balanced profile: linger.ms 5-15 ms and batch.size roughly 32-64 KB
High-throughput focus: consider linger.ms 20-50 ms and batch.size 64-128 KB or larger (watch SLA and memory)
In every case, setting compression.type to lz4 or zstd pairs well with larger batches: the compression ratio tends to improve as batches grow, which usually lifts throughput

Profile	batch.size (target)	linger.ms (target)	Characteristics
Low latency	16-32 KB	0-2 ms	Minimal send wait. Frequent small batches mean somewhat lower network efficiency
Balanced	32-64 KB	5-15 ms	A general-purpose setting with a good balance of compression efficiency and latency
High throughput	64-128 KB+	20-50 ms	Larger batches maximize network and compression efficiency. P99 latency grows

Producer Batching Internals and the Send Path

Records passed to send() on the application thread accumulate in the RecordAccumulator, which forms one batch per partition. A send is triggered by either of two conditions: the batch reaches batch.size, or linger.ms elapses. Whichever happens first becomes the trigger.

The Sender thread takes accumulated batches and sends them over the network in bulk. Settings such as acks, retries, and max.in.flight.requests.per.connection influence throughput and retry behavior (including ordering guarantees). Compression runs per batch, so a reasonable batch.size and linger.ms lift compression efficiency and reduce bandwidth use.

Recent Kafka versions have improved partitioning strategies to boost batch efficiency, but the essence remains: per-partition accumulation and a dual size/time trigger for sending. This mechanism itself is stable and not version dependent.

Send trigger: hitting the size cap or hitting linger.ms (whichever comes first)
Compression is per batch: larger batches tend to give a better compression ratio
Memory is pooled from buffer.memory, and consumption rises as the number of concurrent pending batches grows
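The dual trigger summarized above can be sketched as a toy model. This is deliberately not the real RecordAccumulator: the class and its methods are hypothetical, time is passed in as plain millisecond values, and the toy lets a batch overshoot its ceiling instead of opening a new one.

```java
// Toy model of the dual size/time send trigger. This is NOT the real
// RecordAccumulator; all names here are hypothetical.
final class ToyBatch {
    private final int batchSizeBytes; // stands in for batch.size
    private final long lingerMs;      // stands in for linger.ms
    private final long createdAtMs;   // when the batch was opened
    private int bytes;                // bytes appended so far

    ToyBatch(int batchSizeBytes, long lingerMs, long createdAtMs) {
        this.batchSizeBytes = batchSizeBytes;
        this.lingerMs = lingerMs;
        this.createdAtMs = createdAtMs;
    }

    void append(int recordBytes) { bytes += recordBytes; }

    /** Size trigger: the batch has reached its byte ceiling. */
    boolean isFull() { return bytes >= batchSizeBytes; }

    /** Time trigger: the batch has waited at least linger.ms. */
    boolean lingerExpired(long nowMs) { return nowMs - createdAtMs >= lingerMs; }

    /** Whichever trigger fires first makes the batch eligible to send. */
    boolean readyToSend(long nowMs) { return isFull() || lingerExpired(nowMs); }

    public static void main(String[] args) {
        ToyBatch sparse = new ToyBatch(16_384, 10, 0);
        sparse.append(500);
        System.out.println("sparse @5ms:  " + sparse.readyToSend(5));   // below both limits
        System.out.println("sparse @10ms: " + sparse.readyToSend(10));  // time trigger

        ToyBatch busy = new ToyBatch(16_384, 10, 0);
        for (int i = 0; i < 40; i++) busy.append(500);                  // 20,000 bytes
        System.out.println("busy @1ms:    " + busy.readyToSend(1));     // size trigger
    }
}
```

In this model a batch created with lingerMs set to 0 is eligible as soon as it exists, so any batching that happens comes from records that were appended before the sender picked the batch up.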

Figure: Conceptual diagram of producer batch formation and sending

Latency Budget: Designing Backward from the SLA

Work backward from your end-to-end latency SLA (for example, P99 ≤ 200 ms) to decide how much send wait time the producer can afford. linger.ms can add directly to wait latency in the worst case, so set the upper bound after subtracting the average and tail latencies for the network and broker response (acks).

For example, with a P99 SLA of 200 ms, a broker round-trip P99 (acks=all) of 80 ms, and 50 ms of application processing, the producer can tolerate roughly 70 ms of wait. However, when records arrive sparsely, sends will frequently fire only after linger.ms expires, pushing up the average latency. With a tight SLA, keep linger.ms small; when the arrival rate is high, the size trigger fires first, so the actual latency impact of a slightly larger linger.ms is limited.
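The arithmetic in this example can be written down as a small helper. This is only a sketch: the class and method names are hypothetical, and simply subtracting percentile figures is a conservative approximation, since tail latencies of independent stages do not add exactly.

```java
// Hypothetical helper for the SLA backsolve; a conservative approximation only.
final class LingerBudget {

    /** Remaining wait budget in ms, floored at zero when the SLA is already exhausted. */
    static long budgetMs(long slaMs, long appMs, long brokerRoundTripMs, long downstreamMs) {
        return Math.max(0, slaMs - (appMs + brokerRoundTripMs + downstreamMs));
    }

    public static void main(String[] args) {
        // Figures from the text: 200 ms SLA, 50 ms app processing, 80 ms broker round-trip.
        long budget = budgetMs(200, 50, 80, 0);
        System.out.println("producer wait budget (ms): " + budget); // prints 70
    }
}
```

The result is an upper bound for linger.ms, not a recommended value; leaving headroom below it absorbs retries and jitter.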

SLA backsolve: budget = target percentile latency − (app processing + broker round-trip + downstream processing)
High arrival rate: size trigger dominates → linger.ms has little impact
Low arrival rate: time trigger dominates → linger.ms pushes up average latency
delivery.timeout.ms is the overall delivery cap, and time spent lingering counts against it: keep delivery.timeout.ms ≥ linger.ms + request.timeout.ms
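To judge which trigger is likely to dominate for a given workload, one rough approach is to estimate how long a single partition takes to fill a batch and compare that with linger.ms. The sketch below assumes a steady per-partition arrival rate and ignores compression, which in practice lets more records fit into a batch; the class and method names are hypothetical.

```java
// Hypothetical estimator: assumes a steady arrival rate and no compression.
final class TriggerEstimate {

    /** Estimated milliseconds for one partition's batch to reach batch.size. */
    static double fillTimeMs(int batchSizeBytes, double recordsPerSecPerPartition, int avgRecordBytes) {
        double bytesPerMs = recordsPerSecPerPartition * avgRecordBytes / 1000.0;
        return batchSizeBytes / bytesPerMs; // Infinity when nothing arrives
    }

    /** "size" if the batch is expected to fill within linger.ms, otherwise "time". */
    static String dominantTrigger(int batchSizeBytes, long lingerMs,
                                  double recordsPerSecPerPartition, int avgRecordBytes) {
        double fill = fillTimeMs(batchSizeBytes, recordsPerSecPerPartition, avgRecordBytes);
        return fill <= lingerMs ? "size" : "time";
    }

    public static void main(String[] args) {
        int batchSize = 64 * 1024; // 64 KB
        int recordBytes = 1024;    // assumed average record size
        long lingerMs = 10;
        for (double rate : new double[] {1_000, 20_000}) {
            System.out.printf("rate=%.0f rec/s -> fill %.1f ms, trigger=%s%n",
                    rate,
                    fillTimeMs(batchSize, rate, recordBytes),
                    dominantTrigger(batchSize, lingerMs, rate, recordBytes));
        }
    }
}
```

When the estimate says "time", most records wait close to the full linger.ms, so that value should be charged against the latency budget almost in full.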

Side Effects and Safeguards: Memory, Ordering, Retries

A large batch.size increases buffer.memory consumption when many partitions have pending batches simultaneously. As the send backlog grows and the buffer fills, send() blocks for up to max.block.ms and then fails with a TimeoutException, which hurts application throughput. Raise batch.size gradually while observing the metrics, and if necessary, revisit your partition count and send parallelism design.

Retries and a high max.in.flight.requests.per.connection boost throughput, but how ordering and duplicates are handled depends on configuration. As a baseline in production, combine enable.idempotence=true with acks=all, and adjust the max.in.flight cap as needed. With idempotence enabled, the producer deduplicates retried batches and preserves per-partition ordering as long as max.in.flight.requests.per.connection is 5 or less.

Monitor buffer.memory pressure: bufferpool-wait-ratio, records-per-request-avg, batch-size-avg
Blocking that hits max.block.ms is a sign of an oversized batch.size or a slow network
Default to enable.idempotence=true and acks=all. Watch compatibility when older clients are in the mix
Gain throughput from the combined effect of compression.type and batch/linger. Avoid unbounded retries

Production Tuning Procedure (Safe Steps and What to Verify)

As a prerequisite, validate on staging or on a topic you can isolate. Look at P95/P99 as well as the average. Monitor network and broker-side load at the same time.

Steps:
1) Measure the baseline (record-send-rate, request-latency-avg, batch-size-avg, compression-rate-avg, record-retry-rate, etc.).
2) Move compression.type to lz4 or zstd.
3) Step batch.size up through 32 KB → 64 KB → 128 KB, and at the same time raise linger.ms through 0 → 5 → 10 → 20 ms, comparing as you go.
4) If P99 latency exceeds the budget, roll linger.ms back.
5) When memory pressure appears, drop batch.size one step and revisit send smoothing or the partition strategy.
6) Roll out to production gradually and watch broker load.

Measuring with the kafka-producer-perf-test CLI, or by reproducing your workload's message size distribution inside the application, makes the effect of each setting easier to read.

Revisit the compression algorithm first (often the biggest single lever)
Check that buffer.memory has headroom before raising batch.size
Fine-tune linger.ms in 5 ms steps while watching P99 latency
Adjust max.in.flight and retries based on broker stability
Keep the timeouts consistent: delivery.timeout.ms should be at least linger.ms + request.timeout.ms
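The timeout relationship in the last item can be checked before a configuration is rolled out. The following is a minimal pre-flight sketch, not part of the Kafka client: the class and method names are hypothetical, and the fallback values are assumptions that mirror the documented defaults (120000 ms, 30000 ms, and a linger.ms of 5 ms as in Kafka 4.0).

```java
import java.util.Properties;

// Hypothetical pre-flight check; not part of the Kafka client API.
final class TimeoutCheck {

    /** Reads a numeric setting whether it was stored as a String or a Number. */
    static long longProp(Properties props, String key, long fallback) {
        Object value = props.get(key);
        return value == null ? fallback : Long.parseLong(value.toString().trim());
    }

    /** True when delivery.timeout.ms >= linger.ms + request.timeout.ms. */
    static boolean timeoutsConsistent(Properties props) {
        long delivery = longProp(props, "delivery.timeout.ms", 120_000L); // assumed default
        long request = longProp(props, "request.timeout.ms", 30_000L);    // assumed default
        long linger = longProp(props, "linger.ms", 5L);                   // 5 in Kafka 4.0, 0 earlier
        return delivery >= linger + request;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("delivery.timeout.ms", 120000);
        props.put("request.timeout.ms", 30000);
        props.put("linger.ms", 10);
        System.out.println("timeouts consistent: " + timeoutsConsistent(props));
    }
}
```

Reading values through get() rather than getProperty() is a deliberate choice here, because producer settings are often stored as Integer objects and getProperty() only returns String values.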

Sample producer configuration (Java Properties)

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-1:9092,kafka-2:9092");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

// Batch and linger tuning (example: balanced profile)
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536);      // 64 KB
props.put(ProducerConfig.LINGER_MS_CONFIG, 10);          // 10 ms

// Keep timeouts and retries consistent
props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000); // overall delivery cap
props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, 30000);
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 5); // matches the default; 5 is the maximum allowed with idempotence

// Size with headroom for the number of concurrently pending batches
props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 67108864); // 64 MB

// Key and value serializers
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.ByteArraySerializer");

KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);

CCDAK Exam Key Points and Pitfalls

CCDAK frequently tests the roles of batch.size and linger.ms, the fact that batches are formed per partition, and the fact that size-based batching still occurs when linger.ms=0. batch.size is in bytes, not a record count.

Compression runs per batch, so the exam also tests the causal chain: larger batches lift the compression ratio and therefore throughput. Conversely, a classic gotcha is that a large linger.ms under low traffic degrades average latency.

Batches are per partition. When the arrival rate is high, linger.ms has a relatively small impact
Even with linger.ms=0, a full batch is sent immediately, so batching itself still happens
batch.size is the upper-bound byte count. Raising it increases potential buffer.memory consumption
Compression is per batch. Larger batches are more effective (with a CPU and latency trade-off)

Check Your Understanding

CCDAK

Question 1

Which of the following correctly describes linger.ms in the Kafka Producer?

A. The maximum time the producer waits for additional records when a batch is not full; once the batch is full it is sent without waiting
B. A setting that specifies the maximum number of records
C. Setting it to 0 fully disables batching, so each record is sent individually
D. It controls the number of network retries and only affects ordering guarantees

Correct answer: A

linger.ms is the wait-time cap for batches that are not yet full. If the size condition is satisfied first, the batch is sent immediately. batch.size is in bytes, not a record count. Even with linger.ms=0, size-based batching still occurs. Retries and ordering are controlled mainly by retries, acks, enable.idempotence, and max.in.flight.

Frequently Asked Questions

For low-traffic topics, what is a good value for linger.ms?

Try values in the 5-20 ms range and tune while watching average latency along with P95/P99. When the size trigger rarely fires, linger.ms directly drives average latency, so keep it around 5-10 ms if your SLA is tight.

Will increasing batch.size cause out-of-memory issues?

Pending batches are allocated from buffer.memory, so the more partitions and connections that are active concurrently, the more memory you consume. Raise the value gradually and monitor batch-size-avg, bufferpool-wait-ratio, and any send() calls that block for up to max.block.ms. If needed, revisit your partition count and parallelism design.

Does raising linger.ms cause more timeouts?

linger.ms adds wait time before send, but as long as delivery completes within delivery.timeout.ms (the total delivery cap), it does not immediately produce timeouts. However, when traffic is extremely sparse and retries plus broker latency stack up, you can approach the cap. Keep the configuration consistent with request.timeout.ms and retries.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
