Kafka compression is applied per record batch on the Producer, and the Broker stores and replicates those batches as-is. The exception: if the topic-level compression.type names a concrete algorithm that differs from the codec the Producer used, the Broker re-compresses the batch.
This article walks through the characteristics of each algorithm, configuration patterns, and monitoring/validation methods, and wraps up with the points that come up most often on CCDAK/CCAAK.
In Kafka, the Producer batches records and compresses each batch according to compression.type. The Broker writes those compressed batches to the log almost verbatim, and replication carries the same compressed bytes. On the Consumer side, the client library decompresses on fetch.
When the topic's compression.type is set to producer, batches are stored using whatever algorithm the Producer chose. When it is set to a concrete value like gzip/snappy/lz4/zstd, the Broker re-compresses any batch that arrives in a different codec, raising CPU load and latency. Compression operates on record batches, so larger batches generally improve compression efficiency.
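The effect of batch size on compression efficiency can be illustrated without a cluster. The sketch below is only an approximation: it uses the JDK's gzip as a stand-in for a Kafka codec, and the JSON record shape and the class name BatchCompressionRatio are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Illustration only: JDK gzip stands in for a Kafka codec, and the
// record layout is hypothetical.
class BatchCompressionRatio {

    // Concatenate `count` similar JSON records, like one producer batch.
    static byte[] buildBatch(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("{\"orderId\":").append(1000 + i)
              .append(",\"status\":\"CREATED\",\"currency\":\"USD\"}");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compress the whole batch in a single pass.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Compressed size divided by raw size; lower means better compression.
    static double ratio(int recordsPerBatch) {
        byte[] raw = buildBatch(recordsPerBatch);
        return (double) gzip(raw).length / raw.length;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100, 1000}) {
            System.out.printf("records=%d ratio=%.2f%n", n, ratio(n));
        }
    }
}
```

With a single record the fixed gzip header and trailer dominate, so the ratio is poor; as more similar records share one compression context, the ratio should drop. Real numbers depend on the codec and on your data.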
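Size limits are checked at different stages. The Broker/topic limit, message.max.bytes (topic-level: max.message.bytes), is evaluated against the record batch size after compression. The Producer's max.request.size caps the size of a request and, per the Kafka documentation, effectively caps the uncompressed record batch size, so the Producer can reject a large record even if it would compress well.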
| Algorithm | Typical Ratio | Speed (Compress / Decompress) | Latency Profile |
|---|---|---|---|
| gzip | High | Low / Medium | High (slow) |
| snappy | Medium | High / High | Low (fast) |
| lz4 | Medium-High | Very High / High | Low (very fast) |
| zstd | High-Very High | Medium-High / Medium-High | Medium (depends on level and data) |
Producer compression and Broker storage flow (branching by topic config)
Producer
  |
  | 1) Batch + compress (compression.type)
  v
[Compressed record batch]
  |
  v
Send to Broker
  |
  |--> If topic compression.type = producer
  |      |
  |      v
  |    Store in log as-is
  |
  |--> If topic compression.type = gzip/lz4/...
         |
         v
       Broker re-compresses (if the batch codec differs)
         |
         v
       Store in log
  |
  v
Replication (still compressed)
  |
  v
Consumer decompresses
Main compression types (Producer compression.type)
none
gzip
snappy
lz4
zstd
Text and repetitive JSON/Avro compress well. Binary data or already-compressed data (images, video, ZIP) sees little benefit, or can even grow. With small batches, every algorithm produces only limited gains.
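The difference between compressible and incompressible payloads is easy to check locally. This is a rough sketch under stated assumptions: JDK gzip again stands in for a Kafka codec, seeded random bytes approximate already-compressed data such as images or ZIP files, and the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Illustration only: compares a repetitive text payload with a
// random payload that behaves like already-compressed data.
class PayloadCompressibility {

    // Size in bytes after gzip-compressing the input in one pass.
    static int gzipSize(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    // Repetitive JSON, the kind of payload that compresses well.
    static byte[] repetitiveJson(int records) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < records; i++) {
            sb.append("{\"sensor\":\"s-").append(i % 10)
              .append("\",\"unit\":\"celsius\",\"ok\":true}");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Seeded random bytes as a proxy for already-compressed content.
    static byte[] randomBytes(int length, long seed) {
        byte[] bytes = new byte[length];
        new Random(seed).nextBytes(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] json = repetitiveJson(200);
        byte[] random = randomBytes(8192, 42L);
        System.out.printf("json:   %d -> %d bytes%n", json.length, gzipSize(json));
        System.out.printf("random: %d -> %d bytes%n", random.length, gzipSize(random));
    }
}
```

The random payload should come out slightly larger than its input because the codec adds framing but finds nothing to remove, which is the "can even grow" case described above.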
For low latency and high throughput, lz4 or snappy is the safe bet. If network bandwidth is the bottleneck and you have CPU headroom, zstd wins. gzip compresses hard but is slow, so save it for latency-tolerant or overnight-batch workloads.
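One way to make this guidance explicit in code is a small helper that maps the dominant bottleneck to Producer settings. This is a sketch, not a recommendation: the Bottleneck enum and the producerPropsFor helper are hypothetical, and the linger.ms value for the gzip profile is an assumption chosen only to illustrate a latency-tolerant setup.

```java
import java.util.Properties;

// Hypothetical helper: encodes the selection guidance as starting-point
// producer settings. Validate every value against your own workload.
class CompressionProfiles {

    enum Bottleneck { LATENCY, BANDWIDTH, OFFLINE_BATCH }

    static Properties producerPropsFor(Bottleneck bottleneck) {
        Properties p = new Properties();
        p.setProperty("batch.size", "131072"); // ~128KB target
        switch (bottleneck) {
            case LATENCY:
                // Fast codec, short wait for batching.
                p.setProperty("compression.type", "lz4");
                p.setProperty("linger.ms", "10");
                break;
            case BANDWIDTH:
                // Higher ratio when CPU headroom exists.
                p.setProperty("compression.type", "zstd");
                p.setProperty("linger.ms", "20");
                break;
            case OFFLINE_BATCH:
                // Latency-tolerant; linger value here is an assumption.
                p.setProperty("compression.type", "gzip");
                p.setProperty("linger.ms", "50");
                break;
        }
        return p;
    }

    public static void main(String[] args) {
        for (Bottleneck b : Bottleneck.values()) {
            System.out.println(b + " -> " + producerPropsFor(b));
        }
    }
}
```

The returned Properties still need bootstrap.servers and serializers before they can be passed to a KafkaProducer.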
In mixed environments, either keep topic compression.type=producer and let each Producer optimize for itself, or, if you want a uniform algorithm, align every Producer to the same setting. Pinning it at the topic level makes the Broker re-compress every batch that arrives in a different codec.
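The interaction between the Producer codec and the topic setting can be written down as a small decision function. This is a simplified model under stated assumptions: it ignores message-format conversion and any other reason the Broker might rewrite a batch, and the class and method names are hypothetical. The topic-level value uncompressed is treated as equivalent to the Producer's none.

```java
// Simplified model of which codec ends up in the log and whether the
// Broker has to re-compress. Not a reproduction of Broker internals.
class RecompressionModel {

    // Codec of the stored batch for a given Producer codec and topic setting.
    static String storedCodec(String producerCodec, String topicCompressionType) {
        if ("producer".equals(topicCompressionType)) {
            return producerCodec; // retain whatever the Producer chose
        }
        if ("uncompressed".equals(topicCompressionType)) {
            return "none"; // topic-level spelling of "no compression"
        }
        return topicCompressionType; // gzip, snappy, lz4, or zstd
    }

    // The Broker only re-compresses when the stored codec differs.
    static boolean brokerRecompresses(String producerCodec, String topicCompressionType) {
        return !storedCodec(producerCodec, topicCompressionType).equals(producerCodec);
    }

    public static void main(String[] args) {
        System.out.println(brokerRecompresses("lz4", "producer"));
        System.out.println(brokerRecompresses("lz4", "lz4"));
        System.out.println(brokerRecompresses("gzip", "lz4"));
    }
}
```

In this model, aligning every Producer with the pinned topic codec avoids re-compression just as compression.type=producer does; only mismatched Producers pay the Broker-side cost.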
Basic Java Producer configuration example
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

Properties p = new Properties();
p.put("bootstrap.servers", "broker1:9092,broker2:9092");
p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
// Choose the compression algorithm
p.put("compression.type", "lz4");
// Batch / latency trade-off
p.put("batch.size", "131072"); // ~128KB target
p.put("linger.ms", "10"); // wait up to 10ms to fill a batch
KafkaProducer<String, byte[]> producer = new KafkaProducer<>(p);
As a rule of thumb, decide the algorithm at the Producer and leave the topic's compression.type as producer. That avoids Broker re-compression and saves both CPU and latency.
Only when uniformity across Producers is impractical — and you need it for audit or bandwidth-control reasons — pin a concrete algorithm (gzip/snappy/lz4/zstd) on the topic. Expect higher Broker CPU and latency, and plan capacity and slow-client impact accordingly.
Even with TLS or SASL in play, compression is applied before encryption, so the bandwidth savings remain intact.
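Why the order matters can be shown with the JDK alone. The sketch below is an illustration, not how Kafka or TLS is implemented: JDK gzip stands in for the Kafka codec, AES-GCM with a fixed all-zero key and nonce stands in for transport encryption (never do that outside a demo), and all names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Illustration only: compares compress-then-encrypt with
// encrypt-then-compress on the same repetitive payload.
class CompressThenEncrypt {

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Demo-only cipher setup: fixed zero key and nonce keep the run
    // deterministic but are insecure in any real system.
    static byte[] encrypt(byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE,
                    new SecretKeySpec(new byte[16], "AES"),
                    new GCMParameterSpec(128, new byte[12]));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Repetitive text payload standing in for a record batch.
    static byte[] sampleBatch() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("{\"event\":\"page_view\",\"page\":\"/home\",\"ok\":true}");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] batch = sampleBatch();
        System.out.println("raw bytes:             " + batch.length);
        System.out.println("compress then encrypt: " + encrypt(gzip(batch)).length);
        System.out.println("encrypt then compress: " + gzip(encrypt(batch)).length);
    }
}
```

Ciphertext looks random to a compressor, so compressing after encryption saves nothing. Because Kafka compresses batches in the client before the bytes reach the encrypted transport, the savings survive.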
Concrete topic config examples (Broker CLI)
# Switch an existing topic to Producer-first (avoid re-compression)
kafka-configs --bootstrap-server localhost:9092 \
--entity-type topics --entity-name orders \
--alter --add-config compression.type=producer
# Standardize on lz4 at the topic (Broker re-compresses batches sent in any other codec)
kafka-configs --bootstrap-server localhost:9092 \
--entity-type topics --entity-name orders \
--alter --add-config compression.type=lz4
# Create a new topic with compression.type=producer
kafka-topics --bootstrap-server localhost:9092 \
--create --topic events --partitions 12 --replication-factor 3 \
--config compression.type=producer
Compression costs CPU and latency. When you roll it out, use load-test tools and metrics to watch CPU usage, BytesInPerSec/BytesOutPerSec, and request latency (RequestQueueTimeMs/LocalTimeMs/RemoteTimeMs). Where re-compression is in play, expect Broker CPU to rise noticeably.
On multi-tenant clusters, pair heavy compression (gzip/zstd) with throughput control, quotas, and batch tuning so peaks from one Producer don't bleed into other workloads.
Consumer-side decompression also burns CPU, so monitor consumer-side latency, GC, and throughput regressions alongside the Broker.
Performance validation example (kafka-producer-perf-test)
# Send with lz4 (1KB messages, 1M records)
kafka-producer-perf-test --topic perf --num-records 1000000 \
--record-size 1024 --throughput -1 --producer-props \
bootstrap.servers=localhost:9092 compression.type=lz4 batch.size=131072 linger.ms=10
# Send with zstd (bandwidth-first)
kafka-producer-perf-test --topic perf --num-records 1000000 \
--record-size 1024 --throughput -1 --producer-props \
bootstrap.servers=localhost:9092 compression.type=zstd batch.size=131072 linger.ms=20
The core of real-world selection is being clear about which dimension is your bottleneck: latency SLA, CPU, or bandwidth. Use the quick guide below as a starting point and then validate against a representative internal workload.
On the exams, the recurring themes are: where compression is applied (Producer-centric), using topic compression.type=producer to avoid re-compression, the fact that replication carries the compressed bytes, and how batching affects compression efficiency. When to pick zstd vs lz4 also comes up.
Compression algorithm cheat sheet
snappy: fast / medium ratio / low CPU -> low latency
lz4: even faster / medium-high ratio / low-medium CPU -> default pick
zstd: high ratio / medium-high CPU -> bandwidth-first
gzip: high ratio / high CPU / slow -> batch / latency-tolerant
Topic: compression.type=producer (avoid re-compression)
Producer: compression.type=<one of the above>
Batching: batch.size up + linger.ms up -> ratio up (latency up)
CCDAK / CCAAK
Question 1
An administrator wants to reduce network bandwidth while avoiding a Broker CPU increase. Multiple Producers write to the same topic, and their settings can be standardized. Which configuration is most appropriate?
Correct answer: A
Standardizing on lz4 in every Producer while keeping the topic at compression.type=producer cuts bandwidth without triggering Broker re-compression. Pinning the algorithm at the topic forces re-compression whenever a Producer ships a different format, raising CPU and latency. A Broker-wide setting cannot forcibly transcode incoming Producer traffic. Disabling compression contradicts the bandwidth-reduction goal.
Where does compression happen, and where is data decompressed?
The Producer compresses record batches. The Broker normally stores and replicates them as-is, and the Consumer decompresses on receive. If the topic config specifies a concrete compression type that differs from the batch's codec, the Broker re-compresses the batch.
How do batch settings affect compression efficiency?
Compression is applied per batch, so increasing batch.size or linger.ms to grow batch size generally improves the compression ratio. The trade-off is added latency, so tune these against your latency SLA.
Do TLS or SASL cancel out compression benefits?
No. Compression is applied before encryption, so the bandwidth savings are preserved. The compressed bytes are then encrypted on the wire.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.