Kafka cost optimization starts by aligning three levers: partition count, compression, and retention (time/size-based deletion and log compaction). Too many partitions inflate memory, file descriptors, and replication traffic; a mismatched compression scheme drives CPU overload and throughput degradation; and the wrong retention policy causes storage to explode.
This article organizes the high-yield points commonly tested on the exam (CCAAK) and the practical answers from real-world operations, grounded in the official documentation. To avoid version-dependent behavior, the discussion is limited to stable features (topic settings, basic producer/consumer settings, and the principles of log compaction).
Partition count caps parallelism, but each partition carries fixed costs: metadata, page cache, log segments, replication threads, and so on. Over-provisioning rapidly increases broker memory consumption, open file count, and the load on the controller and metadata propagation.
As a rough estimate, total segment count ≈ partition count × (active + rotated segments). A larger retention.bytes or a smaller segment.bytes inflates the segment count and increases file descriptor and disk seek costs. Replication network traffic scales roughly with write throughput × (replication.factor - 1).
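The segment estimate can be sketched as plain shell arithmetic. This is a minimal back-of-the-envelope sketch, not a sizing tool: the function names are hypothetical, it assumes every partition fills its retention.bytes budget, and it assumes roughly three files per segment (log, offset index, time index), ignoring transaction indexes and snapshots.

```shell
# Illustrative helpers; the names are not part of any Kafka tooling.

# segments_per_partition RETENTION_BYTES SEGMENT_BYTES
# Rotated segments = ceil(retention / segment size), plus 1 active segment.
segments_per_partition() {
  echo $(( ($1 + $2 - 1) / $2 + 1 ))
}

# open_files PARTITIONS REPLICATION_FACTOR SEGMENTS_PER_PARTITION
# Assumes ~3 files per segment, counted across all replicas in the cluster.
open_files() {
  echo $(( $1 * $2 * $3 * 3 ))
}

# Example: 10 GiB retention per partition, 24 partitions, replication factor 3
retention=$((10 * 1024 * 1024 * 1024))
segs_large=$(segments_per_partition "$retention" $((1024 * 1024 * 1024)))
segs_small=$(segments_per_partition "$retention" $((128 * 1024 * 1024)))
echo "1 GiB segments:   $segs_large per partition, ~$(open_files 24 3 "$segs_large") files cluster-wide"
echo "128 MiB segments: $segs_small per partition, ~$(open_files 24 3 "$segs_small") files cluster-wide"
```

With these inputs, shrinking segments from 1 GiB to 128 MiB raises the per-partition count from 11 to 81, which is why segment size belongs in the same conversation as partition count.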
Gradually increasing partition count (since shrinking is not possible, plan ahead)
# Increase partition count. Watch cluster balance as assignment proceeds
kafka-topics.sh --bootstrap-server <broker:9092> \
--alter --topic orders --partitions 24
# Beware of over-provisioning. After assignment, plan for the network/disk I/O of rebalancing
Kafka compression is fundamentally done on the producer side. With the topic setting compression.type=producer (default), the producer's setting is applied. Compression lowers disk and network usage but affects CPU cost and latency. Pick an algorithm by understanding its characteristics and how it interacts with message size and batching.
In general, zstd offers high compression at high CPU cost, lz4 delivers low latency and moderate compression, snappy is lightweight and stable, gzip is broadly compatible but CPU-heavy, and no compression minimizes CPU at the cost of bandwidth and disk. Compression has little effect on very small batches, so the standard practice is to combine linger.ms and batch.size to form meaningful units.
| Algorithm | Typical compression ratio | CPU cost | Throughput / latency impact |
|---|---|---|---|
| zstd | High (2-5x compression is common) | High | Highly efficient with large batches; CPU-heavy |
| lz4 | Medium (around 1.5-3x) | Medium-low | Stable low latency, good throughput |
| snappy | Medium (around 1.5-2x) | Low | Stable, light on CPU |
| gzip | Medium-high | Medium-high | Latency rises when CPU is the bottleneck |
| none | None | Minimal | Consumes bandwidth and disk |
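The point about small batches can be checked locally without a cluster. The sketch below uses the gzip command-line tool as a stand-in for producer-side batch compression; the record layout and function names are made up for illustration, and real payloads and codecs will give different numbers.

```shell
# Illustration only: gzip CLI as a stand-in for producer batch compression.

# gen_records N: emit N hypothetical JSON-like order records, one per line.
gen_records() {
  awk -v n="$1" 'BEGIN {
    for (i = 1; i <= n; i++)
      printf "{\"order_id\":%d,\"status\":\"CREATED\",\"currency\":\"USD\",\"amount\":%d}\n", i, (i * 7) % 5000
  }'
}

# raw_bytes N: uncompressed size in bytes of a "batch" of N records.
raw_bytes() {
  gen_records "$1" | wc -c | tr -d ' '
}

# compressed_bytes N: gzip-compressed size in bytes of the same batch.
compressed_bytes() {
  gen_records "$1" | gzip -c | wc -c | tr -d ' '
}

for n in 1 10 100 1000; do
  echo "batch=$n raw=$(raw_bytes "$n") compressed=$(compressed_bytes "$n")"
done
```

The compressed size per record should fall sharply as the batch grows, because the compressor can only exploit repetition it sees within a single batch. That is the reason linger.ms and batch.size are tuned together with compression.type.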
Example producer compression and batching settings
Properties props = new Properties(); // java.util.Properties
props.put("compression.type", "zstd");
props.put("linger.ms", "15"); // Adjust within latency tolerance
props.put("batch.size", "131072"); // 128 KiB target (A/B test for your workload)
props.put("acks", "all"); // Strongest durability: wait for all in-sync replicas
Retention design combines delete (drop entire segments by time/size) and compact (keep the latest record per key). For keyed data where the latest version must always remain (e.g. current entity state), compact is the choice; for event history, delete is the default. The combined compact,delete is useful when you want to retain history for a fixed window while still compacting within it, but delete also removes segments older than the retention window even when they hold the only remaining record for a key. If retention is too short, keys that have not been updated recently disappear and a latest-version assumption breaks.
retention.bytes and retention.ms can each be set alone or together; retention.bytes applies per partition, and when both are set, segments are deleted as soon as either limit is exceeded. When you must strictly stay within a storage budget, prioritize bytes; when compliance dictates a retention window, prioritize ms. segment.bytes affects compactor efficiency, file count, and page cache efficiency, so set it by balancing I/O against memory.
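When both limits are set, it helps to know which one actually governs a partition. The following is a rough sketch under simplifying assumptions: a steady per-partition write rate (post-compression), deletion treated as byte-exact rather than segment-granular, and hypothetical function names.

```shell
# Illustrative helpers; the names are not part of any Kafka tooling.

# binding_limit WRITE_BYTES_PER_SEC RETENTION_MS RETENTION_BYTES
# Prints "bytes" if the size cap is reached before the time window elapses,
# otherwise "time". A negative RETENTION_BYTES means no size limit.
binding_limit() {
  by_time=$(( $1 * $2 / 1000 ))   # bytes accumulated over the full time window
  if [ "$3" -ge 0 ] && [ "$by_time" -gt "$3" ]; then
    echo "bytes"
  else
    echo "time"
  fi
}

# effective_window_seconds WRITE_BYTES_PER_SEC RETENTION_BYTES
# How long data survives when the size cap is the binding limit.
effective_window_seconds() {
  echo $(( $2 / $1 ))
}

# Example: 1 MiB/s per partition, retention.ms = 7 days, retention.bytes = 10 GiB
echo "Binding limit: $(binding_limit 1048576 604800000 10737418240)"
echo "Effective window: $(effective_window_seconds 1048576 10737418240) s"
```

In this example the size cap wins and data survives for about 10240 seconds (under three hours), far short of the seven-day window. If a compliance requirement dictates the window, the size cap has to be sized from the write rate rather than chosen independently.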
delete vs compact (conceptual diagram)
Safe combinations of topic retention/compaction
# Latest-version guarantee (no history needed)
kafka-configs.sh --bootstrap-server <broker> --alter \
--topic entity-state \
--add-config cleanup.policy=compact,segment.bytes=134217728,min.cleanable.dirty.ratio=0.5
# Keep recent history too (compact + delete, with care)
kafka-configs.sh --bootstrap-server <broker> --alter \
--topic entity-state-history \
--add-config 'cleanup.policy=[compact,delete],retention.ms=1209600000,segment.ms=604800000'
# Strict storage cap (bytes-first; retention.bytes applies per partition)
kafka-configs.sh --bootstrap-server <broker> --alter \
--topic metrics \
--add-config retention.bytes=10737418240
Storage estimate: with write rate B (bytes/sec), post-compression ratio r (0<r<=1), retention period T (seconds), partition count P, and replicas R, total required disk ≈ B × r × T × R. When using bytes limits, retention.bytes × P × R is close to the upper bound for cluster consumption (overhead aside).
Network replication traffic (broker ingress): B × r × (R-1). Client egress depends on consumer count and filtering. At the estimation stage, plan with both peak and average values so spikes can be absorbed.
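The bytes-based cap and the replication formula can be evaluated the same way. This is a sketch with hypothetical helper names; the 60 MiB/s peak is an invented figure used only to show why peak and average should be planned separately.

```shell
# Illustrative helpers; the names are not part of any Kafka tooling.

# bytes_cap_gib RETENTION_BYTES PARTITIONS REPLICAS
# Approximate upper bound on retained log data: retention.bytes x P x R.
bytes_cap_gib() {
  echo $(( $1 * $2 * $3 / 1024 / 1024 / 1024 ))
}

# replication_mib_per_sec WRITE_MIB_PER_SEC COMPRESSION_RATIO REPLICAS
# Inter-broker replication traffic: B x r x (R - 1).
replication_mib_per_sec() {
  awk -v b="$1" -v r="$2" -v R="$3" 'BEGIN { printf "%.1f\n", b * r * (R - 1) }'
}

# Example: retention.bytes = 10 GiB, P = 24, R = 3, r = 0.4
echo "Disk cap:           $(bytes_cap_gib 10737418240 24 3) GiB"
echo "Replication (avg):  $(replication_mib_per_sec 20 0.4 3) MiB/s"
echo "Replication (peak): $(replication_mib_per_sec 60 0.4 3) MiB/s"
```

The cap works out to 720 GiB regardless of write rate, while replication traffic triples between the average and the assumed peak, so network headroom is the number that needs the spike margin.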
Quick calculation note (shell)
# B=20 MiB/s, post-compression r=0.4, R=3, T=7 days
B=$((20*1024*1024))
r=0.4
R=3
T=$((7*24*3600))
echo "Disk ~= $(awk -v b="$B" -v r="$r" -v t="$T" -v R="$R" 'BEGIN{printf "%.1f GiB\n", b*r*t*R/1024/1024/1024}')"
# Prints: Disk ~= 14175.0 GiB
On the producer side, batching via linger.ms and batch.size, combined with compression and an appropriate acks setting, cuts down on small sends and wasteful retries. The smaller the records, the larger the gain from batching, directly cutting network/disk overhead.
On the consumer side, fetch.min.bytes and fetch.max.wait.ms aggregate fetches, and max.partition.fetch.bytes and session.timeout.ms are tuned to the workload. Values that are too small increase RPC count and context switches, driving up CPU and network cost.
Example consumer fetch optimization
Properties props = new Properties(); // java.util.Properties
props.put("fetch.min.bytes", "1048576"); // Wait for at least 1 MiB per fetch
props.put("fetch.max.wait.ms", "50"); // ...but no longer than 50 ms
props.put("max.partition.fetch.bytes", "5242880"); // 5 MiB cap per partition per fetch
props.put("enable.auto.commit", "false"); // Commit manually after processing, to pair with idempotent handling
For monitoring, continuously visualize disk usage (trended per topic/partition), replication lag (ISR, UnderReplicatedPartitions, follower lag), network egress, compaction metrics (log cleaner progress/JMX), and request latency. Some retention and compaction settings take effect immediately, so always track behavior after a dynamic change.
For automation, suppress noisy neighbors with quotas (bandwidth limits per user or client ID for producers and consumers), and pair storage thresholds with alerts and write controls. CCAAK tests your understanding of dynamic settings (kafka-configs.sh), safe rolling application, and configuration precedence (topic-level overrides take priority over broker defaults, and dynamically set broker values take priority over static server.properties entries).
Example dynamic settings for quotas/cleaner threads
# Throttle produce bandwidth per client-id (overload prevention)
kafka-configs.sh --bootstrap-server <broker> --alter \
--add-config 'producer_byte_rate=1048576' --entity-type clients --entity-name appA
# Log cleaner threads (broker restart may be required; check whether dynamic change is supported in your environment)
# server.properties: log.cleaner.threads=2
CCAAK
Question 1
For an audit-driven requirement, you must always retain the latest version per key while cutting older values as much as possible to reduce storage cost. Which topic configuration is most appropriate?
Correct answer: A
Guaranteeing latest-version retention requires log compaction. With delete alone, the latest version is lost once it exceeds retention. Combining compact and delete with a short retention also risks losing the latest version, because delete removes segments older than the retention window even when they hold a key's most recent record. Changing only the compression scheme cannot satisfy the retention requirement.
Can compression be set on the broker as well? Where is the right place to decide it?
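Yes. The broker-level and topic-level compression.type can name a specific codec, in which case the broker re-compresses any batch that arrives in a different one. In most cases, leave it at "producer" (the default) so that the producer decides. Producers do the actual compression, and keeping the same algorithm end-to-end is the most efficient path. Avoid re-compression at the broker since it adds CPU and latency cost.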
If I combine compact and delete, will the latest version always be retained?
Not always. If you need guaranteed latest-version retention, compact alone is safest. With compact,delete combined, delete removes any segment older than the retention window, including one that holds the only remaining record for a key, so a short retention breaks the latest-version guarantee for keys that are rarely updated.
What are the minimum safety measures when increasing partition count?
Scale up gradually and monitor broker disk usage, file descriptors, replication lag, and GC. Execute during a maintenance window to absorb the network/disk load of rebalancing, and tighten throttling (quotas) temporarily if needed. Since you cannot shrink partition count, do not over-provision in advance.
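One of these checks, replication health, can be scripted as a gate in front of the alter command. The sketch below is only one possible guard, not a complete procedure: KAFKA_TOPICS_CMD and both function names are hypothetical, and it relies on the --under-replicated-partitions filter of kafka-topics.sh --describe printing nothing when every partition is fully replicated.

```shell
# Sketch only. KAFKA_TOPICS_CMD is a hypothetical override so the functions can
# point at your installation's kafka-topics.sh.
KAFKA_TOPICS_CMD=${KAFKA_TOPICS_CMD:-kafka-topics.sh}

# cluster_fully_replicated BOOTSTRAP
# Succeeds only if the describe call works and lists no under-replicated partitions.
cluster_fully_replicated() {
  out=$("$KAFKA_TOPICS_CMD" --bootstrap-server "$1" --describe --under-replicated-partitions) || return 1
  [ -z "$out" ]
}

# expand_partitions BOOTSTRAP TOPIC NEW_COUNT
# Refuses to alter the topic while replication is degraded or cannot be checked.
expand_partitions() {
  if ! cluster_fully_replicated "$1"; then
    echo "replication not healthy (or check failed); not expanding $2" >&2
    return 1
  fi
  "$KAFKA_TOPICS_CMD" --bootstrap-server "$1" --alter --topic "$2" --partitions "$3"
}

# Usage (not executed here):
#   expand_partitions broker:9092 orders 24
```

Treating a failed health check the same as an unhealthy cluster is deliberate: since partition count cannot be reduced afterwards, the guard errs on the side of not making the change.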
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.