Kafka retention settings affect more than availability and reprocessing: they also drive disk and cloud costs directly. The CCAAK (Confluent Certified Administrator for Apache Kafka) exam frequently tests the correct combination of retention.ms, retention.bytes, and cleanup.policy, along with an understanding of how segment deletion actually behaves.
This article follows the official documentation faithfully, summarizing design guidance for time-based, size-based, and compaction strategies, along with how to estimate operating costs. It balances exam-prep essentials with concrete advice you can apply on the job without hesitation.
Kafka evaluates retention at the partition's segment-file level. Active segments are never candidates for deletion. Segments roll when they reach log.segment.bytes (size) or log.roll.ms/hours (time), and they only become eligible for retention checks once they go inactive.
With cleanup.policy=delete, retention fires under two conditions: time-based (retention.ms) removes "old" segments, and size-based (retention.bytes) trims the oldest first whenever total size exceeds the cap. Either condition triggers deletion. The check cadence is controlled by log.retention.check.interval.ms.
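The two delete conditions can be illustrated with a toy model in bash. This is a sketch, not broker code: the function name `segments_to_delete` and the `age_ms:size_bytes` encoding are hypothetical. It assumes segments are listed oldest first, treats the last entry as the active segment, and assumes the size rule removes an old segment only while the remaining log would still hold at least retention.bytes.

```shell
#!/usr/bin/env bash
# Toy model of cleanup.policy=delete for a single partition (illustrative only).
# Segments are passed oldest first as "age_ms:size_bytes"; the last one is
# treated as the active segment and is never deleted.

# Usage: segments_to_delete RETENTION_MS RETENTION_BYTES SEGMENT...
# Pass -1 to disable either check.
# Prints "<index> <reason>" for each segment that would be deleted.
segments_to_delete() {
  local retention_ms=$1 retention_bytes=$2
  shift 2
  local -a segs=("$@")
  local last=$(( ${#segs[@]} - 1 ))   # index of the active segment
  local total=0 i s age size

  # Total partition size, active segment included.
  for s in "${segs[@]}"; do
    total=$(( total + ${s#*:} ))
  done

  for (( i = 0; i < last; i++ )); do
    age=${segs[i]%%:*}
    size=${segs[i]#*:}
    if (( retention_ms >= 0 && age > retention_ms )); then
      echo "$i time"                  # older than retention.ms
    elif (( retention_bytes >= 0 && total - size >= retention_bytes )); then
      echo "$i size"                  # log still over the cap without it
    else
      break                           # newer segments are not eligible either
    fi
    total=$(( total - size ))
  done
}
```

For example, `segments_to_delete 1000 250 5000:100 2000:100 500:100 100:100 10:50` reports segments 0 and 1 as deleted by the time rule and leaves the rest alone, because removing segment 2 would drop the log below the 250-byte cap.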
Partitions, segments, and retention evaluation at a glance
Baseline configuration aligned with the docs (server defaults plus topic overrides)
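# Broker side (example server.properties). Comments sit on their own lines
# because .properties files do not support trailing comments.
log.retention.check.interval.ms=300000
# 1 GiB
log.segment.bytes=1073741824
# 7 days (default when a topic sets no override)
log.retention.ms=604800000
# Override at topic creation (time-based retention)
$ kafka-topics.sh --bootstrap-server <broker:9092> \
--create --topic orders \
--partitions 6 --replication-factor 3 \
--config retention.ms=259200000 # 3 days
# Switch to size-based retention later (setting only one of the two is fine)
$ kafka-configs.sh --bootstrap-server <broker:9092> --alter --entity-type topics --entity-name orders \
--add-config retention.bytes=21474836480 # 20 GiB/partition
# Drop the topic-level time override; the topic then falls back to the broker's log.retention.ms.
# To disable time-based retention entirely, set retention.ms=-1 instead.
$ kafka-configs.sh --bootstrap-server <broker:9092> --alter --entity-type topics --entity-name orders \
--delete-config retention.ms

retention.ms expresses reprocessing and catch-up headroom directly. It fits well when consumer-lag tolerance, recovery time, and downstream batch re-run windows are clearly defined. Once an inactive segment's newest record timestamp (or, when record timestamps are unavailable, the file's last-modified time) is older than the threshold, the segment is removed.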
The catch: with light traffic, segments may not roll and can remain active for a long time. In that case, set log.roll.ms (or the topic-level segment.ms) to force a time-based roll so deletion proceeds as expected.
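A rough upper bound on how long a record can survive under time-based retention helps when choosing a roll interval. The sketch below is an approximation under stated assumptions: the topic still receives occasional writes so the time-based roll actually triggers, and the helper names `max_record_lifetime_ms` and `ms_to_hours` are hypothetical. The idea is that the oldest record in a segment waits up to one roll interval for the segment to close, then the full retention period, then up to one check interval for the deletion pass.

```shell
#!/usr/bin/env bash
# Approximate worst-case lifetime of a record under time-based retention.
# Assumption: lifetime <= retention.ms + roll interval + check interval.

# Usage: max_record_lifetime_ms RETENTION_MS ROLL_MS CHECK_INTERVAL_MS
max_record_lifetime_ms() {
  local retention_ms=$1 roll_ms=$2 check_ms=$3
  echo $(( retention_ms + roll_ms + check_ms ))
}

# Usage: ms_to_hours MILLISECONDS   (integer division, rounds down)
ms_to_hours() {
  echo $(( $1 / 3600000 ))
}
```

With retention.ms at 14 days, a 6-hour roll, and a 5-minute check interval, `ms_to_hours "$(max_record_lifetime_ms 1209600000 21600000 300000)"` gives 342 hours, about 6 hours more than the nominal 336. Without a forced roll the extra term is unbounded on a quiet topic, which is the situation described above.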
Time-based retention with forced segment rolling
$ kafka-configs.sh --bootstrap-server <broker:9092> --alter --entity-type topics --entity-name audit_log \
--add-config retention.ms=1209600000,min.insync.replicas=2 # retention.ms = 14 days
# Example broker-side forced roll so a segment does not stay active indefinitely
# (server.properties; takes effect after a broker restart). Rolls every 6 hours.
log.roll.ms=21600000

retention.bytes sets a per-partition cap on total log size; when exceeded, the oldest inactive segments are deleted first. Even with fluctuating traffic, disk usage stays bounded by the cap, which suits capacity planning and cost control.
However, a traffic spike can briefly accelerate new-data ingress and cause old data to disappear sooner than expected. If you must guarantee a minimum reprocessing window, set retention.bytes with ample buffer (rather than combining it with retention.ms), or backstop the cap with monitoring and alerts.
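The effect of a spike on a size cap is easy to check with integer arithmetic. The following is a minimal sketch, assuming a constant per-partition write rate in bytes per second after compression; the function names `window_hours` and `required_retention_bytes` are hypothetical helpers, not Kafka tooling.

```shell
#!/usr/bin/env bash
# How much history a per-partition size cap holds at a given write rate,
# and how large the cap must be to cover a target window at peak rate.

# Usage: window_hours RETENTION_BYTES BYTES_PER_SEC   (rounds down)
window_hours() {
  echo $(( $1 / $2 / 3600 ))
}

# Usage: required_retention_bytes WINDOW_HOURS PEAK_BYTES_PER_SEC HEADROOM_PCT
required_retention_bytes() {
  local hours=$1 rate=$2 headroom_pct=$3
  echo $(( hours * 3600 * rate * (100 + headroom_pct) / 100 ))
}
```

A 30 GiB cap holds roughly 8 hours at 1 MiB/s per partition (`window_hours 32212254720 1048576`) but only about 2 hours if a spike pushes the rate to 4 MiB/s. Going the other way, `required_retention_bytes 48 1048576 30` suggests about 235.5 GB per partition to cover 48 hours at 1 MiB/s with 30% headroom.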
Setting size-based retention and verifying it
$ kafka-configs.sh --bootstrap-server <broker:9092> --alter --entity-type topics --entity-name events \
--add-config retention.bytes=32212254720 # 30 GiB/partition
# Check the current settings
$ kafka-configs.sh --bootstrap-server <broker:9092> --describe --entity-type topics --entity-name events | grep retention
retention.bytes=32212254720 sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:32212254720}

cleanup.policy=delete drops whole segments once they fall out of retention. cleanup.policy=compact, by contrast, collapses duplicate keys and keeps only the latest value per key. Compact suits state-sync and CDC logs; if you also need to expire old data by time or size, consider combining the two as compact,delete.
Compaction tombstones (null values) are kept until delete.retention.ms elapses. If that window is too short, lagging consumers may miss delete events. When combining compact and delete, take care that retention.ms and segment rolling don't conspire to drop tombstones early.
| Retention type | Main use case | Delete/compact trigger | Key settings |
|---|---|---|---|
| delete (time) | Guarantee a reprocessing window | Old segments are deleted once retention.ms elapses | retention.ms, log.roll.ms |
| delete (size) | Fix a hard cost cap | When total size exceeds retention.bytes, oldest segments are deleted first | retention.bytes, log.segment.bytes |
| compact | Keep latest state (CDC / state broadcast) | The log cleaner compacts duplicate keys; tombstones are removed after delete.retention.ms | cleanup.policy=compact, delete.retention.ms, min.cleanable.dirty.ratio |
| compact,delete | Latest state plus bounded history | Compaction plus time/size retention | cleanup.policy=compact,delete + retention.* |
A minimal safe configuration for combining compact,delete
$ kafka-configs.sh --bootstrap-server <broker:9092> --alter --entity-type topics --entity-name users_kv \
--add-config 'cleanup.policy=[compact,delete],delete.retention.ms=604800000,min.cleanable.dirty.ratio=0.2,retention.ms=1209600000'
# Square brackets group the comma-separated policy list; retention.ms = 14 days (upper bound on history)

On-prem deployments factor in disk cost; cloud deployments factor in block storage plus object storage charges (when tiered storage is enabled). The rough estimate depends on average throughput, retention duration (or size cap), replication factor, and compression ratio.
Size-based retention turns the cap into your cost ceiling. For time-based retention, model duration as a variable and always include peak traffic plus headroom (for example, 30%). Active segments can't be deleted and may push usage above the line temporarily, so design monitoring and capacity-alert thresholds in tandem.
A quick back-of-the-envelope script (bash, bc)
# Inputs: 50 MB/s, post-compression factor 0.5, 3-day retention, RF=3, 30% headroom
IN_MBPS=50
COMP=0.5
RET_DAYS=3
RF=3
HEAD=0.3
BYTES_PER_SEC=$(echo "$IN_MBPS*1024*1024" | bc)
EFFECTIVE=$(echo "$BYTES_PER_SEC*$COMP" | bc)
# Named RET_SECONDS because SECONDS is a special timer variable in bash
RET_SECONDS=$(echo "$RET_DAYS*24*3600" | bc)
RAW=$(echo "$EFFECTIVE*$RET_SECONDS*$RF" | bc)
TOTAL=$(echo "$RAW*(1+$HEAD)" | bc)
echo "Required bytes: $TOTAL"

Deletion isn't instantaneous. It depends on the granularity of log.retention.check.interval.ms and on when segments actually go inactive. Tweaking config right before disk pressure peaks may not take effect in time. Fire capacity alerts with plenty of headroom and force-roll old segments to drive deletion forward.
For monitoring, track per-partition LogSize, broker disk usage, UnderReplicatedPartitions, and consumer lag together to catch mismatches between retention settings and SLAs early. When using compaction, continuously observe cleaner throughput and lag.
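As one way to wire up a capacity alert with headroom, the sketch below classifies disk usage against two thresholds. The 70% and 85% defaults are illustrative assumptions rather than recommended values, and the function names `disk_alert_level` and `used_pct` are hypothetical; a production setup would more likely feed the same thresholds into an existing monitoring system.

```shell
#!/usr/bin/env bash
# Classify log-directory disk usage against warning and critical thresholds.

# Usage: disk_alert_level USED_PCT [WARN_PCT] [CRIT_PCT]
# Prints OK, WARN, or CRIT. Threshold defaults are illustrative assumptions.
disk_alert_level() {
  local used=$1 warn=${2:-70} crit=${3:-85}
  if (( used >= crit )); then
    echo CRIT
  elif (( used >= warn )); then
    echo WARN
  else
    echo OK
  fi
}

# Usage: used_pct MOUNT_POINT
# Reads the capacity column of POSIX-format df output and strips the "%" sign.
used_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}
```

A cron job or health check could then run `disk_alert_level "$(used_pct /kafka-logs)"`, where `/kafka-logs` is the example log directory used in this article, and page on WARN early enough that retention changes still have time to take effect.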
Quick-check commands handy in operations
# Disk usage of the broker log directories (example)
$ du -sh /kafka-logs/*
# Approximate per-topic size; pairing this with JMX or other tooling is recommended
$ kafka-log-dirs.sh --describe --bootstrap-server <broker:9092> \
--topic-list orders
# Key metrics (examples):
# kafka.log:type=Log,name=Size,topic=<t>,partition=<p>
# kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions

CCAAK
Question 1
An event topic experiences occasional heavy spikes. You must enforce a hard cost cap while also guaranteeing at least a 48-hour reprocessing window. Which retention design is most appropriate?
Correct answer: A
Spikes mean time-only retention could overflow disk, so you need a size cap to anchor cost. Meeting the 48-hour reprocessing floor, however, requires sizing retention.bytes to cover the spike-time overshoot, not just steady-state. C still risks breaking the 48-hour guarantee even with a shorter check interval, because the cap itself is too strict. B doesn't bound cost. D fails to retain history at all.
What happens if I set both time-based and size-based retention?
Deletion fires on either condition. If total size exceeds the cap, the oldest segments are removed; if segments age past the threshold, they are removed too. In both cases the active segment is exempt.
Why isn't deletion making progress?
The segment may still be active and not yet rolled, log.retention.check.interval.ms may be too long, the segment's last-modified time may not have crossed the retention threshold, or the broker's cleanup may be lagging. Check log.roll.ms and your monitoring to diagnose.
What should I watch out for when combining compaction and time-based retention?
Tombstones (null values) must be kept for at least delete.retention.ms. If that window is too short, lagging consumers may miss delete events. Whole segments can also disappear early due to retention.ms before tombstones propagate, so align segment rolling with each timer carefully.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.