Tuning Kafka Log Retention: Storage Cost Control (2026)

Kafka retention is decided by three layers: when to delete (time), how much to keep (size), and where to cut (segments). The time and size conditions are evaluated independently, and once either one is met, deletion proceeds from the oldest segment.

The CCAAK exam frequently asks about the precedence of ms-suffixed settings, topic-level overrides, the interaction of delete and compact, and the difference between rolling and retention. In production, what matters most is disk headroom, monitoring, and pragmatic segment granularity.

Basics: Log Retention Concepts and Terminology

Kafka retention is evaluated per partition. The log is split into multiple segment files, and deletion is always at the segment level. New writes are always appended to the active segment, and once a roll threshold (size or time) is reached, it rolls and a new segment is opened.

Time-based retention marks a segment for deletion when the age of its maximum timestamp (current time minus that timestamp) exceeds retention.ms. Size-based retention removes the oldest segment(s) when the partition's total size exceeds retention.bytes. Retention checks run on a fixed interval (log.retention.check.interval.ms), so deletion is not instantaneous.

Topic-level settings override broker defaults. At the broker level, ms-suffixed settings (log.retention.ms, log.roll.ms) take precedence over their coarser variants (log.retention.minutes, log.retention.hours, log.roll.hours) when both are present; this is the documented behavior. Topic-level overrides exist only in ms form (retention.ms, segment.ms). Defaults can change between versions, so explicit settings are safer for both exams and production.

Terminology: rolling = switching to a new segment; retention = deleting old segments
Evaluation unit: everything is per-partition, and replicas are evaluated independently
Deletion only happens at segment boundaries — never at the individual record level inside a segment
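
The broker-level precedence described above can be sketched as a small shell function. This is a minimal illustration, not Kafka code: effective_retention_ms is a hypothetical helper, and an empty string stands for a setting that is absent from server.properties.

```shell
# Hypothetical helper: resolve the effective broker-level retention in ms.
# Arguments mirror log.retention.ms, log.retention.minutes, log.retention.hours;
# pass an empty string for a setting that is not present.
effective_retention_ms() (
  ms=$1; minutes=$2; hours=$3
  if [ -n "$ms" ]; then
    echo "$ms"                          # ms-suffixed setting wins
  elif [ -n "$minutes" ]; then
    echo $((minutes * 60 * 1000))       # then minutes
  elif [ -n "$hours" ]; then
    echo $((hours * 60 * 60 * 1000))    # then hours
  else
    echo "unset"
  fi
)

effective_retention_ms 86400000 "" 168   # prints 86400000 (ms wins over hours)
effective_retention_ms "" "" 168         # prints 604800000 (falls back to hours)
```

A topic-level retention.ms, when set, replaces whatever this resolves to for that topic.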

Flow of segment rotation and deletion inside a broker

Producer --> Broker

TopicA-0 (log dir)
  |
  |-- 00000000000000000000.log  [CLOSED]  <-- oldest, first candidate for deletion
  |-- 00000000000001234567.log  [CLOSED]
  |-- 00000000000009876543.log  [ACTIVE]  <-- new appends; rolls on segment.bytes or segment.ms

retention check (every log.retention.check.interval.ms):
  if time expired OR total size > retention.bytes => delete oldest CLOSED segment(s)

Minimal related settings in server.properties (broker defaults)

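# Time-based retention (default for topics without an override)
# Example: 7 days. Defaults can change between versions, so set it explicitly.
# log.retention.ms=604800000

# Size-based retention (-1 means unlimited)
# log.retention.bytes=-1

# Segment size and roll (a new segment is created when either size or time is reached)
# Example: 1 GiB (set explicitly)
# log.segment.bytes=1073741824
# Example: 7 days (do not combine with log.roll.hours)
# log.roll.ms=604800000

# Retention check interval
# Example: 5 minutes
# log.retention.check.interval.ms=300000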

Time-based vs. Size-based: What Deletes and When

Time-based retention uses the maximum timestamp inside a segment. The timestamp type follows message.timestamp.type (CreateTime or LogAppendTime), and with CreateTime you have to watch out for producer clock skew. Retention is evaluated every log.retention.check.interval.ms, and once conditions are met, deletion starts from the oldest matching segment.

Size-based retention keeps trimming the oldest segments while the partition's total size exceeds retention.bytes. Deletion triggers if either time or size hits its threshold. If retention.bytes is -1, size never triggers deletion and only time-based retention applies.

In production, a common dual fail-safe is to guarantee a time window with time-based retention while capping disk usage with size-based retention. For the exam, lock in three points: ms-suffixed settings take precedence, evaluation is per-partition, and there is a check interval.

Precedence rule: when log.retention.ms is set on the broker, it overrides log.retention.minutes and log.retention.hours
Evaluation unit: retention.bytes applies to the per-partition total size
Check interval: evaluation is not instant — it runs every log.retention.check.interval.ms

Axis	Main settings	Trigger timing	Operational notes
Time-based	retention.ms	Segment max timestamp is older than now minus retention period	Watch for CreateTime clock skew and keep clocks in sync
Size-based	retention.bytes	Keeps deleting as long as total size exceeds the cap	Set it low enough to leave a healthy disk safety margin
Segment	segment.bytes / segment.ms	Rolls when a threshold is reached on write	Too small means too many files; too large means slow recovery and compaction
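
To make the two triggers concrete, here is a simplified model of a single retention check for one partition. It is a sketch under stated assumptions, not Kafka's implementation: segments_deleted is a hypothetical function, segments are passed oldest first as SIZE:AGE_MS pairs, the last one is treated as the active segment and never deleted, and the size trigger is assumed to remove a segment only while the remaining log stays at or above retention.bytes.

```shell
# Simplified model of one retention check for a single partition (not Kafka's code).
# Usage: segments_deleted RETENTION_MS RETENTION_BYTES SIZE:AGE_MS [SIZE:AGE_MS ...]
# Prints how many of the oldest closed segments would be deleted.
segments_deleted() (
  retention_ms=$1; retention_bytes=$2; shift 2
  total=0
  for seg in "$@"; do total=$((total + ${seg%%:*})); done
  closed=$(($# - 1))   # the last segment is the active one
  deleted=0
  for seg in "$@"; do
    [ "$deleted" -lt "$closed" ] || break
    size=${seg%%:*}; age_ms=${seg##*:}
    hit=no
    # Time trigger: the segment's newest record is older than retention.ms.
    if [ "$retention_ms" -ge 0 ] && [ "$age_ms" -gt "$retention_ms" ]; then hit=yes; fi
    # Size trigger (assumed): the log stays at or above retention.bytes without this segment.
    if [ "$retention_bytes" -ge 0 ] && [ $((total - size)) -ge "$retention_bytes" ]; then hit=yes; fi
    [ "$hit" = yes ] || break
    total=$((total - size)); deleted=$((deleted + 1))
  done
  echo "$deleted"
)

GIB=$((1024*1024*1024)); DAY=86400000
# 2-day / 3 GiB policy, four closed 1 GiB segments (1 day old) plus the active one:
segments_deleted $((2*DAY)) $((3*GIB)) "$GIB:$DAY" "$GIB:$DAY" "$GIB:$DAY" "$GIB:$DAY" "$GIB:0"   # prints 2
# 1-day policy with no size cap (-1): both closed segments are older than 1 day:
segments_deleted "$DAY" -1 "$GIB:$((3*DAY))" "$GIB:$((2*DAY))" "$GIB:0"                           # prints 2
```

In the first call nothing has expired by time, yet two segments go because of size alone, which is the "either trigger" behavior summarized in the table.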

Per-topic retention settings (time and size)

# Set time-based retention to 7 days and size-based retention to 50 GiB (example)
bin/kafka-configs.sh --bootstrap-server <broker:9092> \
  --alter --topic my-topic \
  --add-config retention.ms=604800000,retention.bytes=$((50*1024*1024*1024))

# Verify the settings
bin/kafka-configs.sh --bootstrap-server <broker:9092> \
  --describe --topic my-topic | grep -E 'retention'

Segment Design and Practical Rolling Tips

Segments are the smallest unit of log deletion, and rolling is the trigger that starts a new segment. A roll normally happens when log.segment.bytes is reached or log.roll.ms (or log.roll.hours) is exceeded; the topic-level equivalents are segment.bytes and segment.ms. Rolling is distinct from retention: a roll alone does not immediately delete anything.

The design principle is balancing recovery time, leader failover, log cleaner efficiency, and the file count. For high throughput, set a larger segment.bytes for better I/O efficiency; for low traffic, configure segment.ms (log.roll.ms at the broker level) so you never end up stuck with one giant segment forever. Defaults can shift across releases, so set values explicitly to match your workload.

Rule of thumb: tune segment.bytes/segment.ms so you produce roughly one segment every 1-2 hours
Segments that are too small drive up file count and index overhead
Segments that are too large make recovery and compaction batch work heavy
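
As a rough way to apply the rule of thumb, the time a partition takes to fill one segment can be estimated from its write rate. The helper below is hypothetical and assumes a steady per-partition rate in bytes per second, which real traffic rarely has.

```shell
# Hypothetical helper: seconds for one partition to fill a segment at a steady write rate.
seconds_to_fill_segment() (
  segment_bytes=$1; bytes_per_sec=$2
  echo $((segment_bytes / bytes_per_sec))
)

# 1 GiB segments at 256 KiB/s per partition: a size-based roll about every 4096 s.
seconds_to_fill_segment $((1024*1024*1024)) $((256*1024))   # prints 4096
# 256 MiB segments at 1 KiB/s: size alone would take about 3 days,
# so a time-based roll decides when the segment closes.
seconds_to_fill_segment $((256*1024*1024)) 1024             # prints 262144
```

When the result is far above the 1-2 hour target, a time-based roll is the lever; when it is far below, a larger segment size is.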

Example of rolling tweaks (mitigating low traffic)

# For low traffic, where the size threshold is reached slowly, set a time-based roll explicitly
bin/kafka-configs.sh --bootstrap-server <broker:9092> \
  --alter --topic events-low \
  --add-config segment.ms=7200000,segment.bytes=$((256*1024*1024))  # 2 hours or 256 MiB, whichever comes first

Interaction Between cleanup.policy and Retention (delete/compact)

With cleanup.policy=delete, the time and size retention discussed in this article apply directly, and deletion happens at the oldest-segment level.

cleanup.policy=compact removes older records that share a key with a newer record, which is a separate path from retention. When you combine compact,delete, compaction thins out older versions by key while the delete side removes whole segments by time or size. Tombstone retention is controlled by delete.retention.ms, and after that interval the tombstones themselves are removed.

Compaction-related knobs include log.cleaner.enable (broker), min.cleanable.dirty.ratio (topic), and log.cleaner.min.compaction.lag.ms (broker; min.compaction.lag.ms per topic). Because segment granularity also drives compaction batch size, an extremely large segment.bytes makes each compaction pass heavier.

retention.ms/bytes remain in effect when compact is combined with delete, and the two mechanisms trigger independently; with compact alone they do not delete segments
Tombstones are deleted once delete.retention.ms has elapsed
Compaction needs adequate free disk and I/O bandwidth to make progress
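
The effect of compaction on a keyed log can be illustrated with a toy model. This is only a sketch of the outcome, not of how the log cleaner works internally; compact_records is a hypothetical function that reads key=value lines, oldest first, and an empty value stands in for a tombstone.

```shell
# Toy model of key-based compaction (not Kafka's log cleaner): keep only the
# latest record for each key, preserving the original order of the survivors.
# Input: one "key=value" record per line, oldest first.
compact_records() {
  awk -F= '
    { last[$1] = NR; rec[NR] = $0 }      # remember the last line number per key
    END {
      for (i = 1; i <= NR; i++) {
        split(rec[i], kv, "=")
        if (last[kv[1]] == i) print rec[i]
      }
    }'
}

# user2 ends with an empty value (a tombstone in this model). It survives this
# pass; in Kafka it would be removed only after delete.retention.ms has elapsed.
printf '%s\n' 'user1=a' 'user2=b' 'user1=c' 'user3=d' 'user2=' | compact_records
# prints:
# user1=c
# user3=d
# user2=
```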

Example settings when combining compaction

# Apply compaction plus time-based retention (7 days)
bin/kafka-topics.sh --bootstrap-server <broker:9092> \
  --create --topic kv-store \
  --partitions 6 --replication-factor 3 \
  --config cleanup.policy=compact,delete \
  --config retention.ms=604800000 \
  --config delete.retention.ms=86400000 \
  --config segment.bytes=$((256*1024*1024))

Operational Tuning Procedure and Monitoring

The basic procedure is to check the current topic settings, estimate the impact, and roll changes out gradually. Before tightening the size cap, inspect disk usage so that cascaded deletions don't cause a sudden I/O spike. Before extending retention, re-estimate the required disk based on free space and throughput, then apply.

Monitoring should cover per-partition size, segment count, deletion and roll rates, and log cleaner throughput and lag. Use kafka.log JMX metrics and broker logs, and run describeLogDirs as needed to inspect disk usage per broker.

Gradual rollout: validate on a subset of topics or staging first, then roll out
Disk safety margin: aim to keep 20-30% headroom against steady-state usage
Make settings provenance clear: always check whether a value is a broker default or a topic-level override
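
Before extending retention, a back-of-the-envelope disk estimate helps. The helper below is a hypothetical sketch: it assumes a steady cluster-wide ingest rate measured after compression, ignores index files and compaction, and treats the headroom as a percentage of total disk to keep free.

```shell
# Hypothetical sizing helper (integer arithmetic, bytes).
# Arguments: cluster-wide ingest in bytes/s, retention in seconds,
# replication factor, and the percentage of disk to keep free.
required_disk_bytes() (
  bytes_per_sec=$1; retention_sec=$2; replication=$3; headroom_pct=$4
  stored=$((bytes_per_sec * retention_sec * replication))
  echo $((stored * 100 / (100 - headroom_pct)))
)

# 1 MiB/s for 7 days at replication factor 3 with 25% headroom (about 2.3 TiB)
required_disk_bytes $((1024*1024)) $((7*24*60*60)) 3 25   # prints 2536715059200
```

Comparing this figure with the per-broker usage reported by kafka-log-dirs.sh shows whether a planned change fits in the existing disks.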

Example commands for checking the current state and estimating impact

# Log directory usage per broker
bin/kafka-log-dirs.sh --bootstrap-server <broker:9092> --describe

# Check where the topic settings come from
bin/kafka-configs.sh --bootstrap-server <broker:9092> --describe --topic my-topic

# Get a rough count of segment files (run on the broker; counts only .log files)
ls /kafka-logs/my-topic-*/*.log | wc -l

Recommended Settings and Pitfalls by Scenario

Event streaming (7-day guarantee with spike protection): guarantee 7 days by time and cap with size. Start segments at 512 MiB-1 GiB and tune from measurements. If you use CreateTime, include producer clock sync in your SRE runbook.

Audit logs (90-day retention, capacity-first): cap hard by size while targeting 90 days by time. Assume the size trigger may fire first in practice, and design archive integration up front to meet observability requirements.

State-store changelogs (compact combined): run compact,delete together. Keep delete.retention.ms short and segment.bytes moderate to contain compaction batch load.

Events: retention.ms=7d, retention.bytes=cap including a disk safety margin, segment.bytes=512 MiB-1 GiB
Audit: retention.ms=90d, retention.bytes=realistic cap, explicitly set segment.ms to prevent segment bloat
Changelog: use cleanup.policy=compact,delete and set delete.retention.ms on the short side

Example settings applied per scenario

# Event streaming
bin/kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic events \
  --add-config retention.ms=604800000,retention.bytes=$((200*1024*1024*1024)),segment.bytes=$((512*1024*1024))

# Audit logs
bin/kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic audit \
  --add-config retention.ms=$((90*24*60*60*1000)),retention.bytes=$((500*1024*1024*1024)),segment.ms=$((24*60*60*1000))

# Changelog (square brackets group the comma-separated policy list inside --add-config)
bin/kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic changelog \
  --add-config "cleanup.policy=[compact,delete],delete.retention.ms=$((12*60*60*1000)),segment.bytes=$((256*1024*1024))"

Check Your Understanding

CCAAK

Question 1

A topic is configured with retention.ms=172800000 and retention.bytes=107374182400. A partition's total size has reached 150 GiB, and the oldest segment's max timestamp is 1 day old. Which best describes what happens at the next retention check?

A. The size cap is exceeded, so deletion starts from the oldest segment
B. The time condition is not met, so no deletion occurs
C. Neither condition is met, so only a roll happens
D. A new segment is created immediately and the old segments are retained

Correct answer: A

retention.bytes is the cap on per-partition total size. 150 GiB exceeds 100 GiB (107374182400 bytes), so the retention check deletes from the oldest segment. The time condition is not met, because the oldest segment is 1 day old and retention.ms is 2 days, but time and size are independent triggers, so meeting either one drives deletion forward.

Frequently Asked Questions

Which wins between ms-suffixed settings and hours/minutes?

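When equivalent broker-level settings coexist, the ms-suffixed ones (log.retention.ms, log.roll.ms) take precedence over the minutes and hours variants. Topic-level overrides (retention.ms, segment.ms) exist only in ms form. For both exams and production, it is safer to specify the ms variant explicitly.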

Is retention.bytes a cluster-wide total?

No. It is evaluated per partition. For an entire topic, the theoretical upper bound is roughly partitions x retention.bytes per replica; multiply by the replication factor to estimate cluster-wide disk usage.
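
As a worked example of that bound, the sketch below multiplies it out, including the replication factor for cluster-wide disk. topic_size_bound_bytes is a hypothetical helper, and the result is approximate because each partition can also hold its active segment on top of the retained data.

```shell
# Hypothetical helper: rough upper bound on disk used by one topic across the cluster.
topic_size_bound_bytes() (
  partitions=$1; retention_bytes=$2; replication=$3
  echo $((partitions * retention_bytes * replication))
)

# 6 partitions, retention.bytes = 50 GiB, replication factor 3 (900 GiB in total)
topic_size_bound_bytes 6 $((50*1024*1024*1024)) 3   # prints 966367641600
```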

Does extending the retention period immediately grow existing data?

No. Retention only suppresses deletions, so data does not grow instantly. However, future deletes are delayed, so disk usage rises over time. Estimate the required capacity beforehand.


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
