Kafka Log Compaction Explained: Per-Key Latest-Value Retention in Practice

2026-04-19
NicheeLab Editorial Team

Log compaction is a Kafka cleanup policy that asynchronously prunes older records sharing the same key, retaining at least the latest (last-written) value. It is ideal for use cases where you want the current state rather than the full history.

This article walks through how log compaction works, the guarantees it provides, the key configuration knobs, suitable use cases, and operational tips, while highlighting what the CCDAK and CCAAK exams test. Special focus is given to tombstones (null values), the compact+delete combination, and how segments interact with the log cleaner.

1. Log Compaction Basics and Guarantees

When a topic has cleanup.policy=compact, log compaction retains at least one record — the last write — for each key within a partition. As a result, a consumer can rebuild a key→value map of the latest state simply by reading the topic from the beginning.

A record with a null value (known as a tombstone) signals deletion of that key. The tombstone itself is retained for delete.retention.ms and then physically removed by a subsequent cleaning pass. Log compaction is not immediate; it runs asynchronously in the background.
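
To make the rebuild concrete, here is a minimal sketch in plain Java (no Kafka client involved) of how a consumer might fold a partition's records into a latest-state map. The class LatestStateRebuild and its rec helper are hypothetical names used only for illustration; records are modeled as key/value pairs in offset order, and a null value is treated as a tombstone.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: not part of the Kafka API.
class LatestStateRebuild {

    // Convenience factory for a key/value pair; value may be null (tombstone).
    static Map.Entry<String, String> rec(String key, String value) {
        return new SimpleEntry<>(key, value);
    }

    // Applies records in offset order: later writes win, null values delete the key.
    static Map<String, String> rebuild(List<Map.Entry<String, String>> recordsInOffsetOrder) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Map.Entry<String, String> record : recordsInOffsetOrder) {
            if (record.getValue() == null) {
                state.remove(record.getKey());
            } else {
                state.put(record.getKey(), record.getValue());
            }
        }
        return state;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> log = List.of(
            rec("A", "1"), rec("B", "x"), rec("A", "2"), rec("A", "3"),
            rec("B", "y"), rec("A", null), rec("B", "z"));
        System.out.println(rebuild(log)); // prints {B=z}
    }
}
```

The fold gives the same map whether it runs over the full log or over a compacted one, because compaction only drops records that a later record for the same key would have overwritten anyway.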

  • The unit is the partition: with the default partitioner and an unchanged partition count, the same key always hashes to the same partition, and the latest-value guarantee holds within each partition.
  • Ordering is preserved: relative ordering of the surviving records is maintained even after compaction.
  • Only closed segments are eligible for cleaning: the currently active (open) segment is never compacted.
  • Tombstones are delete markers: after their retention period, they are removed along with the key's older values.
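  • Keys are required: compaction works per key, so the broker rejects keyless records produced to a compacted topic, and keyless records already sitting in a log that is later switched to compact are discarded by the cleaner.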

| Cleanup policy | Retention criteria | Primary use case |
|---|---|---|
| delete | Time/size (retention.ms / retention.bytes) | Event logs that need the full history |
| compact | Latest value per key (plus the recent uncompacted tail) | State synchronization, caches, reference data |
| compact,delete | Latest value per key, plus time/size-based deletion of old segments | Keeping recent per-key state while bounding disk usage (keys not updated within the retention window can expire) |
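
The "same key, same partition" point can be sketched without a broker. The snippet below is a simplified stand-in, not Kafka's implementation: Kafka's default partitioner applies a murmur2 hash to the serialized key, whereas this sketch uses Arrays.hashCode purely to stay dependency-free. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical stand-in for a key-hashing partitioner (not Kafka's implementation).
class KeyToPartitionSketch {

    // Kafka's default partitioner hashes the serialized key with murmur2;
    // Arrays.hashCode is used here only to keep the sketch dependency-free.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        // Same key and same partition count: always the same partition.
        System.out.println(partitionFor("u1", 6) == partitionFor("u1", 6)); // prints true
        // A different partition count may route the key elsewhere.
        System.out.println(partitionFor("u1", 6) + " vs " + partitionFor("u1", 8));
    }
}
```

Because the mapping depends on the partition count, adding partitions to a compacted topic can send new writes for an existing key to a different partition than its older records, which is worth keeping in mind when sizing the topic.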

Conceptual view of updates to keys A and B with compaction

Partition P0 (time →)
| A:1 | B:x | A:2 | A:3 | B:y | A:null | B:z |
            ^  older A entries are superseded by A:3
                       ^  A:null is the delete marker for A (tombstone)
After compaction (conceptual): | ... | A:null | B:z |
Result: A is deleted, B keeps its latest value z

Producing records with keys (console-producer)

# Type one record per line as key:value (here userId:json); the line is split at the first ":".
# Expected result once the cleaner has run: u1 keeps only v=2, and u2's single record (v=1) is kept.
kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic users-compact \
  --property parse.key=true \
  --property key.separator=:
u1:{"name":"Ann","v":1}
u1:{"name":"Ann","v":2}
u2:{"name":"Bob","v":1}

2. Inside Segments and the Log Cleaner

Each partition is split into multiple segments. The active (currently written) segment is excluded; only rotated (closed) segments become cleaning candidates. The cleaner indexes the latest offset per key and rewrites a new segment with the old duplicates dropped.

How “dirty” a log must be before cleaning runs is controlled by min.cleanable.dirty.ratio, the fraction of the log that has not yet been compacted, together with the compaction-lag settings. Cleaner thread count, I/O throttling, and compression codec are independent settings, and cleaning does not run on a fixed schedule.
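
As a rough illustration of the threshold, the sketch below computes a dirty ratio as the not-yet-compacted bytes divided by the total bytes of the cleaned and uncleaned portions, and compares it with the configured minimum. This is a simplified model with hypothetical names, not the broker's code; the real cleaner also accounts for the active segment and the compaction-lag settings.

```java
// Hypothetical illustration of the dirty-ratio threshold; not broker code.
class DirtyRatioSketch {

    // cleanBytes: already-compacted portion; dirtyBytes: closed segments written since the last clean.
    static double dirtyRatio(long cleanBytes, long dirtyBytes) {
        long total = cleanBytes + dirtyBytes;
        return total == 0 ? 0.0 : (double) dirtyBytes / total;
    }

    // True when the log is dirty enough to be picked up by the cleaner in this model.
    static boolean exceedsThreshold(long cleanBytes, long dirtyBytes, double minCleanableDirtyRatio) {
        return dirtyRatio(cleanBytes, dirtyBytes) > minCleanableDirtyRatio;
    }

    public static void main(String[] args) {
        System.out.println(exceedsThreshold(600, 400, 0.5)); // prints false (ratio 0.4)
        System.out.println(exceedsThreshold(300, 700, 0.5)); // prints true (ratio 0.7)
    }
}
```

Lowering the ratio makes cleaning start earlier and more often, trading extra cleaner I/O for less duplicated data on disk.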

  • Cleaning runs inside the broker process and is performed independently on each replica.
  • After cleaning, the cleaned segment is swapped in for the originals, and the old segment files are then physically deleted.
  • Tombstones live for delete.retention.ms; if the same key is written again after the tombstone, the newer record supersedes it and the key is live again.

| Setting / concept | Scope | Purpose / key point |
|---|---|---|
| log.cleaner.enable | Broker | Enables the log cleaner (typically on by default) |
| log.cleaner.threads | Broker | Tunes the number of concurrent cleaning threads |
| log.cleaner.min.cleanable.ratio (broker) / min.cleanable.dirty.ratio (topic) | Both | Sets how much duplication must accumulate before cleaning is triggered |

The segment cleaning flow

Before:
[Seg-1(closed)] [Seg-2(closed)] [Seg-3(active)]
  A:1 B:1 A:2     B:2 A:3        ...
Cleaner →
  - Pick the latest value per key (A:3, B:2)
  - Relocate them into a freshly cleaned segment
After:
[Seg-1'(cleaned: A:3 B:2)] [Seg-3(active)]
Old Seg-1 / Seg-2 become deletion candidates
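
The flow above can be modeled as two passes over the closed segments. The sketch below is a simplified, hypothetical model in plain Java (Rec and SegmentCleaningSketch are illustrative names, not Kafka types): the first pass records the latest offset per key, the second keeps only the records at those offsets. The active segment is deliberately never passed in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-pass model of cleaning closed segments; not broker code.
class SegmentCleaningSketch {

    // Minimal record model for the sketch (not a Kafka type).
    static final class Rec {
        final long offset;
        final String key;
        final String value;

        Rec(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + ":" + value + "@" + offset;
        }
    }

    static List<Rec> clean(List<List<Rec>> closedSegments) {
        // Pass 1: remember the highest offset seen for each key.
        Map<String, Long> latestOffsetByKey = new HashMap<>();
        for (List<Rec> segment : closedSegments) {
            for (Rec r : segment) {
                latestOffsetByKey.put(r.key, r.offset);
            }
        }
        // Pass 2: copy only the records that are the latest for their key, in offset order.
        List<Rec> cleaned = new ArrayList<>();
        for (List<Rec> segment : closedSegments) {
            for (Rec r : segment) {
                if (latestOffsetByKey.get(r.key) == r.offset) {
                    cleaned.add(r);
                }
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        List<Rec> seg1 = List.of(new Rec(0, "A", "1"), new Rec(1, "B", "1"), new Rec(2, "A", "2"));
        List<Rec> seg2 = List.of(new Rec(3, "B", "2"), new Rec(4, "A", "3"));
        System.out.println(clean(List.of(seg1, seg2))); // prints [B:2@3, A:3@4]
    }
}
```

Note that the survivors keep their original offsets and their relative order (B:2 at offset 3 comes before A:3 at offset 4), so consumers see gaps in the offset sequence but never a reordering.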

Inspecting segment contents (kafka-dump-log.sh)

kafka-dump-log.sh --skip-record-metadata --print-data-log \
  --files /var/lib/kafka/data/users-compact-0/00000000000000000000.log | head -n 50
# Compare the surviving records per key before and after compaction

3. Key Configurations and Exam Pitfalls

Log compaction is enabled per topic, and broker-level defaults can be overridden at the topic level. Pay particular attention to how cleanup.policy, delete.retention.ms, min.cleanable.dirty.ratio, segment.ms / segment.bytes, and min.compaction.lag.ms / max.compaction.lag.ms interact.

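Using compact,delete together applies both policies: the cleaner deduplicates by key, and segments that exceed retention.ms (or retention.bytes) are deleted whether or not they hold a key's latest value. A key that is not updated within the retention window can therefore disappear entirely. Tombstones still live for delete.retention.ms, so key deletion is never immediate. Exams frequently test three points: keys are required, what a tombstone means, and that cleaning is asynchronous.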

  • Explicitly set cleanup.policy=compact or compact,delete.
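  • delete.retention.ms controls how long tombstones live. If it is too short, a consumer that reads from the beginning but takes longer than this window to catch up can miss a delete and keep a stale value.
  • min.compaction.lag.ms is the minimum time a record stays uncompacted after it is written; max.compaction.lag.ms is the upper bound on how long a record can remain ineligible for compaction. Don't confuse the two.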
  • Segments must rotate before they can be cleaned, so segment.ms / segment.bytes matter just as much.

| Parameter | Exam angle | Watch out for |
|---|---|---|
| cleanup.policy | Difference between compact and delete | When combined, both latest-value retention and time-based deletion apply |
| delete.retention.ms | Tombstone lifetime | Too short risks unrecoverable state; too long pressures disk space |
| min.compaction.lag.ms / max.compaction.lag.ms | Compaction timing | Understand the impact of both too-early and too-late compaction |
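
The two lag settings are easy to mix up, so here is a small sketch of how they might be read as predicates. It is a simplified model with hypothetical method names: the real broker evaluates these bounds per segment using record timestamps, not per individual record.

```java
// Hypothetical illustration of the two compaction-lag bounds; not broker code.
class CompactionLagSketch {

    // min.compaction.lag.ms: a record is protected from compaction until it is at least this old.
    // Records in the active segment are never eligible in this model.
    static boolean eligible(boolean inActiveSegment, long recordTimestampMs, long nowMs,
                            long minCompactionLagMs) {
        if (inActiveSegment) {
            return false;
        }
        return nowMs - recordTimestampMs >= minCompactionLagMs;
    }

    // max.compaction.lag.ms: once the oldest uncompacted data is older than this,
    // the log should be cleaned even if the dirty ratio is still below its threshold.
    static boolean overdue(long oldestUncompactedTimestampMs, long nowMs, long maxCompactionLagMs) {
        return nowMs - oldestUncompactedTimestampMs > maxCompactionLagMs;
    }

    public static void main(String[] args) {
        long fiveMinutes = 300_000L;
        long oneHour = 3_600_000L;
        System.out.println(eligible(false, 0L, 60_000L, fiveMinutes));  // prints false: only 1 minute old
        System.out.println(eligible(false, 0L, 600_000L, fiveMinutes)); // prints true: 10 minutes old
        System.out.println(overdue(0L, 2 * oneHour, oneHour));          // prints true: past the upper bound
    }
}
```

In short, the minimum lag gives slow consumers a window in which they are guaranteed to see every update, while the maximum lag bounds how long superseded values (for example, data that must be purged) can linger.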

How retention and compaction are applied in sequence

Write → segment rotation → (threshold reached) compaction →
[compact only] latest value is retained
[compact,delete] additionally deletes segments that exceed retention.ms / retention.bytes, even if they hold a key's latest value
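
One consequence of the delete half of compact,delete is worth spelling out: the retention check looks at segment age, not at what the segment contains. The sketch below is a simplified, hypothetical model (Segment and expired are illustrative names, and size-based retention is left out) showing that an already-cleaned segment holding a key's only surviving record is removed once it is older than retention.ms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of time-based retention on a compact,delete topic; not broker code.
class CompactDeleteSketch {

    // Minimal segment model: a label and the timestamp of its newest record.
    static final class Segment {
        final String name;
        final long largestTimestampMs;

        Segment(String name, long largestTimestampMs) {
            this.name = name;
            this.largestTimestampMs = largestTimestampMs;
        }
    }

    // Returns the names of closed segments whose newest record is older than retentionMs.
    static List<String> expired(List<Segment> closedSegments, long nowMs, long retentionMs) {
        List<String> result = new ArrayList<>();
        for (Segment s : closedSegments) {
            if (nowMs - s.largestTimestampMs > retentionMs) {
                result.add(s.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        List<Segment> segments = List.of(
            new Segment("cleaned (holds K1's only record)", 0L),
            new Segment("recent", 47 * hour));
        // With retention.ms = 24h and "now" at hour 48, only the old cleaned segment expires.
        System.out.println(expired(segments, 48 * hour, 24 * hour));
        // prints [cleaned (holds K1's only record)]
    }
}
```

If K1 must stay readable, either its producer has to rewrite it within the retention window or the topic should use compact alone.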

Enabling compaction at topic creation time

kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic users-compact \
  --partitions 6 \
  --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.5 \
  --config delete.retention.ms=86400000 \
  --config segment.ms=3600000
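
If topics are created from application code rather than the CLI, the same settings can be assembled as a plain string map. The sketch below only builds that map, using durations instead of raw millisecond literals; the helper name compactedTopicConfig is hypothetical. A map of this shape is what the Kafka admin client's NewTopic.configs(Map<String, String>) accepts, but wiring it to a real AdminClient is left out here.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that mirrors the --config flags of the CLI example.
class CompactedTopicConfigSketch {

    static Map<String, String> compactedTopicConfig(double minDirtyRatio,
                                                    Duration tombstoneRetention,
                                                    Duration segmentRoll) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("cleanup.policy", "compact");
        config.put("min.cleanable.dirty.ratio", String.valueOf(minDirtyRatio));
        config.put("delete.retention.ms", String.valueOf(tombstoneRetention.toMillis()));
        config.put("segment.ms", String.valueOf(segmentRoll.toMillis()));
        return config;
    }

    public static void main(String[] args) {
        // One day of tombstone retention and hourly segment rolls, as in the CLI example.
        System.out.println(compactedTopicConfig(0.5, Duration.ofDays(1), Duration.ofHours(1)));
        // prints {cleanup.policy=compact, min.cleanable.dirty.ratio=0.5, delete.retention.ms=86400000, segment.ms=3600000}
    }
}
```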

4. Use Cases and Anti-Patterns

Typical use cases include CDC update streams (e.g. a customer's latest address, current inventory levels), cache warm-up and refresh, and Kafka Streams state-store changelogs. In all of these, the latest state is what matters, not every intermediate update.

Anti-patterns include audit logs that require a full history, event sourcing that needs to replay every event in order, and data with no keys (or essentially unique keys that are never updated). These cases call for delete-based retention or a different storage system.

  • CDC + compaction: pairs well with Update/Delete events from tools like Debezium.
  • Cache rebuilding: read from earliest and materialize the final key→value map.
  • State stores: Kafka Streams changelog topics use compact (Streams creates its key-value store changelogs with this policy).
  • Anti-patterns: no key, single-write events, or any requirement to preserve the full history.

| Scenario | Recommended topic policy | Watch out for |
|---|---|---|
| Latest customer profile state | compact | Key design is mandatory (same customer → same key) |
| Inventory level sync | compact,delete | Caps disk usage, but a key not updated within retention.ms can expire along with its segment |
| Audit and trace logs | delete | Full history is required, so compaction is inappropriate |

From CDC to a rebuilt latest-state view

DB → CDC (updates/deletes) → [compacted topic] →
            → cache / view (latest value per key only)
Delete events flow as tombstones and are physically removed after their retention window
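
The mapping from change events to records in this flow can be sketched as follows. The ChangeEvent shape and the Op enum are hypothetical simplifications made up for this example; real CDC tools such as Debezium have their own envelope formats. The point is only that the row's primary key becomes the record key and that a delete becomes a null value.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Hypothetical mapping from a simplified change event to a key/value record.
class CdcToRecordSketch {

    enum Op { UPSERT, DELETE }

    // Simplified change event: operation, primary key, and the row serialized as JSON.
    static final class ChangeEvent {
        final Op op;
        final String primaryKey;
        final String rowJson;

        ChangeEvent(Op op, String primaryKey, String rowJson) {
            this.op = op;
            this.primaryKey = primaryKey;
            this.rowJson = rowJson;
        }
    }

    // Primary key becomes the record key; a delete becomes a tombstone (null value).
    static Map.Entry<String, String> toRecord(ChangeEvent event) {
        String value = (event.op == Op.DELETE) ? null : event.rowJson;
        return new SimpleEntry<>(event.primaryKey, value);
    }

    public static void main(String[] args) {
        System.out.println(toRecord(new ChangeEvent(Op.UPSERT, "cust-42", "{\"city\":\"Oslo\"}")));
        // prints cust-42={"city":"Oslo"}
        System.out.println(toRecord(new ChangeEvent(Op.DELETE, "cust-42", null)));
        // prints cust-42=null
    }
}
```

Using the primary key as the record key is what lets compaction collapse all changes to one row into a single surviving record.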

Sending a tombstone (Java Producer)

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TombstoneExample {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            // Normal latest value
            producer.send(new ProducerRecord<>("users-compact", "u1", "{\"name\":\"Ann\"}"));
            // Delete (tombstone): same key, null value
            producer.send(new ProducerRecord<>("users-compact", "u1", null));
            producer.flush();
        }
    }
}

5. Operations, Monitoring, and Troubleshooting

Compaction is asynchronous and sensitive to load and threshold settings. Track log cleaner thread activity, cleanable bytes, disk usage, and the cleaning backlog. Periodically audit per-topic configuration drift as well.

“Not compacting / too slow / not deleting” are classic complaints. The root cause is usually one of: (1) the segment hasn't closed, (2) the log's dirty ratio has not yet reached min.cleanable.dirty.ratio, (3) the tombstone retention window hasn't elapsed, (4) keyless records are mixed in, or (5) the cleaner doesn't have enough throughput.

  • Check the configuration first: cleanup.policy, min.cleanable.dirty.ratio, segment.ms / segment.bytes, and delete.retention.ms.
  • Inspect broker logs and JMX metrics to monitor cleaner progress and errors.
  • High disk utilization and I/O contention slow cleaning down; tune the cleaner thread count (log.cleaner.threads) and its I/O throttle (log.cleaner.io.max.bytes.per.second).

| Symptom | Likely cause | Direction for fixing |
|---|---|---|
| Compaction is not progressing | Segments haven't rotated, or the dirty ratio is still below the threshold | Revisit segment.ms / segment.bytes and adjust thresholds |
| Deletion is not happening | Still within the delete.retention.ms window | Understand and adjust the retention window, or just wait |
| Disk pressure | Compact-only policy with very high update frequency | Consider compact,delete with a suitable retention.ms; add cleaner threads; revisit compression settings |

Tombstone lifetime timeline

Write key=K, value=null (tombstone)
  ↓ retained for delete.retention.ms
Compaction removes K's old values
  ↓ after the retention window
The tombstone itself is physically deleted

Inspecting and changing settings in operations

# Inspect current settings
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name users-compact --describe

# Dynamically adjust (example): tune the dirty ratio
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name users-compact \
  --alter --add-config min.cleanable.dirty.ratio=0.4

6. Implementation and Migration Strategies

When migrating an existing delete-based topic to compact or compact,delete, first verify two things: (1) can records be re-emitted with keys, and (2) is there a way to backfill the initial state? Without keys, compaction provides no benefit.

A blue/green switchover is the safe approach. Create a new topic with compact → backfill → dual-publish to both topics → migrate consumers → decommission the old topic. This sequence minimizes both downtime and rollback risk.

  • During migration, verify that you can rebuild the latest view simply by re-reading from earliest.
  • If you adopt compact,delete, design the retention.ms value at the same time.
  • Make sure the middle tier (Kafka Streams, Kafka Connect, etc.) reliably assigns keys before writing out.

| Migration option | Strengths | Watch out for |
|---|---|---|
| Blue/green (run a new topic alongside) | Safe, gradual migration | Cost of dual-publishing during the overlap |
| Flip cleanup policy on the same topic | Simple | Existing data is compacted only after its segments are closed and the cleaner picks the log up, so the effect takes time to appear; existing keyless records are not retained |
| Aggregate with Streams and write to a compacted topic | Improves data quality at the same time | Requires application development |

Blue/green migration flow

Producers → [old-topic (delete)] → Consumers (old)
          ↘︎ transform / backfill ↘︎
            [new-topic (compact)] → Consumers (new)
After the switchover, decommission old-topic in phases
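
The transform / backfill step is where keys get assigned. Below is a minimal sketch of that step in plain Java, under the assumption that the key can be derived from the old record's value; the class name, the rekey method, and the "key|payload" value format are all hypothetical and chosen only for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical backfill step: derive a key for each old record before writing to the new topic.
class BackfillSketch {

    // Example extractor for values shaped like "key|payload"; returns null when no key is present.
    static String keyBeforePipe(String value) {
        int idx = value.indexOf('|');
        return idx > 0 ? value.substring(0, idx) : null;
    }

    // Keeps the original order and skips records for which no key can be derived.
    static List<Map.Entry<String, String>> rekey(List<String> oldValues,
                                                 Function<String, String> keyExtractor) {
        List<Map.Entry<String, String>> keyed = new ArrayList<>();
        for (String value : oldValues) {
            String key = keyExtractor.apply(value);
            if (key == null) {
                continue; // keyless: cannot go to a compacted topic, handle separately
            }
            keyed.add(new SimpleEntry<>(key, value));
        }
        return keyed;
    }

    public static void main(String[] args) {
        List<String> old = List.of("u1|Ann v1", "u2|Bob v1", "orphan", "u1|Ann v2");
        System.out.println(rekey(old, BackfillSketch::keyBeforePipe));
        // prints [u1=u1|Ann v1, u2=u2|Bob v1, u1=u1|Ann v2]
    }
}
```

Preserving the original order matters here: the new topic's "latest value" for u1 is only correct if "Ann v2" is written after "Ann v1".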

A Streams topology that writes only the latest value (example)

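import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

// Assumes String default key/value serdes are set in the Streams configuration
StreamsBuilder b = new StreamsBuilder();
KStream<String, String> s = b.stream("src");
KTable<String, String> latest = s.groupByKey().reduce((agg, v) -> v);
latest.toStream().to("dst-compacted");
// Create dst-compacted with cleanup.policy=compact in advance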

Check Your Understanding

CCDAK / CCAAK

Question 1

In a Kafka log-compacted topic, you want to reliably delete key K. Which approach is correct?

  1. Send a record with key K and a null value (a tombstone)
  2. Temporarily disable compaction and then set retention.ms to 0
  3. Re-send all of key K's old records to overwrite them
  4. Reset the consumer group so it reads only the latest record

Correct answer: 1

Deletion in log compaction is expressed by a record with a null value (a tombstone). The tombstone is retained for delete.retention.ms and then physically removed by cleaning along with the key's old values. None of the other options actually delete the key.

Frequently Asked Questions

Does log compaction break record ordering?

No. After compaction, the relative ordering of the remaining records is preserved within each partition. Log compaction only removes old duplicates; it never reorders records.

Can the latest value be lost when using compact,delete together?

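It can. The cleaner keeps each key's latest value, but the delete policy still removes whole segments once they exceed retention.ms or retention.bytes, whether or not they hold a key's only remaining record. A key that is not rewritten within the retention window can therefore be lost, so size retention.ms to the longest period for which state must stay readable. With compact alone, the latest value (or its tombstone, until delete.retention.ms passes) is always kept.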

From which offset should a consumer read to rebuild the latest view?

To initialize the latest view, reading from earliest is recommended. Log compaction only deletes old duplicates; rebuilding the latest values still requires applying every remaining record.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

