Log compaction is a Kafka cleanup policy that asynchronously prunes older records sharing the same key, retaining at least the latest (last-written) value. It is ideal for use cases where you want the current state rather than the full history.
This article walks through how log compaction works, the guarantees it provides, the key configuration knobs, suitable use cases, and operational tips, while highlighting what the exams test. Special focus is given to tombstones (null values), the compact+delete combination, and how segments interact with the log cleaner.
When a topic has cleanup.policy=compact, log compaction retains at least one record — the last write — for each key within a partition. As a result, a consumer can rebuild a key→value map of the latest state simply by reading the topic from the beginning.
A record with a null value (known as a tombstone) signals deletion of that key. Compaction drops the key's older values, and the tombstone itself is retained for at least delete.retention.ms before a subsequent cleaning pass physically removes it. Log compaction is not immediate; it runs asynchronously in the background.
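The rebuild guarantee can be illustrated with a small, self-contained Java simulation. This is a sketch under simplifying assumptions, not Kafka's implementation: `StateRebuild` and its `Rec` type are hypothetical, and a partition is modelled as an in-memory list of key/value pairs. Replaying records in order (upsert on a non-null value, remove on a tombstone) gives the same map whether the consumer reads the full history or the compacted log.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StateRebuild {
    // Hypothetical record type: a key plus a value, where a null value is a tombstone
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
    }

    // Replay records in log order: a non-null value upserts the key, a tombstone removes it
    static Map<String, String> rebuild(List<Rec> log) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Rec r : log) {
            if (r.value == null) {
                state.remove(r.key);
            } else {
                state.put(r.key, r.value);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        // Full history: A:1 B:x A:2 A:3 B:y A:null B:z
        List<Rec> full = List.of(
            new Rec("A", "1"), new Rec("B", "x"), new Rec("A", "2"), new Rec("A", "3"),
            new Rec("B", "y"), new Rec("A", null), new Rec("B", "z"));
        // The same partition after compaction (conceptually): A:null B:z
        List<Rec> compacted = List.of(new Rec("A", null), new Rec("B", "z"));
        System.out.println(rebuild(full));      // {B=z}
        System.out.println(rebuild(compacted)); // {B=z}
    }
}
```

Both replays print `{B=z}`. A consumer that had already cached A=3 and only caught up after the tombstone was physically removed would never see the delete, which is why the tombstone's retention window matters.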
| Cleanup policy | Retention criteria | Primary use case |
|---|---|---|
| delete | Time/size (retention.ms / retention.bytes) | Event logs that need the full history |
| compact | Latest value per key (plus the recent uncompacted tail) | State synchronization, caches, reference data |
| compact,delete | Latest value per key, plus time/size deletion of old segments (which can also remove a key's latest value) | Bounding disk usage when only recent state matters |
Conceptual view of updates to keys A and B with compaction

```text
Partition P0 (time →)
| A:1 | B:x | A:2 | A:3 | B:y | A:null | B:z |
  A:1 and A:2 are superseded by A:3, which is in turn superseded by A:null
  A:null is the delete marker for A (tombstone)
After compaction (conceptual): | A:null | B:z |
Result: A is deleted, B keeps its latest value z
```

Producing records with keys (console-producer)
```shell
# Each stdin line is key:value (userId:json); the first ':' separates key from value
kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic users-compact \
  --property parse.key=true \
  --property key.separator=: <<'EOF'
u1:{"name":"Ann","v":1}
u1:{"name":"Ann","v":2}
u2:{"name":"Bob","v":1}
EOF
# After compaction (some time after the segment rolls), u1 keeps only v=2 and u2 keeps v=1
```

Each partition is split into multiple segments. The active (currently written) segment is never cleaned; only rolled (closed) segments are cleaning candidates. The cleaner builds an index of the latest offset for each key, then rewrites the segments, copying only those latest records and dropping older duplicates.
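A minimal Java sketch of that two-pass idea, under a deliberately simplified model: segments are in-memory lists, the last one is the active segment, every closed segment is treated as dirty, and tombstone removal is ignored. `SegmentCleaner` and `Rec` are hypothetical names; the real log cleaner is considerably more involved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SegmentCleaner {
    // Hypothetical record type keeping its original offset (null value = tombstone)
    static final class Rec {
        final long offset;
        final String key;
        final String value;
        Rec(long offset, String key, String value) { this.offset = offset; this.key = key; this.value = value; }
        @Override public String toString() { return offset + ":" + key + "=" + value; }
    }

    // segments: oldest first, at least one; the last segment is active and is never cleaned.
    // Returns one merged cleaned segment followed by the untouched active segment.
    static List<List<Rec>> clean(List<List<Rec>> segments) {
        List<List<Rec>> closed = segments.subList(0, segments.size() - 1);
        List<Rec> active = segments.get(segments.size() - 1);

        // Pass 1: key -> latest offset, built from the closed segments only
        Map<String, Long> latestOffset = new HashMap<>();
        for (List<Rec> seg : closed) {
            for (Rec r : seg) {
                latestOffset.put(r.key, r.offset);
            }
        }

        // Pass 2: keep only each key's latest record; offsets and relative order are preserved
        List<Rec> cleaned = new ArrayList<>();
        for (List<Rec> seg : closed) {
            for (Rec r : seg) {
                if (latestOffset.get(r.key) == r.offset) {
                    cleaned.add(r);
                }
            }
        }
        return List.of(cleaned, active);
    }

    public static void main(String[] args) {
        List<List<Rec>> segs = List.of(
            List.of(new Rec(0, "A", "1"), new Rec(1, "B", "1")),                       // Seg-1 (closed)
            List.of(new Rec(2, "A", "2"), new Rec(3, "B", "2"), new Rec(4, "A", "3")), // Seg-2 (closed)
            List.of(new Rec(5, "A", "4")));                                            // Seg-3 (active)
        System.out.println(clean(segs)); // [[3:B=2, 4:A=3], [5:A=4]]
    }
}
```

Here A:4 sits in the active segment, so it does not yet cause A:3 to be dropped; the surviving records keep offsets 3 and 4 in their original order.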
How “dirty” a log must become before it is cleaned is controlled by min.cleanable.dirty.ratio (the share of the log written since the last cleaning) and related thresholds such as min.compaction.lag.ms and max.compaction.lag.ms. Cleaner thread count, I/O throughput, and compression codec are separate settings, and cleaning does not run on a fixed schedule.
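The threshold check can be sketched as follows. This is a simplified model, not the broker's code: `DirtyRatioCheck` is hypothetical, the dirty ratio is taken as dirty bytes divided by clean plus dirty bytes, and max.compaction.lag.ms is treated as an override that makes a log eligible once its oldest dirty data has waited too long.

```java
class DirtyRatioCheck {
    // Share of the cleanable log that has not been compacted yet
    static double dirtyRatio(long cleanBytes, long dirtyBytes) {
        long total = cleanBytes + dirtyBytes;
        return total == 0 ? 0.0 : (double) dirtyBytes / total;
    }

    // Simplified eligibility rule (assumption): clean when the ratio reaches the threshold,
    // or when the oldest dirty data has waited longer than max.compaction.lag.ms
    static boolean shouldClean(long cleanBytes, long dirtyBytes, double minCleanableDirtyRatio,
                               long oldestDirtyAgeMs, long maxCompactionLagMs) {
        return dirtyRatio(cleanBytes, dirtyBytes) >= minCleanableDirtyRatio
            || oldestDirtyAgeMs > maxCompactionLagMs;
    }

    public static void main(String[] args) {
        // 600 MB already cleaned, 400 MB written since the last pass
        System.out.println(dirtyRatio(600, 400));                                 // 0.4
        System.out.println(shouldClean(600, 400, 0.5, 60_000, Long.MAX_VALUE)); // false
        System.out.println(shouldClean(600, 400, 0.4, 60_000, Long.MAX_VALUE)); // true
    }
}
```

With the default threshold of 0.5 the log in this example waits; lowering the threshold to 0.4 (as in the kafka-configs.sh example later in this article) makes it eligible, at the cost of more frequent cleaning I/O.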
| Setting / concept | Scope | Purpose / key point |
|---|---|---|
| log.cleaner.enable | Broker | Enables the log cleaner (typically on by default) |
| log.cleaner.threads | Broker | Tunes the number of concurrent cleaning threads |
| min.cleanable.dirty.ratio (topic) / log.cleaner.min.cleanable.ratio (broker) | Both | Minimum ratio of dirty (not yet compacted) bytes to total log bytes before the log is eligible for cleaning (default 0.5) |
The segment cleaning flow

```text
Before:
[Seg-1 (closed)]  [Seg-2 (closed)]   [Seg-3 (active)]
 A:1 B:1           A:2 B:2 A:3        ...
Cleaner →
 - Build a map of each key's latest offset in the closed segments (A → A:3, B → B:2)
 - Copy only those records, in their original offset order, into a freshly cleaned segment
After:
[Seg-1' (cleaned: B:2 A:3)]  [Seg-3 (active)]
Old Seg-1 / Seg-2 are replaced by Seg-1' and their files deleted
```

Inspecting segment contents (kafka-dump-log.sh)
```shell
kafka-dump-log.sh --skip-record-metadata --print-data-log \
  --files /var/lib/kafka/data/users-compact-0/00000000000000000000.log | head -n 50
# Run before and after compaction and compare which records survive for each key
```

Log compaction is enabled per topic, and broker-level defaults can be overridden at the topic level. Pay particular attention to how cleanup.policy, delete.retention.ms, min.cleanable.dirty.ratio, segment.ms / segment.bytes, and min.compaction.lag.ms / max.compaction.lag.ms interact.
Using compact,delete together compacts each key down to its latest value while also deleting segments that exceed retention.ms / retention.bytes. Time/size deletion ignores keys, so a key whose latest value sits in an expired segment disappears entirely. Tombstones still live for at least delete.retention.ms, so deletion is never immediate. Exams frequently test three points: keys are required, what a tombstone means, and that cleaning is asynchronous.
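A small Java sketch makes the compact,delete caveat concrete. It models only time-based retention on closed segments, judged by each segment's newest record timestamp; `CompactDeleteRetention`, `Segment`, and the sku-* keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CompactDeleteRetention {
    // Hypothetical closed segment: its newest record timestamp and the keys whose latest value it holds
    static final class Segment {
        final long maxTimestampMs;
        final List<String> latestValueKeys;
        Segment(long maxTimestampMs, List<String> latestValueKeys) {
            this.maxTimestampMs = maxTimestampMs;
            this.latestValueKeys = latestValueKeys;
        }
    }

    // Simplified time retention: a segment whose newest record is older than retention.ms is
    // deleted, even if it holds the only remaining copy of some key's latest value
    static List<Segment> applyRetention(List<Segment> segments, long nowMs, long retentionMs) {
        List<Segment> kept = new ArrayList<>();
        for (Segment s : segments) {
            if (nowMs - s.maxTimestampMs <= retentionMs) {
                kept.add(s);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long day = 86_400_000L;
        long now = 10 * day;
        List<Segment> segs = List.of(
            new Segment(now - 9 * day, List.of("sku-1")),           // sku-1 last updated 9 days ago
            new Segment(now - 1 * day, List.of("sku-2", "sku-3"))); // recently updated keys
        // retention.ms = 7 days
        for (Segment s : applyRetention(segs, now, 7 * day)) {
            System.out.println(s.latestValueKeys); // [sku-2, sku-3]
        }
    }
}
```

The first segment is past the 7-day window, so sku-1 vanishes even though no tombstone was ever written for it. Under compact alone, that latest value would have been kept.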
| Parameter | Exam angle | Watch out for |
|---|---|---|
| cleanup.policy | Difference between compact and delete | When combined, compaction applies and old segments are also deleted by time/size, latest values included |
| delete.retention.ms | Tombstone lifetime | Too short: consumers reading from the beginning may miss tombstones and keep stale keys; too long: tombstones occupy disk |
| min.compaction.lag.ms/max.compaction.lag.ms | Compaction timing | Understand the impact of both too-early and too-late compaction |
How retention and compaction are applied in sequence

```text
Write → segment rotation → (threshold reached) compaction →
  [compact only]    the latest value per key is retained
  [compact,delete]  additionally, segments past retention.ms / retention.bytes are deleted,
                    even if they hold a key's latest value
```

Enabling compaction at topic creation time
```shell
# Tombstones kept at least 1 day (delete.retention.ms); segments rolled hourly (segment.ms)
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic users-compact \
  --partitions 6 \
  --replication-factor 3 \
  --config cleanup.policy=compact \
  --config min.cleanable.dirty.ratio=0.5 \
  --config delete.retention.ms=86400000 \
  --config segment.ms=3600000
```

Typical use cases include CDC update streams (e.g. a customer's latest address, current inventory levels), cache warm-up and refresh, and Kafka Streams state-store changelogs. In all of these, the latest state is what matters, not every intermediate update.
Anti-patterns include audit logs that require a full history, event sourcing that needs to replay every event in order, and data with no keys (or essentially unique keys that are never updated). These cases call for delete-based retention or a different storage system.
| Scenario | Recommended topic policy | Watch out for |
|---|---|---|
| Latest customer profile state | compact | Key design is mandatory (same customer → same key) |
| Inventory level sync | compact,delete | Caps disk usage, but items not updated within retention.ms lose their latest value |
| Audit and trace logs | delete | Full history is required, so compaction is inappropriate |
From CDC to a rebuilt latest-state view

```text
DB → CDC (updates/deletes) → [compacted topic]
   → cache / view (latest value per key only)
Delete events flow as tombstones and are physically removed after their retention window
```

Sending a tombstone (Java Producer)
```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TombstoneExample {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            // Normal latest value for key u1
            producer.send(new ProducerRecord<>("users-compact", "u1", "{\"name\":\"Ann\"}"));
            // Delete (tombstone): same key, null value
            producer.send(new ProducerRecord<>("users-compact", "u1", null));
            producer.flush();
        }
    }
}
```

Compaction is asynchronous and sensitive to load and threshold settings. Track log cleaner thread activity, cleanable bytes, disk usage, and the cleaning backlog. Periodically audit per-topic configuration drift as well.
“Not compacting / too slow / not deleting” are classic complaints. The root cause is usually one of: (1) the segment hasn't rolled yet, (2) the log's dirty ratio hasn't reached min.cleanable.dirty.ratio, (3) the tombstone retention window hasn't elapsed, (4) keyless records are mixed in, or (5) the cleaner doesn't have enough throughput.
| Symptom | Likely cause | Direction for fixing |
|---|---|---|
| Compaction is not progressing | Segments haven't rolled, or the dirty ratio is below min.cleanable.dirty.ratio | Revisit segment.ms / segment.bytes; lower the ratio threshold or set max.compaction.lag.ms |
| Deletion is not happening | Still within the delete.retention.ms window | Understand and adjust the retention window, or just wait |
| Disk pressure | Compact-only policy with very high update frequency | Combine compact,delete; add cleaner threads; revisit compression settings |
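The delete.retention.ms row can be reduced to a time check. The sketch assumes a simplified rule: a tombstone becomes removable delete.retention.ms after the cleaning pass that first compacted its segment, and a later pass must still run to drop it. `TombstoneTimeline` and its methods are hypothetical, and broker versions differ in exactly how this deadline is tracked.

```java
class TombstoneTimeline {
    // Earliest time a later cleaning pass may drop the tombstone (simplified model)
    static long removableAtMs(long firstCleanedAtMs, long deleteRetentionMs) {
        return firstCleanedAtMs + deleteRetentionMs;
    }

    // Would a cleaning pass running at passTimeMs remove the tombstone?
    static boolean removedByPassAt(long passTimeMs, long firstCleanedAtMs, long deleteRetentionMs) {
        return passTimeMs >= removableAtMs(firstCleanedAtMs, deleteRetentionMs);
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long deleteRetentionMs = 24 * hour; // delete.retention.ms=86400000 (1 day)
        long firstCleaned = 2 * hour;       // segment holding the tombstone first cleaned at t=2h
        System.out.println(removedByPassAt(20 * hour, firstCleaned, deleteRetentionMs)); // false
        System.out.println(removedByPassAt(30 * hour, firstCleaned, deleteRetentionMs)); // true
    }
}
```

In this model the tombstone cannot disappear before t=26h, and in practice it lingers until the next cleaning pass after that point, which is why "deletion is not happening" often just means waiting.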
Tombstone lifetime timeline

```text
Write key=K, value=null (tombstone)
  ↓ segment rolls; a cleaning pass removes K's older values
Only the tombstone remains for K
  ↓ retained for at least delete.retention.ms
A later cleaning pass physically deletes the tombstone
```

Inspecting and changing settings in operations
```shell
# Inspect current settings
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name users-compact --describe
# Dynamically adjust (example): tune the dirty ratio
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name users-compact \
  --alter --add-config min.cleanable.dirty.ratio=0.4
```

When migrating an existing delete-based topic to compact or compact,delete, first verify two things: (1) can records be re-emitted with keys, and (2) is there a way to backfill the initial state? Keys are mandatory: brokers reject records without a key on a compacted topic.
A blue/green switchover is the safe approach. Create a new topic with compact → backfill → dual-publish to both topics → migrate consumers → decommission the old topic. This sequence minimizes both downtime and rollback risk.
| Migration option | Strengths | Watch out for |
|---|---|---|
| Blue/green (run a new topic alongside) | Safe, gradual migration | Cost of dual-publishing during the overlap |
| Flip cleanup policy on the same topic | Simple | Existing data is compacted only as the cleaner works through it, so the effect takes time to appear; historical keyless records are incompatible with compaction |
| Aggregate with Streams and write to a compacted topic | Improves data quality at the same time | Requires application development |
Blue/green migration flow

```text
Producers → [old-topic (delete)] → Consumers (old)
    ↘︎ transform / backfill ↘︎
      [new-topic (compact)] → Consumers (new)
After the switchover, decommission old-topic in phases
```

A Streams topology that writes only the latest value (example)
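```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder b = new StreamsBuilder();
KStream<String, String> s = b.stream("src");
// Keep the most recent value per key (reduce() ignores null-valued records, so tombstones in src are not forwarded)
KTable<String, String> latest = s.groupByKey().reduce((agg, v) -> v);
latest.toStream().to("dst-compacted");
// Create dst-compacted with cleanup.policy=compact in advance; assumes String default serdes
```

CCDAK / CCAAK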
Question 1
In a Kafka log-compacted topic, you want to reliably delete key K. Which approach is correct?
Correct answer: A
Deletion in log compaction is expressed by a record with a null value (a tombstone). The tombstone is retained for delete.retention.ms and then physically removed by cleaning along with the key's old values. None of the other options actually delete the key.
Does log compaction break record ordering?
No. After compaction, the relative ordering of the remaining records is preserved within each partition. Log compaction only removes old duplicates; it never reorders records.
Can the latest value be lost when using compact,delete together?
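Yes, it can. With compact,delete, segments that exceed retention.ms or retention.bytes are deleted regardless of their contents, so a key that has not been updated within the retention window loses its latest value (or tombstone) entirely. Within the retention window, compaction still keeps at least the latest value per key. If every key's latest value must survive indefinitely, use cleanup.policy=compact alone.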
From which offset should a consumer read to rebuild the latest view?
To initialize the latest view, read from the earliest offset (for example, auto.offset.reset=earliest for a new consumer group, or seekToBeginning()). Log compaction only deletes old duplicates; rebuilding the latest values still requires applying every remaining record in order.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.