A Kafka "offset" is not just a number — it represents a consumer group's "next position to read." When you commit determines whether you risk loss or duplication.
This article follows the CCDAK exam focus areas (offset semantics, commit methods, delivery semantics, and rebalance impact) and is grounded in stable, officially documented behavior.
Kafka offsets are sequential numbers per partition, and a consumer group stores the "next offset to read" for each partition in __consumer_offsets (an internal, compacted topic). Critically, what's recorded is "the offset after the last processed record."
On startup, the consumer fetches the latest commit from the group coordinator and resumes reading from that position. If no commit exists, the starting position is governed by auto.offset.reset (earliest or latest). Distinguish between the fetch position (next to be retrieved) and the committed position (the restart point).
A visual map of offsets and commit positions
Kafka consumer commits can be automatic (enable.auto.commit) or manual (commitSync/commitAsync). The choice changes the likelihood of loss and duplication, as well as throughput and latency.
The fundamental rule for at-least-once is "commit only after the side effect (e.g., the external write) completes." Committing before processing yields at-most-once (with possible loss), and the longer you delay commits, the more duplicate reprocessing you accept on failure.
| Strategy | Typical config / API | Loss / duplication risk | Latency / overhead |
|---|---|---|---|
| Auto commit | enable.auto.commit=true, auto.commit.interval.ms | Failure after processing → possible loss. Duplication depends on timing | Low (simple, but coarse control) |
| Manual sync commit | enable.auto.commit=false, commitSync() | Minimizes loss. Duplicates on failure-restart | Higher (waits for ack) |
| Manual async commit | enable.auto.commit=false, commitAsync() | Async commits can be lost; duplicates possible | Lower (throughput-focused) |
| Batched sync commit | commitSync() every N records or interval | Duplicates are batch-bounded. Loss minimized | Moderate (balanced) |
| Transactional integration | Producer sendOffsetsToTransaction + read_committed | Atomically ties read-write-commit together. Suppresses duplicates and loss | Higher (more complex design) |
The basic loop is "poll → process (durable write to the external system) → commit." Treat batches as units and tune so heavy processing doesn't exceed max.poll.interval.ms. Apply backpressure with pause/resume when needed.
The more idempotent the downstream system (same-key upserts, dedup), the more you absorb the duplicate impact of at-least-once. If you truly need exactly-once, consider transactions (producer EOS with read_committed).
Java consumer (manual commits for at-least-once)
// Required library: org.apache.kafka:kafka-clients
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "etl-payments-g1");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");
try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
consumer.subscribe(Collections.singletonList("payments"));
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
// Preserve per-partition ordering while processing
for (TopicPartition tp : records.partitions()) {
List<ConsumerRecord<String, String>> tpRecords = records.records(tp);
for (ConsumerRecord<String, String> r : tpRecords) {
// 1) Durable write to the external system (idempotent / replayable)
writeToDbIdempotent(r.key(), r.value());
}
// 2) Sync-commit lastOffset+1 for each partition
long lastOffset = tpRecords.get(tpRecords.size() - 1).offset();
Map<TopicPartition, OffsetAndMetadata> commit = new HashMap<>();
commit.put(tp, new OffsetAndMetadata(lastOffset + 1));
try {
consumer.commitSync(commit);
} catch (CommitFailedException e) {
// E.g., just after a rebalance. Reprocess on the next loop (at-least-once)
log.warn("commit failed, will retry by reprocessing", e);
}
}
}
}
// Tip: under overload, combine consumer.pause(assignments) / resume to avoid exceeding max.poll.interval.msRebalances occur on member join/leave or subscription changes. If processed offsets aren't committed before partitions are revoked, the inheriting consumer reprocesses the gap as duplicates (acceptable under at-least-once, but the volume of duplicates grows).
The standard pattern is to register a ConsumerRebalanceListener and commit immediately in onPartitionsRevoked. The Cooperative Sticky assignor still requires the same care on each incremental revoke. Idle polls or over-long processing that exceed max.poll.interval.ms tend to force rebalances, so use pause/resume and batch tuning to avoid them.
When no commit exists, the start position is governed by auto.offset.reset=earliest/latest. If the position goes out of range (e.g., the offset expired past the retention window), explicitly seek with seekToBeginning/seekToEnd or reset the group's offsets with the admin tooling.
When you need to reprocess, seek a specific partition back to an earlier offset after assignment, or read "as a brand new consumer group" under a different group.id. In either case, assume duplicates under at-least-once and keep downstream writes idempotent for safety.
What's committed is the "next position to read," not the offset of the last processed record itself. With that in mind, you should be able to explain the restart position and duplication patterns under failure.
Committing offsets across multiple partitions in one shot does not atomically bind to writes against an external system. If a failure lands between the external side effect and the commit, duplicates can occur. To target exactly-once, choose a design that combines transactions (sendOffsetsToTransaction with read_committed).
CCDAK
問題 1
A Kafka consumer performs durable writes to an external database and must preserve at-least-once. Which approach is most appropriate?
正解: B
Synchronous commit after processing (the durable external write) completes is the foundation of at-least-once, so B is correct. A and C commit before processing and risk loss; D doesn't fundamentally prevent duplicates.
Is the committed value the "last processed offset" or the "next offset to read"?
It's the "next offset to read." For example, if the last processed offset was 4, you commit 5. On restart, the consumer resumes from 5 and offsets 0-4 are not reprocessed.
How can I completely eliminate duplicates?
Manual commits on the consumer side alone cannot prevent duplicates. To deduplicate, make the downstream system idempotent (upserts on the same key, unique constraints) or combine Kafka transactions (producer EOS plus sendOffsetsToTransaction, with the consumer using read_committed).
How can I rewind offsets to reprocess historical data?
After assignment, seek a specific partition back to an earlier offset, or use the admin tool (kafka-consumer-groups --reset-offsets) to adjust the group's offsets. For broad reprocessing, using a fresh group.id is also a common approach.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...