Kafka Offsets: Commit Modes & Consumer Position (2026)

A Kafka "offset" is not just a number — it represents a consumer group's "next position to read." When you commit determines whether you risk loss or duplication.

This article follows the CCDAK exam focus areas (offset semantics, commit methods, delivery semantics, and rebalance impact) and is grounded in stable, officially documented behavior.

Offset and Position Basics

Kafka offsets are sequential numbers per partition, and a consumer group stores the "next offset to read" for each partition in __consumer_offsets (an internal, compacted topic). Critically, what's recorded is "the offset after the last processed record."

On startup, the consumer fetches the latest commit from the group coordinator and resumes reading from that position. If no commit exists, the starting position is governed by auto.offset.reset (earliest or latest). Distinguish between the fetch position (next to be retrieved) and the committed position (the restart point).

Offsets are tracked per partition, not per topic
Commits store the "next to read" position (e.g., last processed = 4 → commit = 5)
On startup and rebalance, the consumer resumes from the committed position
__consumer_offsets is kept mostly to the latest entries via log compaction

A visual map of offsets and commit positions

Comparing Commit Strategies and How to Choose

Kafka consumer commits can be automatic (enable.auto.commit) or manual (commitSync/commitAsync). The choice changes the likelihood of loss and duplication, as well as throughput and latency.

The fundamental rule for at-least-once is "commit only after the side effect (e.g., the external write) completes." Committing before processing yields at-most-once (with possible loss), and the longer you delay commits, the more duplicate reprocessing you accept on failure.

Commit after processing = prevents loss (accepts duplicates)
Commit before processing = low latency but risks loss (a wrong-answer pattern on the exam)
Commit per batch to balance overhead and reprocessing volume

Strategy	Typical config / API	Loss / duplication risk	Latency / overhead
Auto commit	enable.auto.commit=true, auto.commit.interval.ms	Failure after processing → possible loss. Duplication depends on timing	Low (simple, but coarse control)
Manual sync commit	enable.auto.commit=false, commitSync()	Minimizes loss. Duplicates on failure-restart	Higher (waits for ack)
Manual async commit	enable.auto.commit=false, commitAsync()	Async commits can be lost; duplicates possible	Lower (throughput-focused)
Batched sync commit	commitSync() every N records or interval	Duplicates are batch-bounded. Loss minimized	Moderate (balanced)
Transactional integration	Producer sendOffsetsToTransaction + read_committed	Atomically ties read-write-commit together. Suppresses duplicates and loss	Higher (more complex design)

Implementation Patterns for at-least-once

The basic loop is "poll → process (durable write to the external system) → commit." Treat batches as units and tune so heavy processing doesn't exceed max.poll.interval.ms. Apply backpressure with pause/resume when needed.

The more idempotent the downstream system (same-key upserts, dedup), the more you absorb the duplicate impact of at-least-once. If you truly need exactly-once, consider transactions (producer EOS with read_committed).

Use enable.auto.commit=false and commit explicitly
commitSync after processing succeeds; mix in commitAsync when throughput demands it
On exception, leave uncommitted to reprocess on the next loop (route to a DLQ if needed)
Tune the balance between batch size (max.poll.records) and processing time

Java consumer (manual commits for at-least-once)

// Required library: org.apache.kafka:kafka-clients
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "etl-payments-g1");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "500");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("payments"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
        // Preserve per-partition ordering while processing
        for (TopicPartition tp : records.partitions()) {
            List<ConsumerRecord<String, String>> tpRecords = records.records(tp);
            for (ConsumerRecord<String, String> r : tpRecords) {
                // 1) Durable write to the external system (idempotent / replayable)
                writeToDbIdempotent(r.key(), r.value());
            }
            // 2) Sync-commit lastOffset+1 for each partition
            long lastOffset = tpRecords.get(tpRecords.size() - 1).offset();
            Map<TopicPartition, OffsetAndMetadata> commit = new HashMap<>();
            commit.put(tp, new OffsetAndMetadata(lastOffset + 1));
            try {
                consumer.commitSync(commit);
            } catch (CommitFailedException e) {
                // E.g., just after a rebalance. Reprocess on the next loop (at-least-once)
                log.warn("commit failed, will retry by reprocessing", e);
            }
        }
    }
}

// Tip: under overload, combine consumer.pause(assignments) / resume to avoid exceeding max.poll.interval.ms

Rebalance Pitfalls and How to Address Them

Rebalances occur on member join/leave or subscription changes. If processed offsets aren't committed before partitions are revoked, the inheriting consumer reprocesses the gap as duplicates (acceptable under at-least-once, but the volume of duplicates grows).

The standard pattern is to register a ConsumerRebalanceListener and commit immediately in onPartitionsRevoked. The Cooperative Sticky assignor still requires the same care on each incremental revoke. Idle polls or over-long processing that exceed max.poll.interval.ms tend to force rebalances, so use pause/resume and batch tuning to avoid them.

Commit the latest fully-processed position in onPartitionsRevoked
For long processing, pause → write externally → resume
session.timeout.ms and heartbeat.interval.ms affect group stability
Commit too often → more overhead; commit too rarely → more duplicate reprocessing

Offset Reset and Reprocessing Fundamentals

When no commit exists, the start position is governed by auto.offset.reset=earliest/latest. If the position goes out of range (e.g., the offset expired past the retention window), explicitly seek with seekToBeginning/seekToEnd or reset the group's offsets with the admin tooling.

When you need to reprocess, seek a specific partition back to an earlier offset after assignment, or read "as a brand new consumer group" under a different group.id. In either case, assume duplicates under at-least-once and keep downstream writes idempotent for safety.

Control temporary rewinds with assign/seek (after assignment)
For broad redo, consider kafka-consumer-groups --reset-offsets
auto.offset.reset applies only when no commit exists
Limit the affected partitions to localize the impact of reprocessing

CCDAK Exam Tips and Common Misconceptions

What's committed is the "next position to read," not the offset of the last processed record itself. With that in mind, you should be able to explain the restart position and duplication patterns under failure.

Committing offsets across multiple partitions in one shot does not atomically bind to writes against an external system. If a failure lands between the external side effect and the commit, duplicates can occur. To target exactly-once, choose a design that combines transactions (sendOffsetsToTransaction with read_committed).

at-most-once = commit before processing, risk of loss
at-least-once = commit after processing, duplicates accepted
exactly-once = designed around EOS (transactions) and idempotency
Exceeding max.poll.interval.ms → rebalance → more duplicates from uncommitted work
__consumer_offsets is an internal topic (don't update it directly)

Check Your Understanding

CCDAK

問題 1

A Kafka consumer performs durable writes to an external database and must preserve at-least-once. Which approach is most appropriate?

Set enable.auto.commit=true and shorten auto.commit.interval.ms
Set enable.auto.commit=false and call commitSync() after processing completes
Call commitAsync() before processing to minimize latency, then write
Setting max.poll.records=1 prevents duplicates

正解: B

Synchronous commit after processing (the durable external write) completes is the foundation of at-least-once, so B is correct. A and C commit before processing and risk loss; D doesn't fundamentally prevent duplicates.

Frequently Asked Questions

Is the committed value the "last processed offset" or the "next offset to read"?

It's the "next offset to read." For example, if the last processed offset was 4, you commit 5. On restart, the consumer resumes from 5 and offsets 0-4 are not reprocessed.

How can I completely eliminate duplicates?

Manual commits on the consumer side alone cannot prevent duplicates. To deduplicate, make the downstream system idempotent (upserts on the same key, unique constraints) or combine Kafka transactions (producer EOS plus sendOffsetsToTransaction, with the consumer using read_committed).

How can I rewind offsets to reprocess historical data?

After assignment, seek a specific partition back to an earlier offset, or use the admin tool (kafka-consumer-groups --reset-offsets) to adjust the group's offsets. For broad reprocessing, using a fresh group.id is also a common approach.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Kafka Offsets and Commits: Position Management and at-least-once Fundamentals