Consumer Groups are at the heart of Kafka's scale-out and fault-tolerance story. Partitions cap the upper bound on parallelism, while group coordination (rebalancing) drives availability and latency.
This article distills the CCDAK exam perspective alongside the configuration and boundary behaviors that matter in production, sticking to stable concepts aligned with the official documentation.
A Consumer Group is made up of multiple consumers sharing the same group.id, and each partition of a given topic is assigned to exactly one consumer in the group. As a result, the maximum parallelism is bounded by the total number of assignable partitions across the group.
Ordering guarantees are per-partition. Even as you scale the group horizontally, ordering is preserved within a single partition but not across partitions. Secure the parallelism you need up front through partition design. Consumers are not thread-safe, so the basic rule is one thread per instance; if you need internal parallelism, decouple message processing with a worker-queue pattern.
Consumer Group and partition assignment (conceptual diagram)
Consumer Groups are managed by the Group Coordinator on a broker, and members send heartbeats to signal liveness. When sessions expire or membership changes, a rebalance fires and partition assignments are recomputed. Fetching and processing can pause briefly during a rebalance, so optimizing its frequency and duration is the key to throughput and latency.
The main triggers are members joining or leaving, changes to the subscribed topic set, partition count changes, and exceeding the session timeout or max poll interval. Configuration centers on keeping session.timeout.ms, heartbeat.interval.ms, and max.poll.interval.ms consistent, plus selecting the right assignment strategy.
| Aspect | Eager (legacy) rebalance | Cooperative (incremental) rebalance | Exam and operational takeaway |
|---|---|---|---|
| Disruption level | All members release every partition first, causing major processing interruptions | Only the affected partitions move incrementally, minimizing interruption | Cooperative is generally preferred for low-latency, stable operations |
| Movement volume | Broad reassignment tends to occur after recomputation | Stays at the minimum necessary reassignment | The difference is most pronounced during rolling restarts |
| Listener handling | Tends to require a full commit/cleanup in onPartitionsRevoked | Allows minimal commits in step with fine-grained revoke/assign events | The rebalance-listener implementation makes a real difference |
| Applicability and benefits | Simpler to implement; acceptable for small, short-lived groups | Effective for always-on workloads where stability and continuous processing matter | Setting it explicitly is the safe choice (defaults depend on implementation/version) |
Assignment strategies are specified as a list under partition.assignment.strategy. Common options include RangeAssignor, RoundRobinAssignor, StickyAssignor, and CooperativeStickyAssignor. In practice, incremental rebalancing via CooperativeStickyAssignor is widely recommended, but defaults vary across clients and distributions, so explicit configuration is the safer choice (to avoid version-driven surprises).
Static membership (setting group.instance.id) treats a restarted member as the same identity, dramatically reducing rebalancing during rolling restarts. Tune timeout settings with their interdependencies in mind so that you don't trigger unnecessary leaves by exceeding max.poll.interval.ms.
Consumer offsets are committed by default to the internal __consumer_offsets topic. enable.auto.commit will commit periodically, but it advances regardless of processing success or failure, so consider manual commits (commitSync/commitAsync) at least for critical processing paths.
Delivery semantics depend on commit ordering. Commit after processing for at-least-once. Commit before processing for at-most-once — but with the risk of loss. True exactly-once requires producer/consumer coordination via Kafka transactions, which broadens the unit of design.
To raise parallelism, you generally either add partitions or add consumers to the group. The latter is bounded by the former, so estimating the required partition count at planning time is critical. Too many partitions inflate metadata and replication overhead, so scale design needs to be grounded in evidence.
For fetch/processing optimization, useful tools include batch processing via max.poll.records, bandwidth efficiency via fetch.min.bytes and fetch.max.wait.ms, and temporary backpressure control via pause/resume. Distribute processing threads horizontally through an internal queue, and manage offset commits against processing completion.
In production, surface rebalance metrics (count and duration), consumer lag, fetch latency, and heartbeat delay. During incidents and deployments, combine static membership with the Cooperative strategy to minimize per-release impact.
During rolling restarts, stop and restart one instance at a time, waiting for each member to rejoin before moving on. With static membership intact, reassignment is localized and interruption stays minimal. The kafka-consumer-groups CLI with --describe or --reset-offsets is useful for inspecting and correcting state.
Java Consumer with Cooperative + static membership (key excerpts)
Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-consumer-g");
// Explicitly set Cooperative Sticky (list order is priority)
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
"org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
// Static membership (set a unique ID per instance)
props.put(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG, System.getenv("POD_NAME"));
// Align timeouts with processing time and network characteristics
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "15000");
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "3000");
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "600000");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
CountDownLatch latch = new CountDownLatch(1);
ConsumerRebalanceListener listener = new ConsumerRebalanceListener() {
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
// Reliably commit processed offsets (at-least-once)
try { consumer.commitSync(); } catch (Exception e) { /* log */ }
}
@Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
// Seek or initialize as needed
}
};
consumer.subscribe(Arrays.asList("orders"), listener);
try {
while (true) {
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
for (ConsumerRecord<String, String> r : records) {
process(r); // App-specific processing (don't swallow exceptions; route to DLQ etc.)
}
consumer.commitAsync(); // Usually async; sync before shutdown/revoke
}
} finally {
try { consumer.commitSync(); } catch (Exception ignore) {}
consumer.close();
}CCDAK
問題 1
Topic A has 6 partitions. You launch 8 instances in Consumer Group G with CooperativeStickyAssignor and static membership configured. Which combination correctly describes the maximum concurrent parallelism and the rolling-restart behavior?
正解: A
The group's maximum parallelism is capped by the number of assignable partitions (6); the 2 surplus instances stay unassigned. Cooperative minimizes interruption via incremental reassignment, and static membership suppresses the full reassignment that would otherwise come from leave/rejoin during rolling restarts — but it does not eliminate rebalancing entirely.
Why doesn't throughput improve when I add more consumers?
Because the total number of partitions caps parallelism — any extra consumers stay unassigned. First revisit your partition-count design, or look at processing bottlenecks (downstream databases or APIs).
How can I increase parallelism while preserving ordering?
Improve key locality and increase the partition count. Ordering guarantees apply only within a single partition, so revisit your key/hashing design so that one key maps to one partition.
How can I reduce downtime caused by rebalances?
Explicitly set CooperativeStickyAssignor, enable static membership via group.instance.id, tune max.poll.interval.ms to match your processing time, set session/heartbeat appropriately, and implement onPartitionsRevoked/Assigned to keep commits and re-initialization minimal.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...