The traditional Eager Rebalance causes all consumers to release their partitions at once, leading to brief but full stops that often spike latency and degrade throughput.
Cooperative Rebalance is the modern protocol that avoids stop-the-world by reassigning partitions incrementally and gradually. This article explains the mechanics, rollout steps, callback design, and operational tips from a perspective that also helps with the CCDAK exam.
With Eager Rebalance, any membership change or config update causes every consumer in the group to release all of its partitions and wait for reassignment. Even a short consumption pause ripples across the whole group, causing latency spikes and throughput drops.
In production, rolling deploys, scale-outs, and network jitter occur frequently, driving up the number of rebalances. Suppressing stop-the-world directly translates into SLO stability and peak-time resilience.
Typical configuration example for Eager Rebalance (traditional)
props.put("partition.assignment.strategy", java.util.List.of(
"org.apache.kafka.clients.consumer.RangeAssignor"
));
// デフォルトは RangeAssignor(Eager)で、全メンバーが一旦 revoke されるIncremental Cooperative Rebalancing moves only the minimum necessary partitions, step by step. Most existing members keep their assignments, and only the partitions that need to move are revoked and reassigned, which avoids stop-the-world.
On the consumer side, the ConsumerRebalanceListener callbacks let you safely commit and release only what was revoked, and start consuming what was assigned. You also need to account for cases where onPartitionsLost is called when partitions could not be revoked gracefully.
| Strategy | Behavior | Stop-the-world | Main callbacks |
|---|---|---|---|
| Eager (Range/RoundRobin/Sticky) | Every member releases all partitions at once | Likely to occur | onPartitionsRevoked/Assigned (Lost is not expected) |
| CooperativeSticky | Only the minimum necessary moves incrementally (step by step) | Largely avoidable | onPartitionsRevoked/Assigned/Lost |
| Sticky (Eager) | Movement is suppressed, but a full revoke still occurs | Mitigated but not avoidable | onPartitionsRevoked/Assigned |
Visualizing incremental movement (Cooperative Rebalance)
Enabling the Cooperative Sticky Assignor
props.put("partition.assignment.strategy", java.util.List.of(
"org.apache.kafka.clients.consumer.CooperativeStickyAssignor"
));
// 可能なら全メンバーで同一戦略に統一することWithin the same consumer group, the effective assignor is whatever sits at the head of the intersection of all members' strategies. Mixed periods can cause unexpected fallback to Eager behavior, so plan the migration carefully.
The safest approach is to first upgrade every member to a version that supports CooperativeStickyAssignor, and then either unify the strategy or migrate to a new group.id.
A safe migration example (gradual move using a new group.id)
# 1) 先にアプリを CooperativeStickyAssignor 対応版へデプロイ
# 2) 新しい group.id で起動し、旧グループから段階的にトラフィック移行
props.put("group.id", "orders-consumer-v2");
props.put("partition.assignment.strategy", java.util.List.of(
"org.apache.kafka.clients.consumer.CooperativeStickyAssignor"
));With Cooperative, you can cleanly close just the partitions being revoked. Inside onPartitionsRevoked, flush/commit the pending work for the target partitions and release external resources. In onPartitionsAssigned, keep the initialization for newly assigned partitions lightweight.
onPartitionsLost is called in error scenarios where partitions could not be returned gracefully. The safest move is to discard and reinitialize state on the assumption that unprocessed work will definitely be reprocessed.
Skeleton of a ConsumerRebalanceListener (Java)
consumer.subscribe(List.of("orders"), new ConsumerRebalanceListener() {
@Override
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
// revoke 対象だけを同期コミット・クローズ
commitOffsetsFor(partitions);
closeResourcesFor(partitions);
}
@Override
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
// 必要最小の初期化。重い処理は避ける
initStateFor(partitions);
}
@Override
public void onPartitionsLost(Collection<TopicPartition> partitions) {
// 異常系。未処理は再処理前提で状態を破棄
dropStateFor(partitions);
}
});Even when stop-the-world is avoided, rebalances still happen. Align timeouts and heartbeats with your application's processing time. In particular, max.poll.interval.ms must match the maximum time required to process a single poll.
Focus monitoring on consumer lag, the count and duration of rebalances, and whether onPartitionsLost fires. Use static membership (group.instance.id) and the cooperative strategy to minimize group churn.
Example monitoring and inspection commands
$ kafka-consumer-groups.sh \
--bootstrap-server broker:9092 \
--group orders-consumer \
--describe
# 再均衡の兆候はアプリログ(JoinGroup/SyncGroup)やメトリクスで確認CCDAK frequently tests the difference between Eager and Cooperative, the advantages of CooperativeStickyAssignor, the related callbacks (especially onPartitionsLost), and stabilization through static membership.
Locking in precise concepts — config precedence, behavior in mixed-strategy groups, safety measures during rolling deploys, and commit design principles (duplicate tolerance and ordering) — translates directly into points on the exam.
Frequently tested configuration keys (excerpt)
# コンシューマー設定(代表例)
partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor
max.poll.interval.ms=900000
session.timeout.ms=45000
heartbeat.interval.ms=15000
group.instance.id=orders-c1 # 静的メンバーの例(ユニークに)CCDAK
問題 1
You want to minimize stop-the-world when scaling out a consumer group. Which approach is most appropriate?
正解: A
CooperativeStickyAssignor avoids stop-the-world with incremental, gradual rebalancing. Combining it with static membership further reduces the number of rebalances. B is counterproductive, C adds operational burden without solving Eager's limits, and D is Eager and cannot avoid the full revoke.
Does Cooperative Rebalance completely eliminate duplicate processing?
No. It is a mechanism for avoiding stop-the-world, and duplicate processing is still possible. You need to combine it with duplicate-tolerant processing design (idempotency, accurate offset commits) to maintain consistency.
Does onPartitionsLost always fire?
Normally it does not fire. It is called when partitions could not be returned gracefully due to network disconnects or long pauses. The safe approach is to discard local state and reinitialize, assuming any unprocessed data will be reprocessed.
Can I gradually migrate to CooperativeSticky in a mixed environment?
If strategies are mixed within the same group, the assignor falls back to the head of the common intersection and can revert to Eager behavior. The safe approach is to upgrade all members to a supporting version first and then unify the strategy, or migrate to a new group.id.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...