With the right plan, Kafka upgrades can be performed with zero downtime. The keys are understanding compatibility and following a phased rolling update procedure. This article distills the official documentation into a workflow that is useful both for CCAAK exam prep and for production operations.
We turn the following into concrete steps and checklists: pinning inter.broker.protocol.version (IBPV), pinning the message format when needed, the correct order for controllers and clients, the differences between KRaft and ZooKeeper, and the essentials of validation and rollback.
Kafka compatibility breaks down into three layers: the wire protocol (broker-to-broker and client-to-broker), the record message format, and the metadata feature level (for KRaft). In a rolling update, you first pin IBPV and the message format to the current versions so that existing communication and record formats are not broken, then upgrade the binaries.
Clients can generally talk to newer brokers using older API versions, so upgrading brokers first is the rule. In ZooKeeper mode the controller role is embedded in the brokers, but in KRaft mode you must roll the controller quorum and the broker fleet separately, in that order.
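Brokers can go first because each client and broker advertises a supported version range per API, and a request uses the highest version in the overlap (the `usable` value printed by kafka-broker-api-versions.sh). The following is a minimal sketch of that rule. `negotiate_version` is a hypothetical helper, not a Kafka tool, and the version ranges are made up for illustration.

```shell
# Hypothetical helper, not part of Kafka: pick the API version a client would
# use, given the version ranges that client and broker each support.
# Usage: negotiate_version CLIENT_MIN CLIENT_MAX BROKER_MIN BROKER_MAX
negotiate_version() {
  client_min=$1; client_max=$2; broker_min=$3; broker_max=$4
  # The highest version both sides support is the smaller of the two maxima.
  if [ "$client_max" -lt "$broker_max" ]; then usable=$client_max; else usable=$broker_max; fi
  # The lowest acceptable version is the larger of the two minima.
  if [ "$client_min" -gt "$broker_min" ]; then floor=$client_min; else floor=$broker_min; fi
  if [ "$usable" -lt "$floor" ]; then
    echo "no common version" >&2
    return 1
  fi
  echo "$usable"
}

# An old client (versions 0-9) against a newer broker (versions 0-12).
negotiate_version 0 9 0 12
```

An older client keeps working against the upgraded broker as long as the ranges overlap, which is why the broker side can move ahead of the clients.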
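The flow below is for ZooKeeper mode. KRaft follows the same one-node-at-a-time pattern, but with two differences: controllers are rolled first, then brokers, and the metadata feature level (metadata.version) plays the role that the IBPV pin plays in ZooKeeper mode. It stays at the old level until you finalize it as the last step.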
What matters is the order: pin compatibility first, upgrade one broker at a time while checking health, and only unpin (move to the latest) once every broker is upgraded.
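The ordering rule can be sketched as a loop that refuses to touch the next broker until the current one passes its check. `upgrade_broker` and `broker_healthy` are hypothetical hooks you would write yourself (for example, wrapping your stop/upgrade/start commands and an under-replicated-partitions check). Neither is a Kafka tool.

```shell
# Sketch only: upgrade_broker and broker_healthy are hypothetical hooks that the
# operator must define; neither ships with Kafka.
# Usage: rolling_upgrade b1 b2 b3
rolling_upgrade() {
  for broker in "$@"; do
    echo "upgrading $broker"
    if ! upgrade_broker "$broker"; then
      echo "upgrade failed on $broker; stopping" >&2
      return 1
    fi
    # Gate: never move on while the broker just touched is unhealthy.
    if ! broker_healthy "$broker"; then
      echo "$broker did not pass the health gate; stopping" >&2
      return 1
    fi
  done
  echo "all brokers upgraded; safe to unpin"
}
```

Stopping at the first failure leaves at most one broker in a questionable state, which keeps the rollback scope small.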
Conceptual diagram of a rolling update (B1 → B2 → B3, one broker at a time)
Example runbook (assumes ZooKeeper mode; see the KRaft notes below)
# 0) Pre-check (cluster health)
$ kafka-topics.sh --bootstrap-server broker:9092 --describe --under-replicated-partitions
$ kafka-topics.sh --bootstrap-server broker:9092 --describe --unavailable-partitions
$ kafka-broker-api-versions.sh --bootstrap-server broker:9092 | head
# 1) Pin compatibility (apply the same server.properties on every broker)
# Example: if the current production version is 2.8
inter.broker.protocol.version=2.8
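# Set the next line only if the property exists in your version; skip otherwise.
# (Properties files have no inline comments: text after the value becomes part of it.)
log.message.format.version=2.8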
# 2) Upgrade brokers one at a time
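# (Optional) Confirm leaders sit on their preferred replicas before stopping. The broker
# hands off its own leadership during controlled shutdown, so this is not required.
# kafka-preferred-replica-election.sh was removed in 3.0; kafka-leader-election.sh replaces it.
$ kafka-leader-election.sh --bootstrap-server broker:9092 --election-type preferred --all-topic-partitions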
# Stop -> upgrade package -> start broker B1
$ sudo systemctl stop kafka
# Run the package upgrade in the way your distro/artifact requires
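# Unpack into a versioned directory, then repoint the symlink. Extracting over the old
# install would leave old and new jars side by side in libs/. Assumes /opt/kafka is a symlink.
$ sudo tar -xf kafka_2.13-3.6.0.tgz -C /opt
$ sudo ln -sfn /opt/kafka_2.13-3.6.0 /opt/kafka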
$ sudo systemctl start kafka
# Health check (URP=0, no offline partitions, no client errors)
$ kafka-topics.sh --bootstrap-server broker:9092 --describe --under-replicated-partitions
# Repeat for B2, B3 ...
# 3) After all brokers are upgraded, unpin (move to the latest)
# Example target version:
inter.broker.protocol.version=3.6
# Only if the property still applies: it is deprecated since 3.0 and ignored once IBPV is 3.0 or higher.
log.message.format.version=3.6
# 4) Final rolling restart to apply the new config
$ sudo systemctl restart kafka # one broker at a time
# 5) For KRaft, finalize the metadata version (subcommand syntax of Kafka 3.3+; check --help on your version)
# Before 3.7 the tool connects through a broker; --bootstrap-controller is available from 3.7.
$ kafka-features.sh --bootstrap-server broker:9092 describe
$ kafka-features.sh --bootstrap-server broker:9092 upgrade --metadata 3.6
inter.broker.protocol.version pins the inter-broker wire protocol to the existing version. This lets old and new brokers communicate using the same protocol during a rolling update, preserving compatibility. Apply the pin uniformly on every broker, and only bump it after every broker has been upgraded.
When log.message.format.version is available, it pins the record encoding (magic byte, header layout, etc.) to the old format. Unpinning it does not rewrite existing log segments; only newly written records use the new format. As a result, the switch typically does not trigger a large I/O burst.
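Before touching any binary, it helps to verify mechanically that the pin is really what the broker will read. The following is a minimal sketch. `property_value` and `check_pin` are hypothetical helpers (not Kafka tools) that handle only plain `key=value` lines, and the file path in the usage note is an assumption.

```shell
# Hypothetical helper: print the value of KEY from a Java properties file.
# Handles plain "key=value" lines only; the last occurrence wins, as it does
# when the file is loaded.
# Usage: property_value FILE KEY
property_value() {
  awk -F= -v key="$2" '
    /^[ \t]*[#!]/ { next }                 # skip comment lines
    {
      k = $1
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        gsub(/^[ \t]+|[ \t]+$/, "", v)
        found = v
      }
    }
    END { if (found != "") print found }
  ' "$1"
}

# Succeeds only if KEY is set to exactly EXPECTED in FILE.
# Usage: check_pin FILE KEY EXPECTED
check_pin() {
  actual=$(property_value "$1" "$2")
  if [ "$actual" = "$3" ]; then
    echo "OK: $2=$actual"
  else
    echo "MISMATCH: $2=${actual:-<unset>} (expected $3)" >&2
    return 1
  fi
}
```

A call such as `check_pin /etc/kafka/server.properties inter.broker.protocol.version 2.8` could run on every broker before step 2. Because the comparison is exact, a line like `key=2.8 # note` is reported as a mismatch. That is the desired outcome, since a properties file treats everything after `=` as the value.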
Quick commands for checking your settings
# Are IBPV / message format what you expect? (static config file example)
$ grep -E "^(inter.broker.protocol.version|log.message.format.version)" /etc/kafka/server.properties
# API version mapping (client vs broker)
$ kafka-broker-api-versions.sh --bootstrap-server broker:9092 | sed -n '1,20p'
# Cluster health (representative metrics)
# Kafka ships no kafka-metrics.sh; read these MBeans through your own JMX or metrics pipeline:
# kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions == 0
# kafka.controller:type=KafkaController,name=ActiveControllerCount == 1 when summed across brokers (ZK)
# kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec staying nominal
After restarting each broker, confirm that partition replication has caught up, leadership is stable, and clients are not seeing rising timeouts. In practice, running smoke tests that mirror real production traffic is more useful than synthetic benchmarks.
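The "replication has caught up" check can be turned into a small polling gate. The following is a sketch under stated assumptions: `count_urp` and `wait_for_zero_urp` are hypothetical helpers, and the parsing assumes that kafka-topics.sh prints one line containing `Partition:` per affected partition, which you should confirm against your own version's output.

```shell
# Count partitions listed by `kafka-topics.sh --describe --under-replicated-partitions`.
# Assumption: each affected partition is printed on one line containing "Partition:".
# Reads the tool's output on stdin.
count_urp() {
  grep -c 'Partition:' || true
}

# Poll until no partition is under-replicated, or give up after MAX_TRIES checks.
# A failed kafka-topics.sh call counts as "not healthy yet" rather than as zero.
# Usage: wait_for_zero_urp BOOTSTRAP [MAX_TRIES] [SLEEP_SECONDS]
wait_for_zero_urp() {
  bootstrap=$1; max_tries=${2:-30}; pause=${3:-10}
  try=1
  while [ "$try" -le "$max_tries" ]; do
    if out=$(kafka-topics.sh --bootstrap-server "$bootstrap" --describe \
               --under-replicated-partitions); then
      urp=$(printf '%s\n' "$out" | count_urp)
      if [ "$urp" -eq 0 ]; then
        echo "URP=0 after $try check(s)"
        return 0
      fi
      echo "URP=$urp ($try/$max_tries)"
    else
      echo "kafka-topics.sh failed ($try/$max_tries)" >&2
    fi
    sleep "$pause"
    try=$((try + 1))
  done
  echo "URP still non-zero after $max_tries checks" >&2
  return 1
}
```

Point the bootstrap address at a broker other than the one being restarted, so the check does not fail simply because its own target is still starting up.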
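If something goes wrong, the safe move is to roll back just that broker's binary and leave the pinned settings (IBPV / message format) untouched. Rollback stays straightforward for as long as the pins are in place, even after every broker is on the new binary. Once you bump IBPV and the brokers start using the new protocol, downgrading to the old version is no longer supported.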
Rolling updates are the default, but depending on requirements you may opt for a stop-the-world upgrade or a parallel cluster (Blue/Green). Weigh audit obligations, the risk of large version skips, and infrastructure cost when choosing.
From a CCAAK perspective, you need to be able to clearly articulate the prerequisites of a rolling update: pin IBPV, unpin only after every broker is upgraded, and update clients last.
| Strategy | Downtime | Risk / Complexity | Extra Cost |
|---|---|---|---|
| Rolling update (recommended) | Near zero (only momentary leader re-election) | Medium (requires ordering and pin management) | Low |
| Stop-the-world upgrade (full shutdown -> upgrade all) | High (full outage) | Low (work is simple) | Low |
| Parallel cluster (Blue/Green) | Zero (minimal blip at cutover) | High (requires dual-ingest and consistency checks) | High |
The exam frequently tests the pin -> upgrade -> unpin order, the client upgrade order, how the message format is handled, and the existence of KRaft feature finalization. Distractors such as "bump IBPV before every broker is upgraded" or "upgrade clients first" are wrong.
In production, the stable approach is to spell out observability checkpoints before and after each change, and never advance to the next broker until the current one meets the pass criteria.
CCAAK
Question 1
You are performing a zero-downtime upgrade from Kafka 2.8 to 3.x in ZooKeeper mode. Which procedure is the safest?
Correct answer: A
In a rolling update, you preserve compatibility by first pinning IBPV (and message format if applicable) to the current value, then unpinning only after every broker has been upgraded. Clients are upgraded last by convention.
Is log.message.format.version always required?
No. The property is deprecated as of Kafka 3.0 and is ignored once IBPV is 3.0 or higher, so in some environments there is nothing to set. Where it still applies, pin it to the current version before upgrading to preserve compatibility during the rolling update, then bump it to the latest version after all brokers are upgraded. Existing log segments are not rewritten when you switch.
What is additionally required in KRaft mode?
Roll the controller quorum first, then the brokers, and finally finalize the metadata feature level (using tools like kafka-features). The old feature level stays in effect until finalization, so wait until after finalization to use new features.
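Whether finalization has already happened can be read from the describe output of kafka-features.sh. The following is a minimal sketch. `metadata_version_status` is a hypothetical helper, and the line layout it parses (`Feature: ... SupportedMaxVersion: ... FinalizedVersionLevel: ...`) is an assumption that may differ between Kafka versions.

```shell
# Reads the describe output of kafka-features.sh on stdin and reports whether
# metadata.version is finalized at the highest level the running software supports.
# Assumed line layout (verify against your own output):
#   Feature: metadata.version  SupportedMinVersion: A  SupportedMaxVersion: B  FinalizedVersionLevel: C  Epoch: N
# Exit status: 0 = finalized at max, 1 = finalized lower, 2 = feature line not found.
metadata_version_status() {
  awk '
    $1 == "Feature:" && $2 == "metadata.version" {
      for (i = 3; i < NF; i++) {
        if ($i == "SupportedMaxVersion:")   max = $(i + 1)
        if ($i == "FinalizedVersionLevel:") fin = $(i + 1)
      }
      seen = 1
    }
    END {
      if (!seen)      { print "metadata.version not found"; exit 2 }
      if (fin == max) { print "finalized at " fin; exit 0 }
      print "finalized at " fin ", software supports up to " max
      exit 1
    }
  '
}
```

Pipe the tool's describe output into the function after the last broker is rolled. A non-zero status only means the level is below the maximum, which can be deliberate while you are still validating the new binaries.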
When should clients be upgraded?
As a rule, only after the broker upgrade is complete and IBPV / message format (and the KRaft feature level) have been finalized. Most clients are backward compatible, so upgrading the server side first is the safe path.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.