A stretched cluster distributes brokers and quorum nodes across multiple AZs within a single region to tolerate zone-level failures. Because of how Kafka's replication and quorum behave, you must understand RTT, ISR, and ACK settings.
This article covers the essential multi-AZ settings, common failure modes, and comparisons with alternative architectures. It also highlights keywords likely to appear on the CCAAK exam.
A stretched cluster places brokers and quorum nodes (ZooKeeper or KRaft controllers) across multiple AZs within a single region, aiming to keep both writes and reads available even if a single AZ is lost. The key levers are replication factor, min.insync.replicas, acks, unclean leader election, and preserving a quorum majority.
In practice, restrict a stretched cluster to a single region where inter-AZ RTT is a few milliseconds at most (roughly 1-2 ms). Stretching across regions adds latency, reduces throughput, lengthens leader re-election, and makes request timeouts more likely. Multi-region scenarios should generally use asynchronous replication (Cluster Linking or MirrorMaker 2).
Deploy an odd number of quorum nodes: an even count adds a node without tolerating any additional failure. With ZooKeeper, use 3 or more nodes; with KRaft, use 3 or more controllers. Distribute them evenly across at least three AZs, because with only two AZs one of them always holds the majority. The bar is that losing any single AZ must still leave a majority intact.
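The majority rule above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of Kafka itself; the function name `survives_any_single_az_loss` and the AZ labels are hypothetical, chosen only for illustration.

```python
def survives_any_single_az_loss(nodes_per_az: dict[str, int]) -> bool:
    """Return True if a quorum majority remains after losing any one AZ.

    nodes_per_az maps an AZ name to the number of quorum nodes
    (ZooKeeper servers or KRaft controllers) placed in it.
    """
    if not nodes_per_az:
        return False
    total = sum(nodes_per_az.values())
    majority = total // 2 + 1  # strict majority of the full voter set
    # Losing an AZ removes all of its nodes; the rest must still reach a majority.
    return all(total - lost >= majority for lost in nodes_per_az.values())


print(survives_any_single_az_loss({"az-a": 1, "az-b": 1, "az-c": 1}))  # True
print(survives_any_single_az_loss({"az-a": 2, "az-b": 1}))             # False
print(survives_any_single_az_loss({"az-a": 2, "az-b": 2, "az-c": 1}))  # True
```

The two-AZ layout fails because the AZ holding two of the three nodes is itself the majority, so losing it leaves a single node that cannot form a quorum.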
| Architecture | Availability (zone failure) | RPO/RTO | Latency / Throughput |
|---|---|---|---|
| Single-AZ Kafka | Low (stops when AZ is lost) | RPO: unknown (halted), RTO: long | Best (low latency, high throughput) |
| Stretched cluster (intra-region, across AZs) | Medium to high (survives 1 AZ loss if majority holds) | RPO: 0 (assuming acks=all and minISR are satisfied), RTO: seconds to minutes | Medium (cross-AZ adds latency and reduces throughput) |
| Multi-cluster with async (Cluster Linking / MM2) | High (the other side keeps running when one is down) | RPO: >0 (asynchronous), RTO: short | Each cluster enjoys local low latency |
Kafka stretched cluster spanning multiple AZs (conceptual diagram)
Minimal configuration example: rack-aware settings and topic creation
# Broker (per AZ)
# Brokers in AZ-A
broker.rack=az-a
num.network.threads=3
num.io.threads=8
# Brokers in AZ-B
broker.rack=az-b
# Brokers in AZ-C
broker.rack=az-c
# Safe-side settings common to the whole cluster
unclean.leader.election.enable=false
min.insync.replicas=2
# With KRaft: 3 controllers (C1, C2, C3), one per AZ
# With ZooKeeper: likewise spread 3 ZK nodes across the AZs
# Topic creation: RF=3, minISR=2
# (--bootstrap-server is required by the tool; localhost:9092 is a placeholder address)
kafka-topics.sh --create \
  --bootstrap-server localhost:9092 \
  --topic orders \
  --partitions 12 \
  --replication-factor 3 \
  --config min.insync.replicas=2
# Producer (safe side)
acks=all
retries=2147483647
enable.idempotence=true
delivery.timeout.ms=120000
linger.ms=20
batch.size=131072
# Consumer (optional: when using Closest Replica)
# Broker side: replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
# Client side: client.rack=az-a (set to the AZ the client runs in)
For a stretched cluster, RF=3, min.insync.replicas=2, and acks=all are the baseline. With one replica per AZ, losing one AZ still leaves 2 replicas in the ISR, so writes can continue without losing acknowledged data after leader failover.
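To make the RF=3 / minISR=2 reasoning concrete, here is a small sketch of the availability check for a producer using acks=all. It assumes every surviving replica is in sync at the moment of the failure; `can_accept_writes` is a hypothetical helper, not a Kafka API.

```python
def can_accept_writes(replica_azs: list[str], lost_az: str,
                      min_insync_replicas: int) -> bool:
    """Return True if an acks=all producer can still write to the partition.

    replica_azs lists the AZ of each replica of one partition.
    Assumption: all replicas outside the lost AZ are in the ISR.
    """
    surviving = [az for az in replica_azs if az != lost_az]
    return len(surviving) >= min_insync_replicas


# RF=3, one replica per AZ, minISR=2: two replicas survive, writes continue.
print(can_accept_writes(["az-a", "az-b", "az-c"], "az-a", 2))  # True
# RF=2, minISR=2: only one replica survives, writes are rejected.
print(can_accept_writes(["az-a", "az-b"], "az-a", 2))          # False
# RF=3 but two replicas share az-a: losing az-a leaves one replica.
print(can_accept_writes(["az-a", "az-a", "az-b"], "az-a", 2))  # False
```

The last case shows why the replication factor alone is not enough: the replicas must also sit in different AZs.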
Always set unclean.leader.election.enable to false. Electing a stale, out-of-ISR replica as leader can lose acknowledged writes. Combine this with enable.idempotence=true on the producer to avoid duplicates and out-of-order writes caused by retries.
Set broker.rack to each AZ name so that topic creation spreads replicas across AZs. If replicas of a partition end up concentrated in one AZ, a failure of that AZ can take out several replicas at once, drop the partition below minISR, and stop writes.
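One way to audit this is to compare a partition-to-broker assignment against the broker.rack values. The sketch below assumes you have already collected both mappings (for example from the output of kafka-topics.sh --describe and your broker configs); the function name and sample data are hypothetical.

```python
def partitions_with_shared_rack(assignment: dict[int, list[int]],
                                broker_rack: dict[int, str]) -> list[int]:
    """Return partitions that have two or more replicas in the same rack (AZ).

    assignment maps partition id -> list of replica broker ids.
    broker_rack maps broker id -> broker.rack value.
    """
    flagged = []
    for partition, brokers in sorted(assignment.items()):
        racks = [broker_rack[b] for b in brokers]
        if len(set(racks)) < len(racks):  # at least one rack appears twice
            flagged.append(partition)
    return flagged


broker_rack = {1: "az-a", 2: "az-a", 3: "az-b", 4: "az-c"}
assignment = {
    0: [1, 3, 4],  # one replica per AZ
    1: [1, 2, 3],  # brokers 1 and 2 are both in az-a
}
print(partitions_with_shared_rack(assignment, broker_rack))  # [1]
```

Any partition this reports would lose two replicas if the shared AZ failed, which with minISR=2 and RF=3 is enough to block acks=all writes.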
Leader skew increases cross-AZ traffic and latency. Run Preferred Leader Election periodically to rebalance, and avoid producer hot-partitioning.
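Leader skew can be quantified before deciding whether a rebalance is worthwhile. This is an illustrative sketch only: the helper names and the skew metric (busiest AZ divided by the even share) are assumptions, not something Kafka reports directly.

```python
from collections import Counter


def leaders_per_az(partition_leader: dict[int, int],
                   broker_rack: dict[int, str]) -> Counter:
    """Count partition leaders hosted in each AZ."""
    return Counter(broker_rack[leader] for leader in partition_leader.values())


def skew_ratio(partition_leader: dict[int, int],
               broker_rack: dict[int, str]) -> float:
    """Leaders in the busiest AZ divided by the even share per AZ (1.0 = balanced)."""
    counts = leaders_per_az(partition_leader, broker_rack)
    az_count = len(set(broker_rack.values()))
    even_share = len(partition_leader) / az_count
    return max(counts.values()) / even_share


broker_rack = {1: "az-a", 2: "az-b", 3: "az-c"}
# partition id -> current leader broker id; az-a leads 4 of 6 partitions
partition_leader = {0: 1, 1: 1, 2: 1, 3: 2, 4: 1, 5: 3}
print(dict(leaders_per_az(partition_leader, broker_rack)))
print(skew_ratio(partition_leader, broker_rack))  # 2.0
```

A ratio well above 1.0 suggests that one AZ is serving a disproportionate share of produce traffic and that a preferred leader election may help.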
When an AZ is lost, the cluster keeps running as long as the quorum majority survives. If the lost AZ hosted many leaders, failover causes a temporary drop in throughput and a spike in latency.
After the AZ recovers, large amounts of re-sync traffic kick off. Apply a replication throttle as part of your runbook to soften the impact during business hours. Once fully recovered, rebalance with Preferred Leader Election as needed.
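When choosing a throttle value for the runbook, it helps to estimate how long the catch-up will take at that rate. The sketch below is plain arithmetic under the assumption that the throttle is expressed in bytes per second and is the only bottleneck; the data volume and rate are made-up example numbers.

```python
def resync_seconds(bytes_to_copy: int, throttle_bytes_per_sec: int) -> float:
    """Estimate catch-up time when replication is capped at a fixed rate."""
    if throttle_bytes_per_sec <= 0:
        raise ValueError("throttle must be a positive rate in bytes/sec")
    return bytes_to_copy / throttle_bytes_per_sec


GIB = 1024 ** 3
MIB = 1024 ** 2

# Hypothetical example: 500 GiB to re-sync, throttled to 50 MiB/s.
seconds = resync_seconds(500 * GIB, 50 * MIB)
print(seconds)         # 10240.0
print(seconds / 3600)  # roughly 2.8 hours
```

A lower throttle protects client traffic but extends the window in which the recovered replicas remain out of sync, so the value is a trade-off between the two.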
Cross-AZ traffic is a cost factor for both latency and cloud billing. Use compression (lz4 or zstd) and proper batching to cut round trips, and balance leader placement to avoid skewed inter-AZ traffic.
Storage is generally fine with JBOD, leaning on replication for durability. Balance segment size, page cache, and network bandwidth, and monitor so that re-sync windows during recovery do not blow out.
The exam frequently tests safe-side configuration values, rack awareness, quorum majority, and the boundary between multi-AZ and multi-region. Even when a question seems to have a single obvious answer, premises (RTT, RPO requirements, cost) are often implicit. Do not miss the assumptions in the question.
CCAAK
Question 1
In a 3-AZ stretched cluster within a single region, you want writes to continue and data loss to be avoided even when Zone A is completely lost. Which combination is appropriate?
Correct answer: A
To target RPO=0 during a zone loss, the standard pattern is RF=3 with each replica in a different AZ, minISR=2 and acks=all so that a write is committed only after it reaches a majority of replicas, and unclean leader election disabled. B and C are unsafe because they enable unclean leader election or use an insufficient minISR. D uses RF=2, so losing one AZ leaves minISR unsatisfied and blocks writes.
Can a stretched cluster span multiple regions?
Not recommended. Inter-region RTT is too high, making leader election and replication unstable. Multi-region scenarios belong to asynchronous solutions like Cluster Linking or MirrorMaker 2.
How can consumers read from a local-AZ replica using Closest Replica?
Set replica.selector.class to RackAwareReplicaSelector on the broker, and set client.rack to the consumer's AZ on the client side. By default, consumers read from the leader only.
How many quorum nodes (ZooKeeper/KRaft) do I need?
Use an odd number (typically 3) spread across AZs. Place them so that losing a single AZ still leaves a majority.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.