In production Kafka, setting a topic's Replication Factor (RF) to 3 is the standard practice. It tolerates a single broker failure, limits data-loss risk, and strikes a sensible balance between availability and cost.
However, RF=3 alone is not enough. You also need acks=all aligned with min.insync.replicas, unclean.leader.election.enable=false, rack-aware placement, and throttling during reassignment, all designed together.
RF determines the number of replicas per partition and directly drives both fault tolerance and cost. RF=3 assumes one broker failure and offers a good balance of write continuity and data protection. Pair it with acks=all and min.insync.replicas=2 to tolerate one replica being down while preventing loss of committed data.
RF=2 looks cost-efficient at first glance, but with min.insync.replicas=2 a single failure immediately stops writes, and with min.insync.replicas=1 writes continue but data-loss risk rises. RF=1 is not recommended in production.
Example: explicitly set RF=3 at topic creation time
kafka-topics.sh --bootstrap-server localhost:9092 \
--create --topic orders \
--partitions 12 --replication-factor 3
# Set up defaults (example broker settings)
# server.properties
# default.replication.factor=3
# min.insync.replicas can be set at either the topic or the broker level
With acks=all, the leader acknowledges a write only after every replica currently in the ISR (in-sync replicas) has received it. min.insync.replicas (minISR) is the lower bound on ISR size needed to accept a write. For RF=3, we recommend minISR=2: writes continue as long as ISR stays at 2 or above, even when one replica is down.
Setting minISR equal to RF means a single failure stops writes immediately. Setting minISR too low keeps writes flowing but increases data-loss risk during failures. acks=1 has low latency but acknowledges success before followers replicate, so it should be avoided in production.
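The interplay of RF, minISR, and failed replicas can be sketched as a small shell function. This is a simplified model, assuming every replica that is still up is also in the ISR; `writes_accepted` is a hypothetical helper name, not a Kafka tool.

```shell
# Simplified model: the ISR is assumed to be every replica that is still up.
# writes_accepted is a hypothetical helper, not part of Kafka.
# Usage: writes_accepted <rf> <min_isr> <replicas_down>
writes_accepted() {
  isr=$(( $1 - $3 ))
  if [ "$isr" -ge "$2" ] && [ "$isr" -ge 1 ]; then
    echo "accepted"
  else
    # An acks=all producer would receive a not-enough-replicas error here.
    echo "rejected"
  fi
}

writes_accepted 3 2 1   # RF=3, minISR=2, one replica down
writes_accepted 3 3 1   # minISR equal to RF, one replica down
writes_accepted 2 1 1   # RF=2, minISR=1, one replica down
```

The third case is accepted by this model, but only one copy of the data remains at that point, which is why that combination carries a higher data-loss risk.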
Example producer configuration (Java/Properties)
bootstrap.servers=broker-1:9092,broker-2:9092,broker-3:9092
acks=all
retries=2147483647
max.in.flight.requests.per.connection=1
enable.idempotence=true
# Tune batch.size / linger.ms to balance latency against throughput
With RF=3 and minISR=2, acks=all writes continue during a single broker failure as long as ISR stays at 2 or above. Using rack awareness to spread replicas across AZs or racks improves resilience to a single AZ failure.
For data integrity, set unclean.leader.election.enable=false. That prevents out-of-sync replicas from being promoted to leader, prioritizing consistency over availability. Setting it to true may restore writes faster, but records that were already acknowledged can be lost.
Example of protective configuration per topic
kafka-configs.sh --bootstrap-server localhost:9092 \
--alter --topic orders \
--add-config min.insync.replicas=2,unclean.leader.election.enable=false
# Verify
echo "describe configs"
kafka-configs.sh --bootstrap-server localhost:9092 \
--describe --topic orders
Higher RF increases storage linearly and adds replication traffic of roughly (RF-1)× the producer ingress. If producers write 50 MB/s with RF=3, replication adds about 100 MB/s of inter-broker traffic. Design with network and disk headroom in mind.
RF=5 is sometimes adopted for mission-critical workloads, but the cost rises sharply. The practical baseline is RF=3, revisited against your SLOs and failure assumptions (AZ/rack-level).
| RF and minISR | Tolerable failures (writes continue) | Expected data-loss risk | Storage multiplier |
|---|---|---|---|
| RF=1 (minISR=1) | 0 | High | 1x |
| RF=2 (minISR=2) | 0 | Medium (with unclean disabled) | 2x |
| RF=2 (minISR=1) | 1 | High (progresses with inconsistency) | 2x |
| RF=3 (minISR=2) | 1 | Low | 3x |
| RF=5 (minISR=3) | 2 | Low | 5x |
Rough back-of-envelope calculation (shell)
# Case: 50 MB/s ingress, RF=3
IN_MBPS=50
RF=3
REPL_MBPS=$(( IN_MBPS * (RF-1) ))
echo "Replication ingress ~ ${REPL_MBPS} MB/s"
# Rough storage estimate for a topic ingesting 500 GB/day at RF=3 (compression off)
DAILY_GB=500
STORAGE_GB=$(( DAILY_GB * RF ))
echo "Storage per day ~ ${STORAGE_GB} GB (+index/overhead)"
Once broker.rack is set, Kafka's partition assignment spreads replicas across different racks/AZs as much as possible. With RF=3, the ideal layout places one replica in each of three AZs. Design to keep minISR satisfied even during an AZ failure.
Avoid manual replica placement; configure broker rack metadata correctly and let topic creation handle assignment. For existing topics, use the reassignment tool to relocate in a planned, controlled way.
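One way to sanity-check a replica list against this goal is to count how many distinct racks it covers. The sketch below assumes a hypothetical six-broker cluster; the `rack_of` mapping and the `distinct_racks` helper are illustrative names, and in a real cluster the mapping would come from each broker's broker.rack setting.

```shell
# Hypothetical broker-to-rack map (broker IDs 1-6 are assumptions for illustration).
rack_of() {
  case "$1" in
    1|4) echo "az-a" ;;
    2|5) echo "az-b" ;;
    3|6) echo "az-c" ;;
    *)   echo "unknown" ;;
  esac
}

# Count the distinct racks covered by a comma-separated replica list such as "1,2,3".
distinct_racks() {
  for broker in $(echo "$1" | tr ',' ' '); do
    rack_of "$broker"
  done | sort -u | wc -l | tr -d ' '
}

distinct_racks "1,2,3"   # one replica per AZ
distinct_racks "1,4,2"   # two replicas share az-a
```

With RF=3 across three AZs, a result below 3 means two replicas share an AZ, so a single AZ failure could drop the ISR below minISR=2.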
[Figure: replica spread in a 3-AZ layout (example for P0)]
Basic rack-awareness configuration
# server.properties on each broker
# Set az-a / az-b / az-c per broker. Keep this comment on its own line:
# in a properties file, a trailing "# ..." after the value becomes part of the value.
broker.rack=az-a
# Create the topic (uses automatic assignment)
kafka-topics.sh --bootstrap-server localhost:9092 \
--create --topic payments --partitions 24 --replication-factor 3
Adding or decommissioning brokers, or changing RF, requires partition reassignment. Throttle the re-replication bandwidth to limit the impact on normal traffic. For long-running operations, consider a maintenance window.
Leader skew degrades latency, so run preferred leader election when needed to rebalance. Keep unclean leader election disabled and prioritize data preservation over availability.
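When deciding whether a maintenance window is needed, a rough lower bound on reassignment time is the data to move divided by the throttle rate. The sketch below uses the throttle value from this section's reassignment command (104857600 bytes/sec); MOVE_GIB is a hypothetical figure, and the model treats the throttle as a single pipe, so real durations will differ.

```shell
# Rough lower bound: bytes to move / throttle rate.
# MOVE_GIB is a hypothetical amount of data to re-replicate.
MOVE_GIB=500
THROTTLE_BYTES_PER_SEC=104857600   # 100 MiB/s
MOVE_BYTES=$(( MOVE_GIB * 1024 * 1024 * 1024 ))
SECONDS_NEEDED=$(( MOVE_BYTES / THROTTLE_BYTES_PER_SEC ))
echo "At least ~$(( SECONDS_NEEDED / 60 )) minutes at the configured throttle"
```

A lower throttle protects normal traffic but stretches the operation, which is the trade-off to weigh against the length of the maintenance window.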
Common operational commands
# Increase the partition count (RF is unchanged)
kafka-topics.sh --bootstrap-server localhost:9092 \
--alter --topic orders --partitions 36
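# Generate and apply a reassignment plan (example)
# 1) Prepare the JSON (target topics/brokers)
# 2) Run the tool with --execute and --throttle (bytes/sec; 104857600 = 100 MiB/s)
# 3) Re-run with --verify afterwards to confirm completion and remove the throttle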
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
--reassignment-json-file plan.json --execute --throttle 104857600
# Preferred leader election (corrects leader skew)
# --topic must be paired with --partition; --all-topic-partitions covers every partition instead
kafka-leader-election.sh --bootstrap-server localhost:9092 \
--election-type PREFERRED --all-topic-partitions
Memorize the standard pattern as a four-piece set: RF=3, acks=all with minISR=2, unclean.leader.election.enable=false, and broker.rack. Practice explaining it against failure scenarios.
RF=2 pitfalls: with minISR=1, a single failure keeps writes flowing but raises data-loss risk; with minISR=2, a single failure halts writes. Both show up frequently as exam options.
Higher RF raises storage and bandwidth costs linearly, so adopting RF=5 hinges on whether your SLOs and AZ/region requirements justify it.
Cheat sheet (with comments)
# Recommended baseline (production)
# RF=3, acks=all, min.insync.replicas=2, enable.idempotence=true
# unclean.leader.election.enable=false, broker.rack=<AZ/Rack>
# Rack-aware spread across 3 AZs; writes continue through a single failure
# Cost: storage 3x, replication bandwidth 2x
CCAAK
Question 1
You want to maximize both write availability and data preservation in a production Kafka cluster during a single broker failure. The cluster has brokers evenly placed across 3 AZs. Which combination is most appropriate?
Correct answer: A
RF=3 + acks=all + minISR=2 keeps writes flowing during a single broker failure while preserving data. Rack awareness spreads replicas across AZs, and disabling unclean leader election prevents data loss. B and C have weak acks/minISR settings, and D's RF=1 leaves no fault tolerance.
When should RF be larger than 3?
Consider RF=5 when you need to tolerate an AZ failure plus an additional node failure within the same region, or when strict compliance requirements demand extra redundancy. Storage and replication bandwidth grow accordingly, so justify the choice with clear SLO and cost rationale.
How do you change the RF of an existing topic?
Changing RF requires partition reassignment. Build a reassignment plan that includes the new replica broker targets and run the re-replication. Configure replication throttling during the operation to limit the impact on normal traffic.
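A reassignment plan is a JSON file listing the desired replica brokers for each partition. The sketch below generates such a plan for a topic being raised to three replicas; broker IDs 1, 2, 3, the rotation scheme, and the `make_plan` helper name are illustrative assumptions, not a prescribed layout.

```shell
# Emit a reassignment plan that gives each partition of a topic three replicas.
# Broker IDs 1, 2, 3 and the rotation scheme are illustrative assumptions.
# Usage: make_plan <topic> <partition_count>
make_plan() {
  printf '{"version":1,"partitions":['
  p=0
  while [ "$p" -lt "$2" ]; do
    # Rotate the replica order so preferred leaders are spread across brokers.
    b1=$(( p % 3 + 1 ))
    b2=$(( (p + 1) % 3 + 1 ))
    b3=$(( (p + 2) % 3 + 1 ))
    if [ "$p" -gt 0 ]; then printf ','; fi
    printf '{"topic":"%s","partition":%d,"replicas":[%d,%d,%d]}' "$1" "$p" "$b1" "$b2" "$b3"
    p=$(( p + 1 ))
  done
  printf ']}\n'
}

make_plan orders 2
```

Redirecting the output to a file (for example `make_plan orders 12 > plan.json`) produces input for kafka-reassign-partitions.sh with --reassignment-json-file, to be run with --execute and a throttle.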
How is RF different from cross-cluster replication (e.g. MirrorMaker2)?
RF provides partition redundancy within a single cluster. Cross-cluster replication tools like MirrorMaker2 forward streams to a separate cluster for disaster recovery or regional isolation; they operate at a different design layer with different goals. Combine both to build a layered recovery strategy.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.