Kafka's fault tolerance rests on replication and ISR (In-Sync Replicas). Simply increasing the replication factor is not enough to prevent data loss. The combination of acks and min.insync.replicas, plus how you set unclean.leader.election.enable, determines consistency.
With certifications such as CCAAK in mind, this article walks through the design points that most often confuse teams in practice, following the behavior described in the official documentation. We specifically cover safe configurations assuming RF=3, ISR shrink/expand, the meaning of the HW (High Watermark) and commits, and availability trade-offs during failover.
Each Kafka partition consists of one leader and multiple followers, with the total replica count determined by replication.factor (RF). Followers fetch from the leader in order and replicate the log in the same sequence.
ISR is "the set of replicas sufficiently in sync with the leader." A follower that fails to catch up within the configured time threshold (replica.lag.time.max.ms) is removed from the ISR, so the ISR shrinks and expands automatically. With acks=all, the leader acknowledges a write only after every replica currently in the ISR has it.
Relationship between Producer, Leader, Followers, and ISR
Topic creation and minimum ISR configuration example
kafka-topics --bootstrap-server broker1:9092 \
--create --topic orders --partitions 6 --replication-factor 3
# Set per-topic min ISR=2 (assumes RF=3)
kafka-configs --bootstrap-server broker1:9092 \
--alter --topic orders --add-config min.insync.replicas=2
Followers periodically fetch data from the leader. If a follower cannot fetch or catch up for longer than a configured time, it is removed from the ISR. Once caught up again, it is automatically re-added.
ISR also acts as a safety valve for throughput control. By removing replicas that can't keep up, you avoid excessive delays waiting for replication before commit. However, when the ISR shrinks below min.insync.replicas, acks=all writes fail.
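The shrink rule and its interaction with min.insync.replicas can be sketched in a few lines of Python. This is a simplified model, not broker code: the `Follower` class, `compute_isr`, and `acks_all_write_allowed` are hypothetical names, and `max_lag_ms` stands in for the broker's lag threshold (replica.lag.time.max.ms).

```python
from dataclasses import dataclass


@dataclass
class Follower:
    broker_id: int
    last_caught_up_ms: int  # last time this follower had fully caught up with the leader


def compute_isr(leader_id, followers, now_ms, max_lag_ms):
    """Return the ISR: the leader plus every follower that caught up recently enough."""
    isr = [leader_id]
    for f in followers:
        if now_ms - f.last_caught_up_ms <= max_lag_ms:
            isr.append(f.broker_id)
    return sorted(isr)


def acks_all_write_allowed(isr, min_insync_replicas):
    """acks=all writes are accepted only while the ISR is at least min.insync.replicas."""
    return len(isr) >= min_insync_replicas


# Assumed scenario: broker 2 leads, broker 3 has been lagging for 60 seconds.
followers = [Follower(1, last_caught_up_ms=95_000), Follower(3, last_caught_up_ms=40_000)]
isr = compute_isr(leader_id=2, followers=followers, now_ms=100_000, max_lag_ms=30_000)
print(isr)                             # [1, 2]
print(acks_all_write_allowed(isr, 2))  # True
print(acks_all_write_allowed(isr, 3))  # False
```

In this model broker 3 drops out of the ISR, yet writes with min.insync.replicas=2 still succeed. A floor of 3 would start rejecting them.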
Checking ISR (the Describe output shows the ISR)
kafka-topics --bootstrap-server broker1:9092 --describe --topic orders
# Sample output excerpt
# Partition: 0 Leader: 2 Replicas: 2,1,3 ISR: 2,3
Kafka commits are based on the High Watermark (HW). HW is the minimum LEO (Log End Offset) across replicas in the ISR; offsets below the HW are considered replicated to all ISR members.
An acks=all producer response is returned once the record is committed, that is, once the HW has advanced past that record's offset. If an ISR member is slow, the response waits for it. If that member is removed from the ISR, the HW is computed over the remaining members and the response can proceed.
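As a rough illustration of that rule, the following Python sketch computes the HW from per-replica LEOs and checks whether a given offset is committed. The function names and the offsets are made up for the example, and the real broker tracks this state incrementally rather than recomputing it.

```python
def high_watermark(leo_by_replica, isr):
    """HW is the smallest log end offset (LEO) among the current ISR members."""
    return min(leo_by_replica[r] for r in isr)


def is_committed(offset, hw):
    """A record is committed once the HW has moved past its offset."""
    return offset < hw


# Hypothetical LEOs: broker 3 is far behind brokers 1 and 2.
leo = {1: 120, 2: 150, 3: 90}

hw_full = high_watermark(leo, isr=[1, 2, 3])
hw_shrunk = high_watermark(leo, isr=[1, 2])  # broker 3 removed from the ISR

print(hw_full, is_committed(100, hw_full))      # 90 False
print(hw_shrunk, is_committed(100, hw_shrunk))  # 120 True
```

The record at offset 100 is held back while the slow broker 3 is still in the ISR, and becomes committed as soon as the ISR shrinks to brokers 1 and 2.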
Example producer configuration favoring consistency
bootstrap.servers=broker1:9092,broker2:9092,broker3:9092
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=1
request.timeout.ms=30000
delivery.timeout.ms=120000
acks is the policy that controls at which stage the leader responds, and min.insync.replicas is the lower bound on the ISR size required to commit. The common safe configuration for RF=3 is "acks=all + min.insync.replicas=2." This lets new writes safely continue with one broker down, and stops writes when a second broker is lost to avoid data loss.
If you prioritize low latency, acks=1 is an option, but a sole leader failure can lose the most recent writes. For batch jobs and important data, use acks=all together with min.insync.replicas as the default.
| acks | min.insync.replicas (topic) | Consistency / loss risk | Broker failures tolerated with writes continuing and no data loss (RF=3) |
|---|---|---|---|
| 0 | Any | Returns success even before reaching the broker. High loss risk. | 0 |
| 1 | 1 | Responds on leader write. Writes just before a leader failure can be lost. | 0 |
| all | 2 | Recommended in practice. Continues during a single-broker failure and stops safely when the second fails. | 1 |
| all | 3 | Strictest setting. Always requires all replicas, so writes stop as soon as any one replica leaves the ISR. Lower availability. | 0 |
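The decision behind these combinations can be modeled as a small function. This is a sketch under simplifying assumptions: `write_outcome` and its return labels are hypothetical, and every failed broker is assumed to have already left the ISR, so the ISR size is RF minus the number of brokers down.

```python
def write_outcome(acks, min_insync_replicas, isr_size):
    """Classify what happens to a produce request in this simplified model."""
    if isr_size < 1:
        return "unavailable"      # no in-sync leader left to accept the write
    if acks == "all":
        if isr_size < min_insync_replicas:
            return "rejected"     # producer sees a NotEnoughReplicas-class error
        return "committed"        # acknowledged after all ISR members have the record
    if acks == "1":
        return "leader-only"      # acknowledged once the leader has written it
    return "unacknowledged"       # acks=0: the producer does not wait for the broker


RF = 3
for min_isr in (2, 3):
    for brokers_down in range(RF):
        isr_size = RF - brokers_down
        outcome = write_outcome("all", min_isr, isr_size)
        print(f"min.insync.replicas={min_isr} down={brokers_down} -> {outcome}")
```

With min.insync.replicas=2 the loop reports `committed` for zero or one broker down and `rejected` for two. With min.insync.replicas=3 the first failure already yields `rejected`.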
Per-topic acks operations and configuration verification flow
# Producer side (e.g., app config)
acks=all
# Set the topic-side min ISR to 2 (assumes RF=3)
kafka-configs --bootstrap-server broker1:9092 \
--alter --topic payments --add-config min.insync.replicas=2
# Verify the configuration
kafka-configs --bootstrap-server broker1:9092 \
--describe --topic payments | grep min.insync.replicas
In a healthy failover, the new leader is elected only from replicas within the ISR. This is clean leader election, which preserves order and integrity. If you instead set unclean.leader.election.enable=true, a lagging replica outside the ISR can become leader, and any writes it had not yet replicated are discarded, causing data loss.
If you want strong consistency, use unclean.leader.election.enable=false. In that case, if every ISR member is lost, the partition goes offline (unwritable and unreadable) until an ISR member returns, but no committed data is lost.
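The trade-off between the two settings can be sketched as follows. This is an illustrative model only: `elect_leader` and `lost_records` are hypothetical helpers, the election order (first live replica in assignment order) is a simplification, and the offsets are invented.

```python
def elect_leader(replicas, isr, alive, unclean_enabled):
    """Prefer a live ISR member; fall back to any live replica only if unclean election is on."""
    for r in replicas:
        if r in isr and r in alive:
            return r, "clean"
    if unclean_enabled:
        for r in replicas:
            if r in alive:
                return r, "unclean"
    return None, "offline"


def lost_records(old_hw, new_leader_leo):
    """Records below the old HW that the new leader never received."""
    return max(0, old_hw - new_leader_leo)


# Assumed scenario: only broker 2 was in sync, and it has just failed.
replicas, isr, alive = [2, 1, 3], {2}, {1, 3}
leo = {1: 80, 3: 95}
old_hw = 150

print(elect_leader(replicas, isr, alive, unclean_enabled=False))  # (None, 'offline')

leader, mode = elect_leader(replicas, isr, alive, unclean_enabled=True)
print(leader, mode, lost_records(old_hw, leo[leader]))            # 1 unclean 70
```

With unclean election disabled the partition stays offline until broker 2 returns. With it enabled, broker 1 takes over and the 70 records it never received are gone.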
Disabling unclean leader election per topic
kafka-configs --bootstrap-server broker1:9092 \
--alter --topic orders --add-config unclean.leader.election.enable=false
# Verify
kafka-configs --bootstrap-server broker1:9092 \
--describe --topic orders | grep unclean.leader.election.enable
Continuous monitoring of the ISR and replica lag is essential. Watch whether UnderReplicatedPartitions stays at 0, the rate of IsrShrinks/Expands, and network and disk wait times. On the producer side, you need retry strategies and alerting for NotEnoughReplicas-class errors.
Before going to production, run fault-injection tests repeatedly to confirm: acks=all + min.insync.replicas=2 keeps running with one broker down, writes stop with two brokers down, and order and integrity are preserved after recovery.
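During such tests it helps to check ISR state automatically rather than by eye. The sketch below scans describe-style text for partitions whose ISR is smaller than their replica list. It assumes lines shaped like the sample excerpt shown earlier ("Partition: 0 Leader: 2 Replicas: 2,1,3 ISR: 2,3"). Actual CLI output may carry extra fields or different label casing, so the pattern is case-insensitive and ignores anything around the match. `under_replicated` is a hypothetical helper name.

```python
import re

# Matches the partition id, replica list, and ISR list on one describe line.
LINE_RE = re.compile(
    r"Partition:\s*(\d+)\s+Leader:\s*\S+\s+Replicas:\s*([\d,]+)\s+Isr:\s*([\d,]*)",
    re.IGNORECASE,
)


def under_replicated(describe_output):
    """Return (partition, [broker ids missing from the ISR]) for each affected partition."""
    result = []
    for line in describe_output.splitlines():
        m = LINE_RE.search(line)
        if not m:
            continue
        partition = int(m.group(1))
        replicas = set(m.group(2).split(","))
        isr = set(filter(None, m.group(3).split(",")))
        missing = replicas - isr
        if missing:
            result.append((partition, sorted(int(b) for b in missing)))
    return result


sample = """\
Partition: 0 Leader: 2 Replicas: 2,1,3 ISR: 2,3
Partition: 1 Leader: 3 Replicas: 3,2,1 ISR: 3,2,1
"""
print(under_replicated(sample))  # [(0, [1])]
```

An empty result corresponds to the healthy state where no partition is under-replicated.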
Example health checks via CLI
# Detect under-replicated partitions
kafka-topics --bootstrap-server broker1:9092 --under-replicated-partitions --describe
# List partition leaders / ISR
kafka-topics --bootstrap-server broker1:9092 --describe --topic orders
# For per-broker load, use external monitoring or JMX export
CCAAK
Question 1
For an RF=3 topic, you want to continue writes without data loss even when one broker fails. Which combination of producer and topic settings is most appropriate?
Correct answer: A
Combining acks=all and min.insync.replicas=2 with RF=3 lets writes continue without loss when one broker fails, and safely fails when a second broker is down. unclean.leader.election.enable=false prevents election from outside the ISR, avoiding loss from rollbacks. Every other option either risks data loss or unnecessarily reduces availability.
What's the difference between ISR and AR (All Replicas)?
AR is the full set of replicas for a topic, while ISR is the subset that is sufficiently caught up with the leader. Commits and acks=all are evaluated against the ISR. Lagging replicas are temporarily removed from the ISR.
Is acks=all alone enough to guarantee safety?
No. The min.insync.replicas floor matters. If the ISR drops below the floor, writes fail safely, but setting the floor too low increases the risk of data loss on failure. Also, unclean.leader.election.enable=true can cause rollbacks.
Is RF=2 sufficient?
It can work for small-scale use, but it creates more dilemmas during network partitions or maintenance. For strong consistency and operational headroom, RF=3 or higher is recommended. With RF=2, acks=all + minISR=2 is strict but has low availability, while minISR=1 increases the risk of loss.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.