min.insync.replicas: Durability Threshold (2026)

min.insync.replicas defines the minimum number of in-sync replicas (ISR members) required for a write to be considered successful. Combined with acks=all, it lets you control effective durability.

This article quantifies the trade-offs based on the official documentation, walks through concrete failure behavior, and summarizes operational best practices. We also highlight points that frequently appear on the CCAAK exam.

Fundamentals: How ISR, acks, and min.insync.replicas Relate

Each Kafka partition consists of one Leader and multiple Followers, and the set of replicas sufficiently in sync with the Leader is the ISR. min.insync.replicas (minISR) is the lower bound on "how many ISR acks are required to consider a write successful".

With acks=all (= -1), the Leader waits for acks from every ISR member. However, if the ISR size falls below minISR, the producer fails with a NotEnoughReplicas error. With acks=1 or 0, minISR is effectively not enforced — a write succeeds with just the Leader, or immediately upon network send.

replication.factor (RF) and minISR constrain each other. minISR must be less than or equal to RF, because otherwise the ISR can never reach minISR and acks=all writes always fail. RF − minISR roughly defines "how many simultaneous failures you can tolerate (from a write-continuity standpoint)".

minISR is set per-topic. If unset, the broker default (min.insync.replicas) applies
minISR takes effect when acks=all. If ISR < minISR, writes fail
The gap between RF and minISR indicates "how much can break before writes stop"
unclean.leader.election.enable=false is recommended. It prioritizes data integrity (at some cost to availability)
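
The rules above can be sketched as a small decision function. This is a simplified model for study purposes, not broker code: the function name write_outcome and its string return values are illustrative assumptions, and it only captures the check made when a write arrives.

```python
def write_outcome(acks: str, isr_size: int, min_isr: int) -> str:
    """Simplified model of whether the Leader accepts a produce request.

    Hypothetical helper for illustration; it only mirrors the rules in the text.
    """
    if acks in ("0", "1"):
        # minISR is not enforced: only the Leader (or nothing at all) confirms.
        return "accepted"
    if acks in ("all", "-1"):
        # acks=all is the only mode in which the ISR size is checked.
        if isr_size < min_isr:
            return "NotEnoughReplicas"
        return "accepted"
    raise ValueError(f"unknown acks value: {acks!r}")


if __name__ == "__main__":
    for acks, isr in [("all", 3), ("all", 2), ("all", 1), ("1", 1)]:
        print(f"acks={acks} ISR={isr} minISR=2 -> {write_outcome(acks, isr, 2)}")
```

With minISR=2, the only rejected case in this loop is acks=all with a single in-sync replica.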

[Figure: acks=all with min.insync.replicas=2 (RF=3)]

Quantifying the Durability vs. Availability Trade-off

Both production operations and the exam come down to one question: given a minISR/acks combination, how many failures can the cluster absorb while preserving write/read availability and data durability? As a useful approximation, the number of simultaneous failures tolerated for write continuity is RF − minISR (assuming acks=all).

Because acks=all waits for every ISR member, a slow replica that is still in the ISR increases write latency. If that replica is then dropped from the ISR and the ISR falls below minISR while a write is in flight, the producer sees NotEnoughReplicasAfterAppend. A higher minISR reduces the risk of data loss but costs write availability and latency.

RF is the upper bound on fault tolerance; minISR is the lower bound on "effective durability"
acks=all + higher minISR tends to mean higher durability, lower availability, and higher latency
acks=1 gives higher availability and lower latency but a higher risk of data loss on Leader failure

acks	min.insync.replicas (m)	Write success condition	Tolerated failures (write continuity)
0	Any	Succeeds immediately on send (no broker confirmation)	Not meaningful (a send "succeeds" even if no broker stores the record)
1	Any	Succeeds when the Leader appends	Roughly RF − 1 (assuming the Leader stays up)
all	1	All ISR ack and ISR size ≥ 1	RF − 1
all	2	All ISR ack and ISR size ≥ 2	RF − 2
all	RF	All ISR ack and ISR size = RF (every replica must be in sync)	0
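
The acks=all rows of the table follow from one subtraction. The sketch below reproduces it, assuming a hypothetical helper named tolerated_failures; it covers only write continuity under acks=all and says nothing about read availability.

```python
def tolerated_failures(rf: int, min_isr: int) -> int:
    """Replicas that can leave the ISR before acks=all writes stop.

    Illustrative helper, not a Kafka API. Assumes 1 <= min_isr <= rf.
    """
    if not 1 <= min_isr <= rf:
        raise ValueError("expected 1 <= min_isr <= rf")
    return rf - min_isr


if __name__ == "__main__":
    for rf, min_isr in [(3, 1), (3, 2), (3, 3), (5, 3)]:
        n = tolerated_failures(rf, min_isr)
        print(f"RF={rf} minISR={min_isr}: writes survive {n} replica failure(s)")
```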

Example producer properties prioritizing durability

acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=1
request.timeout.ms=30000
delivery.timeout.ms=120000
compression.type=snappy
linger.ms=10
batch.size=65536
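
Before deploying properties like these, a quick consistency check can catch common mistakes. The sketch below is an assumption-laden illustration: parse_properties and check_durable_producer are hypothetical helpers, the fallback defaults are assumptions that may differ by client version, and the rules encoded are the documented producer constraints that idempotence needs acks=all, retries > 0, and at most 5 in-flight requests, and that delivery.timeout.ms should be at least linger.ms + request.timeout.ms.

```python
def parse_properties(text: str) -> dict:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_durable_producer(props: dict) -> list:
    """Return a list of human-readable problems (empty means none found)."""
    problems = []
    if props.get("acks") not in ("all", "-1"):
        problems.append("acks should be all (or -1) for minISR to be enforced")
    if props.get("enable.idempotence") == "true":
        # Fallback values below are assumptions, not guaranteed client defaults.
        if int(props.get("max.in.flight.requests.per.connection", "5")) > 5:
            problems.append("idempotence requires max.in.flight <= 5")
        if int(props.get("retries", "2147483647")) <= 0:
            problems.append("idempotence requires retries > 0")
    linger = int(props.get("linger.ms", "0"))
    request = int(props.get("request.timeout.ms", "30000"))
    delivery = int(props.get("delivery.timeout.ms", "120000"))
    if delivery < linger + request:
        problems.append("delivery.timeout.ms < linger.ms + request.timeout.ms")
    return problems


EXAMPLE = """
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=1
request.timeout.ms=30000
delivery.timeout.ms=120000
linger.ms=10
"""

if __name__ == "__main__":
    print(check_durable_producer(parse_properties(EXAMPLE)) or "no problems found")
```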

Recommended Configuration Patterns by Workload

There is no one-size-fits-all answer. Combine RF, minISR, and acks to match your SLA (latency, availability, integrity) and failure model. On the exam, "RF=3, minISR=2, acks=all" frequently appears as a practical, safe default.

Even when low latency is critical, it is usually more stable overall to keep acks=all and minISR=2 to guarantee minimum integrity, and then optimize batching, compression, and networking instead.

Balanced (general events): RF=3, minISR=2, acks=all
Durability-focused (audit, payments): RF=3-5, minISR ≥ RF − 1, acks=all, unclean leader election disabled
Ultra-low-latency (transient buffer acceptable): RF=3, minISR=1, acks=1 or all — with explicit agreement on loss risk
Environments with frequent local failures: fix network/disk latency and stabilize the ISR before bumping RF

Concrete topic / broker configuration examples

# Specify minISR when creating a topic
kafka-topics --bootstrap-server <broker> \
  --create --topic orders \
  --partitions 12 --replication-factor 3 \
  --config min.insync.replicas=2

# Change it on an existing topic
kafka-configs --bootstrap-server <broker> \
  --alter --entity-type topics --entity-name orders \
  --add-config min.insync.replicas=2

# Broker default (server config)
# server.properties
min.insync.replicas=2
unclean.leader.election.enable=false

Failure Scenarios: Behavior and Error Codes

If the ISR has shrunk below minISR and you attempt an acks=all write, NotEnoughReplicas is returned immediately. This means "at the start of the write, the required minimum number of in-sync replicas is insufficient".

Even if ISR ≥ minISR at the start of the write, if a node drops out of the ISR or times out while waiting, you may see NotEnoughReplicasAfterAppend. This means "the append happened, but replication did not catch up in time".

On Leader failure, a new Leader is elected from within the ISR. With unclean.leader.election.enable=false, election from outside the ISR is not allowed — prioritizing consistency over availability.

NotEnoughReplicas: ISR < minISR at the start of the write
NotEnoughReplicasAfterAppend: required replication could not be met mid-write
Frequent IsrShrinks/Expands events suggest disk or network latency issues
Leader handoff via controlled shutdown helps preserve consistency
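
The difference between the two error codes comes down to when the ISR was too small. The sketch below is a simplified model with a hypothetical function name (classify_write); a real broker tracks this per partition and per request, which the model ignores.

```python
def classify_write(isr_at_start: int, isr_when_acked: int, min_isr: int) -> str:
    """Classify an acks=all write by when the ISR fell below minISR.

    Illustrative model only; names and return strings are assumptions.
    """
    if isr_at_start < min_isr:
        # Rejected up front: nothing is appended to the log.
        return "NotEnoughReplicas"
    if isr_when_acked < min_isr:
        # The Leader appended the record, then the ISR shrank before
        # the required replication could be confirmed.
        return "NotEnoughReplicasAfterAppend"
    return "success"


if __name__ == "__main__":
    print(classify_write(isr_at_start=1, isr_when_acked=1, min_isr=2))
    print(classify_write(isr_at_start=2, isr_when_acked=1, min_isr=2))
    print(classify_write(isr_at_start=3, isr_when_acked=2, min_isr=2))
```

In the second case the record may already be in the Leader's log, which is one reason the durability-focused producer example enables idempotence alongside retries.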

Example timeout tuning to surface behavior (test environment)

# Producer (kept short so errors surface)
acks=all
request.timeout.ms=10000
delivery.timeout.ms=20000
retries=10
max.in.flight.requests.per.connection=1

Monitoring, SLO Design, and Tuning Essentials

To get value out of minISR, the ISR must be stable. In practice, set SLOs so that UnderReplicatedPartitions stays near 0 and IsrShrinksPerSec does not spike.

Track write SLOs across three dimensions — "p99 latency", "error rate", and "ISR stability" — to make root causes easier to trace. When running acks=all, set appropriate timeouts and keep network and disk headroom.

Essential metrics: UnderReplicatedPartitions, OfflinePartitionsCount, IsrShrinksPerSec, IsrExpandsPerSec
Producer side: record-error-rate, record-retry-rate, request-latency-avg, request-latency-max (compute p99 in your monitoring system)
Sizing: with RF=3 and minISR=2, you need stable I/O and network bandwidth for at least 2 nodes
Example alerts: UnderReplicatedPartitions > 0 for T consecutive minutes, or IsrShrinksPerSec > 0 for N consecutive minutes
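
The "for T consecutive minutes" alert condition can be expressed as a run-length check over per-minute samples. This is a minimal sketch, assuming one sample per minute and a hypothetical helper named sustained_breach; in practice your monitoring system's own rule language would do this.

```python
def sustained_breach(samples, threshold: float, window: int) -> bool:
    """True if `window` consecutive samples are strictly above `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= window:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical per-minute UnderReplicatedPartitions readings.
    urp = [0, 1, 2, 0, 3, 1, 4]
    print(sustained_breach(urp, threshold=0, window=3))
```

Requiring a sustained breach rather than a single nonzero sample avoids paging on the brief ISR shrink that accompanies a normal rolling restart.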

Timeout and retry guidelines for acks=all operations

# Guidelines (must be tuned against actual workload and network)
request.timeout.ms=30000
delivery.timeout.ms=120000
retries=2147483647
retry.backoff.ms=100
socket.connection.setup.timeout.ms=10000
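
To see how these values interact, the rough arithmetic below estimates how many attempts fit inside delivery.timeout.ms if every request runs to its full timeout. It is a worst-case sketch under stated assumptions: a fixed backoff between attempts (clients may apply backoff differently), no time spent in batching, and a hypothetical helper name (max_full_attempts).

```python
def max_full_attempts(delivery_timeout_ms: int,
                      request_timeout_ms: int,
                      retry_backoff_ms: int) -> int:
    """Worst-case number of full-length attempts within the delivery budget.

    n attempts need n * request + (n - 1) * backoff milliseconds, so
    n = (delivery + backoff) // (request + backoff).
    """
    if min(delivery_timeout_ms, request_timeout_ms) <= 0 or retry_backoff_ms < 0:
        raise ValueError("timeouts must be positive and backoff non-negative")
    return (delivery_timeout_ms + retry_backoff_ms) // (
        request_timeout_ms + retry_backoff_ms
    )


if __name__ == "__main__":
    print(max_full_attempts(120_000, 30_000, 100))
```

Under these assumptions the guideline values allow only a handful of fully timed-out attempts, so the very large retries value is effectively capped by delivery.timeout.ms.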

CCAAK Exam Prep: Key Points

Two classic exam topics: "the gap between RF and minISR" and "minISR only takes effect when acks=all". Also nail down failure tolerance, the difference between error codes, and the impact of unclean leader election.

Configuration-rule questions also appear, such as "topic config overrides broker defaults" and "minISR > RF is a misconfiguration" (the ISR can never reach minISR, so acks=all writes always fail).

Memorize: tolerated failures (write continuity) ≈ RF − minISR (assuming acks=all)
minISR is effectively irrelevant under acks=1/0 (and durability drops)
The difference between NotEnoughReplicas and NotEnoughReplicasAfterAppend
unclean.leader.election.enable=false prioritizes data integrity
Topic-level config overrides broker defaults

Check Your Understanding

CCAAK

Question 1

A topic has replication.factor=3 and min.insync.replicas=2. The producer uses acks=all. One Follower goes down, while the Leader and one remaining Follower stay in the ISR. Which statement describes the correct behavior?

A. Writes can continue. The Leader and the remaining ISR member (2 nodes total) acking is enough to succeed
B. Writes always stop and NotEnoughReplicas is returned
C. All 3 nodes must ack for success, so writes cannot continue
D. acks=all ignores min.insync.replicas, so there is no impact

Correct answer: A

With acks=all, the Leader waits for acks from every ISR member, and the ISR size must be ≥ minISR. In this case, even after one Follower leaves, the ISR size still meets 2, so acks from the Leader plus the remaining ISR member succeed. If the ISR drops below minISR, the write fails with NotEnoughReplicas.

Frequently Asked Questions

Does min.insync.replicas matter when acks=1?

Effectively no. With acks=1, a write is acknowledged as soon as the Leader appends it, so the ISR size requirement (minISR) is not enforced. However, if the Leader fails before Followers catch up, the risk of data loss increases.

Where do you configure min.insync.replicas?

It is typically set as a topic-level config. If unset, the broker's default min.insync.replicas applies, and the topic-level value overrides the broker default. The value must be less than or equal to replication.factor.

How does unclean.leader.election.enable relate to minISR?

Disabling unclean leader election (false) prevents Leader election from outside the ISR, improving data integrity but potentially reducing availability when no Leader is present. minISR defines "how many in-sync replicas are required for a successful write", while the unclean setting defines "where a Leader can be elected from". These serve different purposes, so combine both to define your consistency policy.
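
The "where a Leader can be elected from" half of that answer can be sketched as a candidate filter. This is a simplified illustration with a hypothetical function name (eligible_leaders) and integer broker IDs; it ignores preferred-replica ordering and controller details.

```python
def eligible_leaders(replicas, isr, alive, unclean: bool) -> list:
    """Replicas that could become Leader under the given election policy.

    Illustrative model: `replicas` is the assignment order, `isr` and
    `alive` are sets of broker IDs.
    """
    live_isr = [r for r in replicas if r in isr and r in alive]
    if live_isr:
        # Clean election: any live ISR member has all committed records.
        return live_isr
    if unclean:
        # Unclean election: fall back to any live replica, risking data loss.
        return [r for r in replicas if r in alive]
    # No live ISR member and unclean election disabled: partition stays offline.
    return []


if __name__ == "__main__":
    replicas, isr = [1, 2, 3], {1, 2}
    print(eligible_leaders(replicas, isr, alive={3}, unclean=False))
    print(eligible_leaders(replicas, isr, alive={3}, unclean=True))
```

With both ISR members down, the first call returns no candidates (availability is sacrificed) while the second returns the out-of-sync broker 3 (integrity is sacrificed).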


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
