Unclean Leader Election (ULE) is an emergency mechanism: when no live replica remains in the ISR (in-sync replicas), it allows a replica outside the ISR (an out-of-sync replica, OSR) to be elected leader so that availability is restored. The price is the rollback of records that replica never received, that is, data loss.
This article walks through ULE's behavior, the data-loss mechanism, the criteria for configuration decisions, combinations with related parameters, monitoring and drills, and the points most frequently tested on the CCAAK exam — all from a practical perspective. The explanation is grounded in stable concepts that align with Kafka's official documentation.
Each Kafka partition consists of a leader and followers, and the set of replicas that are sufficiently in sync is the ISR (in-sync replicas). Normal leader elections occur only from the ISR, which preserves consistency.
ULE elects a leader even from the OSR (out-of-sync replicas) when no live replica remains in the ISR after a failure, bringing the partition back online. The default is disabled (false); while disabled, the partition stays offline until the ISR recovers.
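The election rule described above can be sketched as a small Python model. This is a simplified illustration under stated assumptions, not Kafka's controller code: the function name `elect_leader`, its parameters, and the replica names are hypothetical.

```python
from typing import Optional, Sequence, Set


def elect_leader(replicas: Sequence[str], isr: Set[str], live: Set[str],
                 unclean_enabled: bool) -> Optional[str]:
    """Return the new leader, or None if the partition must stay offline.

    replicas is the assignment order, isr the in-sync set, and live the set
    of replicas whose brokers are currently up.
    """
    # Clean election: the first live replica that is still in the ISR.
    for r in replicas:
        if r in live and r in isr:
            return r
    # Unclean election: fall back to any live replica, even outside the ISR.
    if unclean_enabled:
        for r in replicas:
            if r in live:
                return r
    return None


replicas = ["R1", "R2", "R3"]
isr = {"R1", "R2"}

# R1 and R2 are down; only the out-of-sync R3 survives.
print(elect_leader(replicas, isr, live={"R3"}, unclean_enabled=False))  # None
print(elect_leader(replicas, isr, live={"R3"}, unclean_enabled=True))   # R3
# While an ISR member is alive, the ULE setting makes no difference.
print(elect_leader(replicas, isr, live={"R2", "R3"}, unclean_enabled=True))  # R2
```

The setting only changes the outcome in the first case, where every ISR member is down and an out-of-sync replica is still alive.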
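Election path during a failure (when the entire ISR is lost)
Partition P (RF=3)
Initial state:
  R1 (Leader)  R2 (Follower)  R3 (Follower)
  ISR = {R1, R2}  OSR = {R3}
Failure: R1 goes down, and R2 fails at the same time
Surviving: R3 only (but it is in the OSR)
Branches:
1) ULE=false (default)
   -> Wait with no leader
   -> Partition P is temporarily offline
   -> Normal election once the ISR recovers
2) ULE=true
   -> Elect R3 from the OSR as the interim leader
   -> Records that R3 had not yet replicated are lost (rollback)
   -> Availability is restored, but history may become discontinuous

Related properties (broker default and topic override)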
# Broker (server.properties or dynamic config)
# Recommended default
unclean.leader.election.enable=false
# Enable as an exception for a single topic (e.g., evacuation logs)
# Kafka (kafka-configs.sh)
kafka-configs.sh --bootstrap-server <broker:9092> \
  --alter --topic critical-telemetry \
  --add-config unclean.leader.election.enable=true
# Check the effective value
kafka-configs.sh --bootstrap-server <broker:9092> \
  --describe --topic critical-telemetry | grep unclean

ULE only comes into play when no live replica exists in the ISR and at least one live out-of-sync replica exists. When ULE is disabled, the partition goes offline until an ISR member returns, and clients may receive errors such as NOT_LEADER_OR_FOLLOWER or LEADER_NOT_AVAILABLE.
When enabled, the controller elects a new leader from the OSR. Records that the new leader had not replicated from the old leader do not exist in the new leader's log, so when the old leader rejoins as a follower it truncates its log to the point where it diverges from the new leader (based on the leader epoch and, in older versions, the high watermark, HW), and those records disappear from history. Writes acknowledged with acks=1 may be lost. With ULE disabled, acks=all combined with a properly configured min.insync.replicas means acknowledged writes survive as long as one ISR member survives; writes that cannot meet the condition are not ACKed and are returned as failures, so the application needs a retry design. With ULE enabled that guarantee no longer holds: if every ISR member is lost, even writes acknowledged under acks=all can be missing from the out-of-sync replica that becomes leader.
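The rollback can be illustrated with a toy model of two replica logs. This is a sketch under simplifying assumptions: the real broker locates the divergence point through leader-epoch lookups rather than by comparing records directly, and the names `first_divergence` and `rejoin_as_follower` are hypothetical.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (leader epoch, payload)


def first_divergence(follower: List[Record], leader: List[Record]) -> int:
    """Return the first offset at which the two logs disagree."""
    offset = 0
    while (offset < len(follower) and offset < len(leader)
           and follower[offset] == leader[offset]):
        offset += 1
    return offset


def rejoin_as_follower(follower: List[Record], leader: List[Record]) -> List[Record]:
    """Truncate the follower at the divergence point, then copy the leader's tail."""
    keep = first_divergence(follower, leader)
    return follower[:keep] + leader[keep:]


# Old leader R1 wrote five records in epoch 41; R3 had replicated only three.
r1 = [(41, "a"), (41, "b"), (41, "c"), (41, "d"), (41, "e")]
r3 = r1[:3]

# Unclean election: R3 becomes leader in epoch 42 and accepts new writes.
r3 = r3 + [(42, "x"), (42, "y")]

# R1 comes back and has to follow R3.
r1_after = rejoin_as_follower(r1, r3)
lost = [rec for rec in r1 if rec not in r1_after]

print(r1_after)  # R1 now mirrors R3
print(lost)      # [(41, 'd'), (41, 'e')]
```

Records "d" and "e" existed only on the old leader, so after truncation they are absent from every replica, regardless of whether the producer had received an acknowledgement for them.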
Example controller/broker logs (illustrative, key points only)
[Controller] Partition P: No live ISR; unclean leader election is enabled
[Controller] Electing new leader from OSR: R3, new leader epoch=42
[Broker R3] Became leader for P-0 at epoch 42
[Broker R1] Truncating log for P-0 from offset 105 down to HW 102 (on recovery)

For topics where history consistency is paramount, such as financial accounting or order events, leave ULE disabled and enforce RF>=3, min.insync.replicas>=2, and producer acks=all. Accept the downtime in exchange for a gap-free history.
Conversely, for observability/telemetry topics where short gaps are not business-critical, depending on SLA, you may decide to enable ULE on selected topics. Even then, design min.insync.replicas, acks, alerting, and retry logic together as one package.
| Configuration pattern | Availability (during failure) | Data-loss risk / producer behavior |
|---|---|---|
| ULE=false (default) | Low (partition halts until ISR recovers) | Minimum loss risk. Writes that meet acks=all + minISR are robust. Writes that do not meet the conditions are returned as failures. |
| ULE=true | High (election from the OSR brings the partition back) | Records the new leader never replicated roll back. ACKed writes can be lost, most readily those sent with acks=1. |
| ULE=true + min.insync.replicas=2 + acks=all | High (operation can continue) | Writes that cannot meet min.insync.replicas fail instead of being ACKed, which narrows the loss window. If every ISR member is lost and an OSR replica is elected, ACKed writes can still roll back. |
| RF=1 (either ULE setting) | Depends on a single broker | Without replication, a single point of failure causes data loss regardless of ULE. Not recommended. |
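The producer-side half of the table can be sketched as a small classifier. It is a simplified model that assumes the partition currently has a live leader; the function name `produce_outcome` and its return strings are hypothetical, not Kafka error codes.

```python
def produce_outcome(acks: str, isr_size: int, min_isr: int) -> str:
    """Classify a write attempt against a partition that has a live leader."""
    if acks not in ("0", "1", "all"):
        raise ValueError(f"unsupported acks value: {acks}")
    if acks == "0":
        return "sent without waiting for an acknowledgement"
    if acks == "1":
        return "acked after the leader alone has it"
    # acks=all: the broker refuses the write when the ISR is too small.
    if isr_size < min_isr:
        return "rejected: not enough in-sync replicas"
    return f"acked after {isr_size} in-sync replicas have it"


print(produce_outcome("all", isr_size=2, min_isr=2))
print(produce_outcome("all", isr_size=1, min_isr=2))
print(produce_outcome("1", isr_size=1, min_isr=2))
```

Note that min.insync.replicas only constrains acks=all writes in this model: an acks=1 write is acknowledged even when the ISR has shrunk to the leader alone.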
Examples reflecting these decisions (payments = strict / telemetry = availability-first)
# Payments topic (strict)
kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic payments \
  --add-config unclean.leader.election.enable=false,min.insync.replicas=2
# Producers must use acks=all
# Telemetry (availability-first)
kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic telemetry \
  --add-config unclean.leader.election.enable=true,min.insync.replicas=2
# Writes that cannot meet min.insync.replicas must be retried by the application,
# which should be designed on the assumption that gaps can occur

Start with an audit. Inspect both the broker defaults and per-topic overrides, and confirm there is no drift from expectations. Where possible, keep changes minimal and per-topic, because modifying cluster-wide defaults has a much larger blast radius.
Dynamic config changes do not require a rolling restart, but you should verify in staging — via fault injection (broker shutdown) — that the change does not conflict with existing client SLAs or retry strategies.
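One way to make the audit repeatable is to compare the effective settings against a written policy. The sketch below assumes the effective values have already been collected into a dictionary (for example from the describe commands); the policy table `EXPECTED`, the topic classes, and the function `find_drift` are hypothetical.

```python
from typing import Dict, List

# Hypothetical policy: expected settings per topic class.
EXPECTED: Dict[str, Dict[str, str]] = {
    "strict": {"unclean.leader.election.enable": "false", "min.insync.replicas": "2"},
    "availability": {"unclean.leader.election.enable": "true", "min.insync.replicas": "2"},
}


def find_drift(topic_class: Dict[str, str],
               effective: Dict[str, Dict[str, str]]) -> List[str]:
    """Return one message per setting that differs from the expected policy."""
    findings = []
    for topic, cls in sorted(topic_class.items()):
        actual = effective.get(topic, {})
        for key, want in EXPECTED[cls].items():
            got = actual.get(key, "<unset>")
            if got != want:
                findings.append(f"{topic}: {key}={got}, expected {want}")
    return findings


topic_class = {"payments": "strict", "telemetry": "availability"}
effective = {
    "payments": {"unclean.leader.election.enable": "true", "min.insync.replicas": "2"},
    "telemetry": {"unclean.leader.election.enable": "true"},
}

for finding in find_drift(topic_class, effective):
    print(finding)
```

Here the payments topic is flagged because ULE was enabled on a strict topic, and telemetry is flagged because min.insync.replicas has no explicit value recorded.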
Concrete examples of audit and change
# Broker-level settings (dynamic overrides; add --all to list every effective value)
kafka-configs.sh --bootstrap-server <broker:9092> --entity-type brokers --entity-name 0 --describe
# (For a cluster-wide view, check each broker, or use --all or an external CMDB)
# Effective value for a topic
kafka-configs.sh --bootstrap-server <broker:9092> --describe --topic <topic>
# Change the cluster-wide default (with caution)
kafka-configs.sh --bootstrap-server <broker:9092> --entity-type brokers --entity-default \
  --alter --add-config unclean.leader.election.enable=false
# Per-topic override
kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic <topic> \
  --add-config unclean.leader.election.enable=true
# Set together with the expected min.insync.replicas
kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic <topic> \
  --add-config min.insync.replicas=2

Whether or not ULE is enabled, what matters is having metrics and logs in place so that you can see immediately what happened during a failure. In particular, monitor the offline partition count, ISR shrinks/expands, and latency/error rates.
During drills, perform a planned broker shutdown in the staging environment and verify producer/consumer success/failure patterns and the occurrence (or not) of log truncation under both ULE-enabled and ULE-disabled settings.
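To turn a drill into a pass/fail check, it helps to compare what producers saw acknowledged with what consumers can read afterwards. A minimal sketch, assuming each test message carries a unique ID and that both lists were captured by the test harness; the function name `lost_after_drill` is hypothetical.

```python
from typing import Iterable, List, Set


def lost_after_drill(acked: Iterable[str], consumed: Iterable[str]) -> List[str]:
    """Return acknowledged message IDs that never reached a consumer, in send order."""
    seen: Set[str] = set(consumed)
    return [msg_id for msg_id in acked if msg_id not in seen]


acked = ["m1", "m2", "m3", "m4", "m5"]
# After the failure and recovery, the consumer re-reads the topic from the start.
consumed = ["m1", "m2", "m3", "m6"]

print(lost_after_drill(acked, consumed))  # ['m4', 'm5']
```

A non-empty result means acknowledged writes were rolled back; an empty result on a ULE-disabled run, together with a period of failed writes, indicates that the downtime was traded for an intact history.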
Simple drill recipe (staging only)
# Setup: test topic (RF=3, minISR=2)
kafka-topics.sh --bootstrap-server <broker:9092> --create --topic t_ule_test \
  --partitions 3 --replication-factor 3 \
  --config min.insync.replicas=2 \
  --config unclean.leader.election.enable=<true|false>
# Produce (acks=all)
kcat -b <broker:9092> -t t_ule_test -P -X acks=all -l msgs.txt &
# Stop the replica brokers one at a time. Stopping only two leaves the survivor
# as the sole ISR member and leader, so the last ISR member (<brokerC>) must go down too
# (assumes the controller/metadata quorum stays available)
systemctl stop kafka@<brokerA>
systemctl stop kafka@<brokerB>
systemctl stop kafka@<brokerC>
# Bring back a broker that left the ISR earlier, so the only live replica is out of sync
systemctl start kafka@<brokerA>
# Observe
kafka-topics.sh --bootstrap-server <broker:9092> --describe --topic t_ule_test | grep Leader
# JMX/metrics: check OfflinePartitionsCount and IsrShrinksPerSec
# Check the logs for "Truncating" (ULE=true) or for the partition going offline (ULE=false)

On the CCAAK exam, the ULE definition, its default, the conditions for data loss, and the relationship with min.insync.replicas and acks come up frequently. Watch out in particular for questions that target the misconception that 'acks=all is always safe.' When the ISR falls below min.insync.replicas, the safe behavior is to fail the write without ACKing it, and enabling ULE may still cause history to roll back. Memorize that relationship as a single story.
There are also questions designed to confuse Preferred Leader Election (preferred-leader switching among healthy candidates) with Unclean Leader Election (emergency election from the OSR). The former does not break consistency; the latter may cause data loss.
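The preferred-election side of that contrast can be sketched in a few lines. This is a simplified model, not controller code: it assumes the preferred replica is the first entry in the partition's replica assignment, and the function name `preferred_leader` is hypothetical.

```python
from typing import Optional, Sequence, Set


def preferred_leader(replicas: Sequence[str], isr: Set[str],
                     live: Set[str]) -> Optional[str]:
    """Return the preferred replica if it can take over cleanly, else None."""
    preferred = replicas[0]
    # Leadership only moves to a replica that is up and fully in sync.
    if preferred in live and preferred in isr:
        return preferred
    return None


replicas = ["R1", "R2", "R3"]
live = {"R1", "R2", "R3"}

# R1 has restarted and caught up: leadership can move back to it.
print(preferred_leader(replicas, isr={"R1", "R2", "R3"}, live=live))  # R1
# R1 is up but still catching up: leadership stays where it is.
print(preferred_leader(replicas, isr={"R2", "R3"}, live=live))        # None
```

Because the candidate must already be in the ISR, this path never discards records, which is the property that separates it from an unclean election.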
CLI snippets worth memorizing
# Per-topic ULE override
kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic <t> \
  --add-config unclean.leader.election.enable=true
# Set min.insync.replicas
kafka-configs.sh --bootstrap-server <broker:9092> --alter --topic <t> \
  --add-config min.insync.replicas=2
# Preferred Leader Election (a separate concept; consistency is preserved)
kafka-leader-election.sh --bootstrap-server <broker:9092> \
  --election-type preferred --all-topic-partitions

CCAAK
Question 1
A partial cluster failure leaves no live replica in a partition's ISR. If you set unclean.leader.election.enable=true on that topic to avoid downtime, which outcome is most accurate? (The producer is running with acks=1.)
Correct answer: A
When ULE is enabled, election proceeds from the OSR, and unreplicated data that only the former leader held can be lost via truncation. acks=1 does not prevent loss. When ULE is disabled, the partition stays offline until the ISR recovers.
Does enabling ULE reduce producer errors?
Right after a failure, availability improves and the ACK rate may temporarily climb. However, unreplicated history can roll back, which may be fatal depending on your application's consistency requirements. Revisit your acks=all and min.insync.replicas design first.
Is combining transactions (EOS) with ULE safe?
It is not recommended. If transaction markers or tail records remain unreplicated when ULE switches the leader to an OSR, truncation occurs during recovery and consistency breaks. Disable ULE on topics that use transactions, and operate with RF>=3, min.insync.replicas>=2, and acks=all as preconditions.
Is there a middle-ground option to 'wait a while before' triggering ULE?
Kafka itself has no general setting to directly control such a wait window. In practice, keep ULE disabled as a rule, and instead substitute faster recovery (monitoring, automatic recovery, prompt broker restarts), sufficient RF, appropriate min.insync.replicas, and client retry/backoff design. If you do use ULE, limit it to specific topics and understand the blast radius through drills.
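As one illustration of the client retry/backoff design mentioned above, the following sketch computes a capped exponential backoff schedule. It is a generic pattern rather than Kafka's built-in retry logic (the producer has its own settings such as retry.backoff.ms and delivery.timeout.ms); the function name and the numbers are hypothetical.

```python
from typing import List


def backoff_schedule(base_ms: int, cap_ms: int, attempts: int) -> List[int]:
    """Wait before each retry: the base doubles every attempt, up to the cap."""
    return [min(cap_ms, base_ms * 2 ** n) for n in range(attempts)]


print(backoff_schedule(base_ms=100, cap_ms=2000, attempts=6))
# [100, 200, 400, 800, 1600, 2000]
```

In practice a random jitter is usually added to each delay so that many clients do not retry in lockstep while a partition is offline.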
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.