Kafka Quotas: Producer, Consumer, Request Quotas (2026)

When you run Kafka across multiple tenants or business units, heavy traffic from one client can easily starve the others. Kafka quotas exist to tame this resource contention and hand each client a predictable share of throughput.

This article walks through how to design, configure, and observe quotas for stable cluster operations, drawing on the official documentation and going deep enough to cover what the CCAAK certification asks about.

Kafka Quota Basics: What You Limit and How

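There are three primary Kafka client quotas: producer_byte_rate (bytes per second a producer may write), consumer_byte_rate (bytes per second a consumer may fetch), and request_percentage (the share of broker network and I/O thread time a client may use, where n% means n% of one thread). When a client exceeds a quota, the broker does not reject its requests with an error. Instead it throttles. It computes how long the client must wait to get back under the quota, returns that delay in the throttle_time_ms field of the response, and holds off processing further requests on that connection until the delay has elapsed. Well-behaved clients also refrain from sending during that period.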

Rates are measured over an observation window made up of several samples (broker settings quota.window.num, default 11, and quota.window.size.seconds, default 1). Short bursts are therefore absorbed while the average rate is enforced. Quotas can be attached to a user, a client-id, a user+client-id combination, or a default at the user or client-id level. The most specific match wins, and defaults act as the fallback.
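The following Python sketch models a single byte-rate quota to show how the window and the delay interact. It rests on two simplifying assumptions:

- The observed rate is the bytes recorded across all samples divided by the full window length.
- The delay is the idle time needed for that average to drop back to the quota.

Solving B / (W + d) = Q for d gives d = (O - Q) / Q x W. Here B is the bytes in the window, W its length, O = B / W the observed rate, and Q the quota. The broker's real accounting is more involved, and the function name is hypothetical.

```python
from typing import Sequence


def throttle_time_ms(bytes_per_sample: Sequence[int],
                     quota_bytes_per_sec: float,
                     sample_seconds: float = 1.0) -> int:
    """Simplified model of byte-rate throttling (hypothetical helper, not broker code).

    bytes_per_sample holds the bytes recorded in each sample of the window,
    e.g. 11 one-second samples for quota.window.num=11, quota.window.size.seconds=1.
    """
    window_s = len(bytes_per_sample) * sample_seconds
    observed = sum(bytes_per_sample) / window_s  # average rate over the whole window
    if observed <= quota_bytes_per_sec:
        return 0  # within quota: no delay
    # Idle time d such that total_bytes / (window_s + d) == quota
    delay_s = (observed - quota_bytes_per_sec) / quota_bytes_per_sec * window_s
    return round(delay_s * 1000)


MiB = 1024 * 1024
quota = 1 * MiB  # e.g. producer_byte_rate=1048576

# A 2 MiB burst in an otherwise quiet window is absorbed by the average.
print(throttle_time_ms([512 * 1024] * 10 + [2 * MiB], quota))  # prints 0
# A sustained 1 MiB/s plus a 5 MiB burst pushes the average over the quota.
print(throttle_time_ms([MiB] * 10 + [5 * MiB], quota))  # prints 4000
```

The delay grows with the size of the overshoot relative to the quota. This is why the same burst costs more under a tight limit than under a generous one.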

Target metrics: producer_byte_rate, consumer_byte_rate, request_percentage
Control mechanism: soft throttling; the broker delays the client and reports the delay in throttle_time_ms
Smoothing: averages across an observation window, absorbing momentary bursts
Scope of application: user, client-id, user+client-id, and defaults at the user and client-id levels
Precedence: more specific configurations win (user+client-id > user > client-id among named entities; a user-level default still outranks a client-id-only quota)

Entity	Scope	Typical use	Sample key
user	Authenticated user (SASL Principal)	Total cap per department/team	users:alice
client-id	Per application/process	Allocate across apps owned by the same user	clients:etl-writer
user+client-id	User x app combination	Tighten only one specific app within a user	users:alice,clients:etl-writer
default	Defaults for any user or client-id without its own entry	Initial cap for new or unclassified clients	users:<default> or clients:<default> (set with --entity-default)

[Figure: Throttling flow (conceptual)]

Representative quota keys (for clients)

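# Dynamic quota keys configurable for clients (excerpt)
# - producer_byte_rate: maximum bytes sent per second
# - consumer_byte_rate: maximum bytes received (fetched) per second
# - request_percentage: share of broker network/I/O thread time (n% = n% of one thread)
# Exceeding any of them leads to throttling (a delay), not to rejected requests.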

Entities and Precedence: user, client-id, combinations, default

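Quotas can be set at multiple levels simultaneously. At evaluation time the most specific match wins, and if nothing matches, the broker falls back to a more general setting. Take user=alice with client-id=etl as an example. The broker uses an alice+etl quota if one is configured. Otherwise it tries, in order:

- alice with the default client-id
- user=alice alone
- the user-level defaults (default user with client-id=etl, default user with default client-id, default user)
- client-id=etl alone
- finally, the client-id default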

Using default lets you keep unregistered clients from running wild while selectively raising the cap for critical apps — a layered approach to enforcement.

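Precedence (high → low): user+client-id > user + default client-id > user > default user + client-id > default user + default client-id > default user > client-id > default client-id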
Default is your safety net. Set it properly first, then loosen on a case-by-case basis.
Baking a client.id naming convention into operations makes observability and allocation much easier.

Design pattern	Benefit	Caveats
Strict default + per-entity exceptions	Safe even for unregistered clients	Operational overhead grows as exceptions accumulate
Allocate per user	Easy to budget by department	Hard to differentiate between apps owned by the same user
Combine user+client-id	Fine-grained control	Entries pile up; periodic review is required

Precedence in pseudocode (match order)

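# Pseudocode: the first entity in this order that defines the quota key wins
# (DEFAULT = set with --entity-default, None = level not specified)
for user_part, client_part in [(u, c), (u, DEFAULT), (u, None),
                               (DEFAULT, c), (DEFAULT, DEFAULT), (DEFAULT, None),
                               (None, c), (None, DEFAULT)]:
    if quota.exists(user=user_part, client=client_part):
        use that; break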
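The documented order of precedence can be turned into a small resolver. This Python sketch is hypothetical; the dictionary layout, the DEFAULT sentinel, and the resolve_quota name are all illustrative. It walks the eight-level order and resolves each quota key independently. This matches our understanding that the broker tracks produce, fetch, and request quotas separately.

```python
from typing import Dict, Optional, Tuple

DEFAULT = "<default>"  # stands for a quota set with --entity-default
Entity = Tuple[Optional[str], Optional[str]]  # (user, client_id); None = level not set


def resolve_quota(configs: Dict[Entity, Dict[str, int]], user: str, client_id: str,
                  quota_key: str) -> Optional[Tuple[Entity, int]]:
    """Return (matched entity, value) for one quota key, or None (hypothetical helper)."""
    order = [
        (user, client_id), (user, DEFAULT), (user, None),
        (DEFAULT, client_id), (DEFAULT, DEFAULT), (DEFAULT, None),
        (None, client_id), (None, DEFAULT),
    ]
    for entity in order:
        values = configs.get(entity, {})
        if quota_key in values:  # each quota key is resolved on its own
            return entity, values[quota_key]
    return None  # no quota of this type applies


configs: Dict[Entity, Dict[str, int]] = {
    (DEFAULT, None): {"producer_byte_rate": 1_048_576},        # default for all users
    ("alice", None): {"producer_byte_rate": 2_097_152},        # user alice
    ("alice", "etl-writer"): {"consumer_byte_rate": 524_288},  # alice + etl-writer
    (None, "etl-writer"): {"producer_byte_rate": 8_388_608},   # client-id only
}

print(resolve_quota(configs, "alice", "etl-writer", "producer_byte_rate"))
# prints (('alice', None), 2097152): the combination sets no producer key
print(resolve_quota(configs, "bob", "etl-writer", "producer_byte_rate"))
# prints (('<default>', None), 1048576): the user default outranks the client-id entry
```

The second call is the case that surprises people. bob has no user-specific entry, so the default-user quota applies even though client-id=etl-writer has its own, higher limit.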

Configuration and Verification: kafka-configs.sh and the Admin API

Client quotas are applied to the cluster as dynamic configurations. Use kafka-configs.sh (with --bootstrap-server) or the Admin API (alterClientQuotas and describeClientQuotas in the Java Admin client). Changes take effect within moments and require no broker restart.

After configuring, verify with --describe. To change or remove values, use --alter with --add-config or --delete-config. The settings are stored as cluster metadata and are picked up by every broker.

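Byte-rate quotas are in bytes/second and are evaluated as average rates over the observation window; request_percentage is a percentage of thread time.
The configuration is cluster-wide (every broker applies it), but each broker enforces it independently, so a byte-rate quota is a per-broker limit.
Undo with --delete-config; once a key is removed, that quota type falls back to the next matching level or default.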

Operation	CLI example	Key point
Set	kafka-configs.sh --alter --add-config ...	Separate multiple keys with commas
Inspect	kafka-configs.sh --describe --entity-type ...	Be explicit about the target entity
Delete	kafka-configs.sh --alter --delete-config ...	When unset, falls back to a higher level or default

kafka-configs.sh examples (official-docs syntax)

# Per client-id
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --add-config 'producer_byte_rate=1048576,consumer_byte_rate=1048576' \
  --entity-type clients --entity-name etl-writer

# Per user (SASL principal name)
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --add-config 'producer_byte_rate=2097152' \
  --entity-type users --entity-name alice

# user+client-id (combination)
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --add-config 'consumer_byte_rate=524288' \
  --entity-type users --entity-name alice \
  --entity-type clients --entity-name etl-writer

# Verify with describe
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
  --describe --entity-type clients --entity-name etl-writer
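If you script quota changes, building the argument lists programmatically keeps entity flags and key names consistent. Below is a minimal Python sketch. It assumes kafka-configs.sh lives at bin/kafka-configs.sh and uses a hypothetical DEFAULT sentinel that maps to --entity-default. It only builds argv lists and runs nothing.

```python
import shlex
from typing import Dict, Iterable, List, Optional

KAFKA_CONFIGS = "bin/kafka-configs.sh"  # assumed path; adjust to your installation
DEFAULT = "<default>"  # hypothetical sentinel: emit --entity-default instead of a name
QUOTA_KEYS = {"producer_byte_rate", "consumer_byte_rate", "request_percentage"}


def _entity_args(user: Optional[str], client_id: Optional[str]) -> List[str]:
    """Build --entity-type/--entity-name (or --entity-default) flags, users first."""
    args: List[str] = []
    for entity_type, name in (("users", user), ("clients", client_id)):
        if name is None:
            continue
        args += ["--entity-type", entity_type]
        args += ["--entity-default"] if name == DEFAULT else ["--entity-name", name]
    if not args:
        raise ValueError("specify a user, a client-id, or both")
    return args


def _check_keys(keys: Iterable[str]) -> None:
    unknown = set(keys) - QUOTA_KEYS
    if unknown:
        raise ValueError(f"unknown quota keys: {sorted(unknown)}")


def add_quota_argv(bootstrap: str, quotas: Dict[str, int], user: Optional[str] = None,
                   client_id: Optional[str] = None) -> List[str]:
    _check_keys(quotas)
    config = ",".join(f"{k}={v}" for k, v in quotas.items())  # keys separated by commas
    return [KAFKA_CONFIGS, "--bootstrap-server", bootstrap, "--alter",
            "--add-config", config, *_entity_args(user, client_id)]


def delete_quota_argv(bootstrap: str, keys: List[str], user: Optional[str] = None,
                      client_id: Optional[str] = None) -> List[str]:
    _check_keys(keys)
    return [KAFKA_CONFIGS, "--bootstrap-server", bootstrap, "--alter",
            "--delete-config", ",".join(keys), *_entity_args(user, client_id)]


def describe_argv(bootstrap: str, user: Optional[str] = None,
                  client_id: Optional[str] = None) -> List[str]:
    return [KAFKA_CONFIGS, "--bootstrap-server", bootstrap, "--describe",
            *_entity_args(user, client_id)]


# Default producer quota for every user without its own entry
print(shlex.join(add_quota_argv("localhost:9092", {"producer_byte_rate": 1048576},
                                user=DEFAULT)))
# prints: bin/kafka-configs.sh --bootstrap-server localhost:9092 --alter
#         --add-config producer_byte_rate=1048576 --entity-type users --entity-default
```

To apply a change, pass the list to subprocess.run(argv, check=True). Rejecting unknown keys up front catches typos before they reach the cluster.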

Sizing the Limits: Working Backward From Effective Throughput

Set limits by accounting for expected message size, QPS, compression ratio, and overhead such as headers. Start by allocating around 80% of expected demand, then adjust step by step while watching throttle_time_ms, processing delay, and latency.

Consumer and producer sides can be asymmetric. For example, with heavy writes but reads concentrated in only a few groups, it is reasonable to keep consumer_byte_rate on the lower side.

Rough math: limit [bytes/s] ≈ average record size x QPS x safety factor
Short observation windows make the measured rate, and therefore throttling, oscillate; size limits together with quota.window.num and quota.window.size.seconds.
Use request_percentage alongside, matched to your peak-latency SLO.

Design item	Guideline	Notes
producer_byte_rate	Start at ~80% of write peak	Raise if throttling is frequent
consumer_byte_rate	Match downstream processing capacity	Raise it if lag grows while fetches are being throttled
request_percentage	Cap on broker network/I/O thread time (n% = n% of one thread)	Limits processing time, not just bandwidth; total capacity is (num.network.threads + num.io.threads) x 100%
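request_percentage is measured in units of one thread, so its meaning depends on the broker's thread counts. Per the Kafka documentation, total capacity is (num.io.threads + num.network.threads) x 100%. The small Python helper below (hypothetical name) converts a quota into a fraction of one broker's capacity. Unless told otherwise, it uses the broker defaults num.network.threads=3 and num.io.threads=8.

```python
def request_share_of_broker(request_percentage: float,
                            num_network_threads: int = 3,
                            num_io_threads: int = 8) -> float:
    """Fraction of one broker's network + I/O thread capacity a quota allows.

    n% means n% of one thread; total capacity is
    (num.io.threads + num.network.threads) * 100 percent.
    Defaults are the broker defaults num.network.threads=3, num.io.threads=8.
    """
    total_pct = (num_network_threads + num_io_threads) * 100
    return request_percentage / total_pct


print(f"{request_share_of_broker(200):.1%}")          # prints 18.2%
print(f"{request_share_of_broker(200, 8, 16):.1%}")   # prints 8.3%
```

The same request_percentage therefore grants a smaller slice of a broker that runs more threads. Keep this in mind when comparing quotas across differently tuned clusters.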

Calculation example

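# Sending 10 KB records on average at 1500 records/s (assuming similar size after compression)
# 10 * 1024 * 1500 = 15,360,000 bytes/s ≈ 14.6 MiB/s
# Initial limit: about 12–13 MiB/s (roughly 80–90% of demand), adjusted while observing
# → producer_byte_rate ≈ 13 * 1024 * 1024 = 13631488
# Byte-rate quotas apply per broker, so each broker only needs to allow its share of this traffic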
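Here is the same arithmetic as a Python sketch, with two hypothetical parameters:

- start_pct mirrors the ~80% starting point suggested above.
- brokers assumes traffic is spread evenly across brokers, since byte-rate quotas are enforced per broker.

initial_byte_rate is a hypothetical helper.

```python
MiB = 1024 * 1024


def initial_byte_rate(avg_record_bytes: int, records_per_sec: int,
                      start_pct: int = 80, brokers: int = 1) -> int:
    """Starting byte-rate quota in bytes/s (hypothetical helper).

    demand    = avg_record_bytes * records_per_sec (size after compression)
    start_pct = starting point as a percentage of demand
    brokers   = number of brokers the traffic is spread over, assumed evenly;
                byte-rate quotas are enforced per broker, so each sees 1/brokers
    """
    demand = avg_record_bytes * records_per_sec
    return demand * start_pct // 100 // brokers  # integer bytes/s


demand = 10 * 1024 * 1500
print(demand, round(demand / MiB, 1))                  # prints 15360000 14.6
print(initial_byte_rate(10 * 1024, 1500))              # prints 12288000 (about 11.7 MiB/s)
print(initial_byte_rate(10 * 1024, 1500, brokers=3))   # prints 4096000 (per broker)
```

At exactly 80% the starting point is about 11.7 MiB/s, and the example above rounds up to 12–13 MiB/s. Integer arithmetic avoids floating-point noise in the value you paste into --add-config.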

Observability and Troubleshooting: throttle_time and Metrics

Clients detect throttling via the throttle_time_ms field in broker responses. Values that stay non-zero strongly suggest the limit is being hit. On the broker side, throttle times are exposed as metrics per request type (for example, ThrottleTimeMs), and byte rates and throttle times are also exposed per quota entity.

Excessive throttling makes producer buffers fill up and latency spike. On the consumer side, fetch intervals grow longer, which causes app-side delay and growing lag. Start by checking the target entity's configuration and whether the client is falling through to a default quota.

Many client implementations emit warnings about throttling or waits in their logs.
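Run --describe on each level (user+client-id, user, client-id, and the defaults) to work out which one actually applies; the per-entity quota metrics also help, since their user and client-id tags reflect the level the broker matched.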
Short observation windows produce false positives; tune smoothing as needed.
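One simple way to turn "continuously non-zero" into an alertable signal is to check what fraction of recent samples show any throttling. The helper and its 50% threshold are hypothetical. The samples could come from the broker's ThrottleTimeMs metrics or from a client-side throttle-time metric.

```python
from typing import Sequence


def sustained_throttling(throttle_ms: Sequence[float], min_fraction: float = 0.5) -> bool:
    """True if throttle time was non-zero in at least min_fraction of the samples.

    throttle_ms: periodic samples of a throttle-time metric (e.g. per-minute averages).
    min_fraction: hypothetical alert threshold; tune it to your environment.
    """
    if not throttle_ms:
        return False
    nonzero = sum(1 for t in throttle_ms if t > 0)
    return nonzero / len(throttle_ms) >= min_fraction


print(sustained_throttling([0, 0, 120, 0, 0, 0]))       # prints False: one isolated spike
print(sustained_throttling([80, 95, 0, 110, 130, 90]))  # prints True: 5 of 6 samples throttled
```

Counting non-zero samples, rather than alerting on any spike, keeps a single burst that the observation window already absorbs from paging anyone.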

Symptom	Likely cause	What to check
Rising throttle_time_ms	Quota exceeded	Limit and precedence for the target entity
Growing consumer lag	consumer_byte_rate is set too low	Alignment with the group's processing capacity
P99 latency regression	request_percentage too low for the client's request load	Whether the thread-time cap matches the client's request volume

JMX/metric examples (names vary by environment)

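# Example: throttle time per request type
# kafka.network:type=RequestMetrics,name=ThrottleTimeMs,request=Produce
# kafka.network:type=RequestMetrics,name=ThrottleTimeMs,request=FetchConsumer
# Per-(user, client-id) quota metrics (attributes: byte-rate, throttle-time)
# kafka.server:type=Produce,user=<user>,client-id=<client-id>
# kafka.server:type=Fetch,user=<user>,client-id=<client-id>
# Plot these as time series on a dashboard and compare them with the target entity's configuration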

Operational Patterns and Caveats: Multi-Tenancy and Replication Differences

In multi-tenant setups, lock down default and only relax it for critical workloads or nightly batches. Standardize a client.id naming convention (such as team-app-purpose) and review the inventory each quarter to prevent rot.

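Throttling for replication traffic is configured separately from client quotas. It consists of two parts:

- broker-level leader.replication.throttled.rate and follower.replication.throttled.rate
- topic-level leader.replication.throttled.replicas and follower.replication.throttled.replicas

These are usually set for you by kafka-reassign-partitions.sh --throttle. They are used during operations like partition reassignment and should not be confused with controlling application clients.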

Set default first to keep unexpected new clients in check.
For critical workloads, grant the minimum necessary relaxation at user+client-id level.
Treat replication throttling settings as a separate concept (used to curb network usage during reassignment).
CCAAK angle: nail down the precedence and key names, the fact that throttling is implemented as delay, and the concept of observation windows.

Target	Control mechanism	Common confusion
Application clients	producer/consumer_byte_rate, request_percentage	User/client precedence
Replication	Dedicated replication throttle settings	Not a client quota
Unregistered clients	Strict default configuration	Exceptions must be reviewed periodically

Operational inventory (pseudocode procedure)

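# 1) Collect the client.id values and principals actually in use (from metrics/logs)
# 2) Confirm that they are covered by a default quota
# 3) Relax limits at user+client-id level for critical workloads only
# 4) Mark entries unused for 90+ days as deletion candidates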
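Step 4 can be scripted once you have a last-seen date for each entity. In this hypothetical sketch, the configured list would come from kafka-configs.sh --describe output and last_seen from your metrics or logs. Default entries are never proposed for deletion because they are the safety net.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional, Tuple

DEFAULT = "<default>"  # hypothetical marker for --entity-default entries
Entity = Tuple[Optional[str], Optional[str]]  # (user, client_id)


def deletion_candidates(configured: List[Entity], last_seen: Dict[Entity, date],
                        today: date, max_idle_days: int = 90) -> List[Entity]:
    """Entities idle for max_idle_days or more (or never seen); defaults are kept."""
    cutoff = today - timedelta(days=max_idle_days)
    return [e for e in configured
            if DEFAULT not in e and last_seen.get(e, date.min) <= cutoff]


configured = [("alice", "etl-writer"), ("analytics", "nightly-batch"),
              (None, "old-poc"), (DEFAULT, None)]
last_seen = {("alice", "etl-writer"): date(2026, 3, 1),
             ("analytics", "nightly-batch"): date(2025, 11, 20)}
print(deletion_candidates(configured, last_seen, today=date(2026, 3, 10)))
# prints [('analytics', 'nightly-batch'), (None, 'old-poc')]
```

Entities that were never observed are treated as idle. Review that list by hand before deleting anything, because a quarterly batch may simply not have run yet.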

Check Your Understanding

CCAAK

Question 1

In a Kafka cluster, you want to suppress unclassified new clients while granting only user=analytics with client-id=nightly-batch a high write bandwidth. Which configuration is most appropriate?

A. Set a low producer_byte_rate at default and configure a higher producer_byte_rate specifically for the user=analytics, client-id=nightly-batch combination.
B. Set a high producer_byte_rate on client-id=nightly-batch and do nothing else.
C. Set a high producer_byte_rate on user=analytics and do nothing else.
D. Set the same request_percentage for all clients.

Answer: A

Suppressing unclassified clients is what default quotas are for. To relax the limit only for one user x client pair, rely on the rule that the most specific entity (user+client-id) wins. Raising the limit only for client-id=nightly-batch would also apply to any other user running a client with that ID. Raising it only for user=analytics would apply to all of that user's clients.

Frequently Asked Questions

Does a quota reject requests or just delay them?

Kafka client quotas are soft throttling. When a client exceeds its quota, the broker does not return an error. Instead, it reports the required delay in throttle_time_ms and stops processing that client's requests until the delay has passed.

Can I tell which entity-level quota is actually in effect?

Use kafka-configs.sh --describe on each candidate level and watch whether the client-side throttle_time_ms rises. Among named entities the order is user+client-id > user > client-id, with the default levels interleaved, so a user-level default outranks a client-id-only quota.

Can I control replication throttling with client quotas as well?

No. Replication throttling lives in a separate configuration domain. The client-facing producer/consumer_byte_rate and request_percentage do not control it.


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
