Kafka's ordering guarantee holds at the partition level, not at the topic level. That makes message key design the lever that determines not only data distribution but also the scope of ordering.
Grounded in the official documentation, this article walks through how the default partitioner behaves, how to avoid hot partitions, how to keep serialization stable, and what happens when partition counts change — from both exam and operational perspectives.
Kafka's ordering guarantee is limited to write order within a single partition. If you consistently route the same key to the same partition, messages for that key stay in order. Total ordering across partitions, however, is not guaranteed.
Producers preserve send order per partition, but retries combined with multiple in-flight requests can reorder batches. Within a consumer group, each partition is assigned to exactly one consumer at a time, so the partition count is the upper bound on the group's parallelism.
When a key is set, the standard Kafka producer hashes the serialized key bytes and assigns the partition using the result modulo the partition count. Identical key bytes always land on the same partition.
When the key is null, the producer spreads records across partitions, sticking to one partition for a whole batch (for batching efficiency) instead of rotating on every record. Null keys provide no per-key ordering guarantee, so always set a key when ordering matters.
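The keyed path can be sketched in plain Java. This is a minimal illustration, not Kafka's code: the real default partitioner applies murmur2 to the key bytes, whereas `standInHash` below is an FNV-1a stand-in chosen only to keep the example dependency-free, so the partition numbers it produces will not match a real cluster. `KeyRoutingSketch` and its method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class KeyRoutingSketch {

    // Stand-in 32-bit FNV-1a hash. Assumption: Kafka itself uses murmur2 at this step.
    static int standInHash(byte[] bytes) {
        int h = 0x811c9dc5;
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Hash the serialized key, clear the sign bit, then take it modulo the partition count.
    static int partitionFor(byte[] keyBytes, int partitionCount) {
        return (standInHash(keyBytes) & 0x7fffffff) % partitionCount;
    }

    public static void main(String[] args) {
        byte[] key = "user-42".getBytes(StandardCharsets.UTF_8);
        // Identical bytes yield the same partition for a fixed partition count.
        System.out.println(partitionFor(key, 6) == partitionFor(key.clone(), 6)); // prints true
    }
}
```

The function takes only the bytes and the partition count as input, which is why both the serializer and the partition count have to stay fixed for a key's placement to stay fixed.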
How keys drive partition placement and ordering
Skewed key distributions concentrate load on a single partition, driving latency up and throughput down. When you design keys, be explicit about how you balance ordering scope and scalability.
If you need to keep per-key ordering intact while increasing throughput, the standard play is to grow the partition count and align producer/consumer parallelism. When you can relax ordering somewhat, reach for salts or composite keys to scale further.
| Key Strategy | Ordering Scope | Skew Risk | Scalability |
|---|---|---|---|
| Null key | None | Low | High |
| Single business key (e.g., userId) | Strong within the key | Potentially high depending on distribution | Medium |
| Salted key (userId + random) | Effectively none | Low | High |
| Composite key (e.g., userId + sessionId) | Sub-entity granularity such as session | Medium | Medium to high |
| Custom partitioner (stable hash) | Strong within the key | Depends on distribution | Medium |
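As a rough sketch of how the first few strategies in the table differ at the point where the key is built, the helpers below construct each kind of key as a string. The class name, the `|` and `#` separators, and the bounded random salt are assumptions made for illustration; the sketch also assumes that neither identifier contains the separator character.

```java
import java.util.concurrent.ThreadLocalRandom;

final class KeyStrategies {

    // Single business key: every event for the user shares one partition and one order.
    static String businessKey(String userId) {
        return userId;
    }

    // Composite key: ordering holds per (user, session) pair rather than per user.
    static String compositeKey(String userId, String sessionId) {
        return userId + "|" + sessionId;
    }

    // Salted key: spreads one hot user over up to `buckets` distinct keys.
    // Ordering across the user's events is lost, because the salted keys
    // may hash to different partitions.
    static String saltedKey(String userId, int buckets) {
        int salt = ThreadLocalRandom.current().nextInt(buckets);
        return userId + "#" + salt;
    }
}
```

Bounding the salt to a small number of buckets keeps the fan-out per user predictable, which makes it easier for a consumer to merge a user's events back together when that is needed.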
The default partitioner hashes the serialized bytes, not the key object itself. That means the same logical key can be routed to a different partition if you change the serializer or encoding.
When you express composite keys as concatenated strings, pin down the separator and normalization rules so the byte representation does not drift over time. With Avro or Protobuf binary representations, understand how field order, default values, and variable-length encoding affect the bytes, and lock them down for stability.
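To make the byte-level point concrete, the following sketch encodes the same logical key, the number 42, in two ways and normalizes a composite key before encoding it. The class and method names are hypothetical, and the normalization rules (trim, lower-case the tenant with a fixed locale, `|` separator) are only one possible convention.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

final class KeyBytesSketch {

    // The logical key as a UTF-8 string: one byte per decimal digit.
    static byte[] asUtf8String(long userId) {
        return Long.toString(userId).getBytes(StandardCharsets.UTF_8);
    }

    // The same logical key as an 8-byte big-endian long.
    static byte[] asBigEndianLong(long userId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(userId).array();
    }

    // Composite key with fixed rules: trim, lower-case with a fixed locale, one separator.
    static byte[] normalizedComposite(String tenant, String userId) {
        String key = tenant.trim().toLowerCase(Locale.ROOT) + "|" + userId.trim();
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(asUtf8String(42L)));    // [52, 50]
        System.out.println(Arrays.toString(asBigEndianLong(42L))); // [0, 0, 0, 0, 0, 0, 0, 42]
    }
}
```

The two encodings share no bytes, so a hash over them has no reason to agree, and switching from one to the other would be expected to move existing keys to other partitions.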
Kafka's default hash uses plain modulo arithmetic, so changing the partition count reshuffles the key → partition mapping at scale. The same key may move to a different partition, with downstream impact on consumer state and ordering assumptions.
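The scale of that reshuffle can be estimated with a short simulation. The sketch below is an assumption-laden illustration: it uses an FNV-1a stand-in hash instead of Kafka's murmur2, synthetic keys named `key-0`, `key-1`, and so on, and hypothetical class and method names.

```java
import java.nio.charset.StandardCharsets;

final class ReshuffleSketch {

    // Stand-in 32-bit FNV-1a hash; Kafka's default partitioner uses murmur2 instead.
    static int standInHash(byte[] bytes) {
        int h = 0x811c9dc5;
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    static int partitionFor(String key, int partitionCount) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        return (standInHash(bytes) & 0x7fffffff) % partitionCount;
    }

    // Counts the synthetic keys whose partition differs between the two partition counts.
    static int movedKeys(int keyCount, int oldPartitions, int newPartitions) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            if (partitionFor(key, oldPartitions) != partitionFor(key, newPartitions)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int moved = movedKeys(10_000, 6, 8);
        System.out.printf("%d of 10000 keys changed partition%n", moved);
    }
}
```

For an ideally uniform hash, going from 6 to 8 partitions keeps a key in place only when its hash modulo 24 falls in 0 to 5, so about three quarters of keys would be expected to move. The exact count printed here depends on the stand-in hash.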
In Kafka Streams and ksqlDB, operations that require re-keying or shuffling automatically introduce repartition topics. For external producers, it is safer to assume partition counts will change and to prepare observability for it (latency, throughput, and hot-partition detection).
To achieve ordering and deduplication together, enable idempotent producing on the producer side (enable.idempotence=true) and combine it with acks=all. Keep max.in.flight.requests.per.connection at 5 or below, the limit under which idempotence still preserves order across retries. On the consumer side, stick to one-thread-per-partition processing and add a reordering buffer with a timeout if needed.
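The producer side of this can be sketched as a plain `Properties` object. The configuration keys are standard producer settings; the class name, the choice of string serializers, and the bootstrap address passed in by the caller are assumptions for the sake of the example.

```java
import java.util.Properties;

final class OrderedProducerConfig {

    // Builds producer settings aimed at ordered, duplicate-free writes per partition.
    static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Assumption: string keys and values; swap in the serializers your data needs.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // Idempotence lets the broker discard duplicates created by retries.
        props.setProperty("enable.idempotence", "true");
        props.setProperty("acks", "all");
        // Idempotence requires this value to be 5 or lower.
        props.setProperty("max.in.flight.requests.per.connection", "5");
        return props;
    }
}
```

A call such as `OrderedProducerConfig.build("localhost:9092")` would produce settings that can be handed to a `KafkaProducer` constructor; the address is a placeholder.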
To detect hot partitions, surface key distribution metrics (records and latency per partition). When skew exceeds your threshold, consider — depending on requirements — composite keys, limited salting, or scaling the partition count.
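One simple way to turn per-partition record counts into a skew signal is the ratio of the busiest partition to the mean. The sketch below assumes the counts have already been collected (for example from per-partition offset deltas over a time window); the class name and the choice of threshold are hypothetical.

```java
final class PartitionSkew {

    // Ratio of the busiest partition's record count to the mean; 1.0 means perfectly even.
    static double skewRatio(long[] recordsPerPartition) {
        if (recordsPerPartition.length == 0) {
            throw new IllegalArgumentException("no partitions");
        }
        long max = 0;
        long total = 0;
        for (long count : recordsPerPartition) {
            max = Math.max(max, count);
            total += count;
        }
        if (total == 0) {
            return 1.0; // nothing written yet, so nothing is skewed
        }
        double mean = (double) total / recordsPerPartition.length;
        return max / mean;
    }

    // Flags the topic when the busiest partition exceeds the mean by the given factor.
    static boolean isHot(long[] recordsPerPartition, double threshold) {
        return skewRatio(recordsPerPartition) > threshold;
    }
}
```

With counts of 700, 100, 100, and 100, the mean is 250 and the ratio is 2.8, so a threshold of 2.0 would flag the topic. The right threshold depends on how much headroom each partition's consumer has.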
Java: custom partitioner skeleton (stable hash example)
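```java
package example;

import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;
import org.apache.kafka.common.utils.Utils;

public class StableHashPartitioner implements Partitioner {
    // Virtual nodes scored per partition. Tunable; each one costs an extra hash per record.
    private int virtualNodes = 64;

    @Override
    public void configure(Map<String, ?> configs) {
        Object v = configs.get("stable.partitioner.vnodes");
        if (v instanceof Number) {
            virtualNodes = ((Number) v).intValue();
        } else if (v instanceof String) {
            try {
                virtualNodes = Integer.parseInt((String) v);
            } catch (NumberFormatException ignore) {
                // Keep the default when the setting is not a valid integer.
            }
        }
        virtualNodes = Math.max(1, virtualNodes);
    }

    @Override
    public int partition(String topic, Object keyObj, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        int partitionCount = cluster.partitionsForTopic(topic).size();
        if (keyBytes == null || partitionCount <= 0) {
            // Fallback: this skeleton does not mimic the default sticky strategy;
            // keyless records all go to partition 0.
            return 0;
        }
        // Utils.murmur2 is the 32-bit murmur2 used internally by Kafka. Hash the key once.
        int keyHash = Utils.murmur2(keyBytes);
        // Simple HRW/rendezvous style: score every virtual node and pick the
        // partition that owns the maximum score.
        int selected = 0;
        long bestScore = Long.MIN_VALUE;
        for (int p = 0; p < partitionCount; p++) {
            for (int v = 0; v < virtualNodes; v++) {
                // A production version would cache these node hashes per topic.
                int nodeHash = Utils.murmur2(
                        (topic + ":" + p + ":" + v).getBytes(StandardCharsets.UTF_8));
                // Mix the key and node hashes so the score depends on both. Without
                // mixing, the node hash alone would decide the winner and every key
                // would pick the same partition.
                long score = mix64(((long) keyHash << 32) | (nodeHash & 0xffffffffL));
                if (score > bestScore) { bestScore = score; selected = p; }
            }
        }
        return selected;
    }

    // SplitMix64 finalizer: spreads the combined 64 bits across the whole score.
    private static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    @Override
    public void close() {}
}
```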
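The property that makes a rendezvous-style partitioner attractive is that growing the topic only moves keys onto the new partition. The dependency-free sketch below illustrates that property; it is not the skeleton above, and it swaps in FNV-1a and a SplitMix64 mixer as stand-in hash functions. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class RendezvousSketch {

    // SplitMix64 finalizer, used here as a stand-in mixing function.
    static long mix64(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    // 64-bit FNV-1a over the key's UTF-8 bytes (a stand-in for murmur2).
    static long keyHash(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Each partition gets a score that depends on both key and partition; the highest wins.
    static int partitionFor(String key, int partitionCount) {
        long kh = keyHash(key);
        int selected = 0;
        long best = Long.MIN_VALUE;
        for (int p = 0; p < partitionCount; p++) {
            long score = mix64(kh ^ mix64(p + 1L));
            if (score > best) {
                best = score;
                selected = p;
            }
        }
        return selected;
    }

    // Counts synthetic keys that change partition when the topic grows by one partition.
    static int movedKeys(int keyCount, int oldPartitions) {
        int moved = 0;
        for (int i = 0; i < keyCount; i++) {
            String key = "key-" + i;
            if (partitionFor(key, oldPartitions) != partitionFor(key, oldPartitions + 1)) {
                moved++;
            }
        }
        return moved;
    }
}
```

Because the scores of existing partitions do not change when a partition is added, a key either stays put or moves to the new partition. With an ideal hash, growing from 6 to 7 partitions would be expected to move about one key in seven, compared with most keys under plain modulo.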
CCDAK
Question 1
Without changing the topic's partition count, you tweak only producer-side settings. Which change is most likely to alter the existing key → partition mapping for the same logical key?
Correct answer: A
The default partitioner hashes the serialized key bytes to choose a partition. Changing the serializer or format changes the bytes for the same logical key, so placement shifts. The other options do not affect the partitioning algorithm itself.
How do null keys interact with log compaction?
Log compaction keeps the latest version per key, so it only works on records that have one. A compacted topic requires a non-null key, and the broker rejects records without a key. A tombstone (delete marker) is expressed as the combination of a non-null key and a null value.
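These semantics can be modeled with a small in-memory map. This is a toy model under simplifying assumptions, not how the broker implements compaction: it applies deletes immediately, whereas a real cleaner runs in the background and keeps tombstones around for a retention period (`delete.retention.ms`) before dropping them. The class name is hypothetical, and the model simply refuses null keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CompactionModel {

    // Latest value per key: a stand-in for what a fully compacted partition retains.
    private final Map<String, String> latest = new LinkedHashMap<>();

    // Appends a record. A null value is a tombstone and removes the key.
    void append(String key, String value) {
        if (key == null) {
            throw new IllegalArgumentException("compaction needs a non-null key");
        }
        if (value == null) {
            latest.remove(key);
        } else {
            latest.put(key, value);
        }
    }

    // Returns a copy of the surviving records.
    Map<String, String> snapshot() {
        return new LinkedHashMap<>(latest);
    }
}
```

Appending `a=1`, `b=2`, `a=3`, and then a tombstone for `b` leaves only `a=3` in the snapshot.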
How do you increase throughput while keeping ordering guarantees?
Keep the key, then scale up the partition count and consumer parallelism together. Enable idempotent producing with acks=all, and keep max.in.flight.requests.per.connection at 5 or below. Salting the key breaks ordering, so use it only where the requirements allow.
Does the key-to-partition mapping survive a partition count change?
No. The default hash relies on modulo arithmetic, so changing the partition count reshuffles the mapping broadly. If stability matters, use a custom partitioner with a stable hash, or design for reprocessing and re-ordering on the assumption that the mapping will change.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.