A Topic is a logical name and a Partition is an ordered, immutable log. Scaling and ordering guarantees are both anchored to the Partition.
Availability depends on replication and the ISR. Within a Consumer Group, Consumer parallelism cannot exceed the Partition count.
In Kafka a Topic is a category name for events, and a Partition is an ordered, immutable log produced by horizontally splitting the Topic. Each record carries a monotonically increasing offset within its Partition, and ordering guarantees are confined to that Partition.
Availability comes from replication. Each Partition has one Leader and zero or more Followers, with Followers tailing the Leader's log. The set of Replicas keeping up with the Leader is the ISR (In-Sync Replicas). Write durability is governed by the combination of acks and min.insync.replicas (covered in detail later).
Distributing and replicating a Topic (example: partitions=3, RF=3)
Create and inspect a Topic via the CLI (3 partitions, RF=3)
kafka-topics --bootstrap-server broker1:9092 \
--create --topic orders --partitions 3 --replication-factor 3
kafka-topics --bootstrap-server broker1:9092 \
--describe --topic orders
The Producer uses a partitioner to choose the destination Partition. For keyed records, hashing the key sends the same key to the same Partition, preserving per-key ordering. This is a recurring CCDAK theme.
For records without a key, distribution depends on the client implementation (round-robin or sticky partitioning that improves batching). The exam and design takeaway is: use a key whenever you need ordering. Durability is tuned with acks, and duplicates are suppressed with the idempotent Producer (enable.idempotence=true).
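The key-to-Partition mapping described above can be sketched in a few lines of shell. This is only an illustration: Kafka's Java client hashes the serialized key with murmur2, whereas the sketch uses `cksum` (CRC-32) as a stand-in, so the partition numbers it prints will not match a real Producer. The function name `partition_for` is hypothetical.

```shell
#!/bin/sh
# Stand-in for the default partitioner: hash(key) mod partition count.
# ASSUMPTION: cksum replaces Kafka's murmur2 hash purely for illustration.

# partition_for KEY PARTITION_COUNT -> prints an index in [0, PARTITION_COUNT)
partition_for() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % $2 ))
}

# The same key always yields the same index, which is what keeps
# per-key ordering inside one Partition.
for key in user-42 user-42 user-7; do
  echo "$key -> partition $(partition_for "$key" 3)"
done
```

Both `user-42` lines print the same partition index, while `user-7` may or may not share it.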
Send keyed records using the console Producer
kafka-console-producer --bootstrap-server broker1:9092 \
--topic orders --property parse.key=true --property key.separator=:
# Input examples (key on the left, value on the right)
user-42:{"op":"create","id":1}
user-42:{"op":"update","id":1}
user-7:{"op":"create","id":2}
Consumers sharing the same group ID divide a Topic's Partitions among themselves. Each Partition is assigned to only one Consumer in the group at a time, so the maximum parallelism is min(partition count, consumer count). Consumers beyond the Partition count sit idle.
A rebalance is triggered when group membership or subscriptions change, and it briefly halts processing. Tune the session timeout and heartbeat interval (session.timeout.ms, heartbeat.interval.ms) and adopt stable deployment strategies to avoid excessive rebalances.
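As a quick check of the min(partition count, consumer count) rule, the following sketch computes how many group members actually receive a Partition and how many stay idle. The helper name `group_parallelism` is hypothetical, and the sketch assumes a single Topic consumed by a single group.

```shell
#!/bin/sh
# group_parallelism PARTITIONS CONSUMERS
# Prints how many Consumers in one group get at least one Partition
# and how many are left without an assignment.
group_parallelism() {
  if [ "$1" -lt "$2" ]; then active=$1; else active=$2; fi
  echo "active=$active idle=$(( $2 - active ))"
}

group_parallelism 3 2   # prints: active=2 idle=0
group_parallelism 3 5   # prints: active=3 idle=2
```

With 3 Partitions, adding a fourth or fifth Consumer to the group adds no throughput; only more Partitions would.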
Read via a Consumer Group to parallelize across processes
# Terminal 1
kafka-console-consumer --bootstrap-server broker1:9092 \
--topic orders --group app-readers --from-beginning
# Terminal 2 (same group; Partitions are split between consumers)
kafka-console-consumer --bootstrap-server broker1:9092 \
--topic orders --group app-readers
Estimate the Partition count from the target throughput, the throughput a single Partition can sustain, and the Consumer parallelism you need. As a rule of thumb, provision enough to meet current demand plus headroom, and add more later if necessary. Because the count cannot be reduced, choose the initial value carefully.
More Partitions mean more metadata, more files, more open file descriptors, and more network load. Workloads that need total ordering must stick to a single Partition at the cost of throughput. Make the trade-off explicit per use case.
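One way to turn that estimate into numbers is sketched below. It takes the larger of "Partitions needed for throughput" and "Consumers needed", then adds a headroom percentage. The function names and the sample figures (100 MB/s target, 10 MB/s per Partition, 8 Consumers, 20% headroom) are assumptions for illustration; measure per-Partition throughput on your own cluster.

```shell
#!/bin/sh
# ceil_div A B -> integer ceiling of A / B
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# estimate_partitions TARGET_MBPS PER_PARTITION_MBPS MIN_CONSUMERS HEADROOM_PERCENT
estimate_partitions() {
  base=$(ceil_div "$1" "$2")            # Partitions needed for throughput alone
  if [ "$3" -gt "$base" ]; then
    base=$3                             # Consumer parallelism is the tighter bound
  fi
  ceil_div $(( base * (100 + $4) )) 100 # add headroom, rounding up
}

estimate_partitions 100 10 8 20   # prints: 12
```

Here throughput needs 10 Partitions, which already covers the 8 Consumers, and 20% headroom brings the estimate to 12.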
| Design | Pros | Cons | Typical use cases |
|---|---|---|---|
| Single Partition | Preserves total ordering; easy to debug | Low throughput and parallelism (does not scale) | Queues that require strict ordering; low traffic |
| Few (2–6) | Moderate parallelism; low operational overhead | May hit throughput ceiling during peaks | Mid-scale event processing; starting point for staged growth |
| Many (10–50) | High throughput; lets Consumers scale horizontally | More metadata and files; higher rebalance cost | Scaling out production workloads |
| Very many (100+) | Extreme throughput and fine-grained parallelism | Hard to operate; concerns over FDs, memory, controller load | Large multi-tenant or organization-wide central platforms |
Write durability is controlled by acks. With acks=all, the Leader responds only after every Replica currently in the ISR has received the record. min.insync.replicas sets the minimum ISR size (Leader included) required for an acks=all write to be accepted; if the ISR shrinks below it, the Broker rejects the write instead of acknowledging it.
The practical recommendation for critical Topics is replication.factor=3, min.insync.replicas=2, and acks=all on the Producer. This keeps writes flowing even if one Broker fails while minimizing data-loss risk. Monitor Under-Replicated and Offline Partitions and target zero for both as cluster-health signals.
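The effect of the recommended settings can be worked through with simple arithmetic. The sketch below assumes every surviving Replica is in sync and that the Producer uses acks=all; under those assumptions writes are accepted while (replication factor minus failed Brokers) is at least min.insync.replicas. The function name `writes_accepted` is hypothetical.

```shell
#!/bin/sh
# writes_accepted RF MIN_ISR FAILED_BROKERS -> prints yes or no
# ASSUMPTION: all surviving Replicas are in the ISR; Producer uses acks=all.
writes_accepted() {
  in_sync=$(( $1 - $3 ))
  if [ "$in_sync" -ge "$2" ]; then echo yes; else echo no; fi
}

for failed in 0 1 2; do
  echo "RF=3 min.insync.replicas=2 failed=$failed -> writes accepted: $(writes_accepted 3 2 "$failed")"
done
```

With replication.factor=3 and min.insync.replicas=2, one Broker failure is tolerated; a second failure makes acks=all writes fail until a Replica rejoins the ISR.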
Example durability settings for the Topic and Producer
# Set min.insync.replicas when creating the Topic
kafka-topics --bootstrap-server broker1:9092 \
--create --topic critical-events --partitions 6 --replication-factor 3 \
--config min.insync.replicas=2
# Producer (example properties)
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=5
You can grow the Partition count but cannot shrink it. After increasing it, an existing key may map to a different Partition under the new count, so the same-key-same-Partition property does not span the change boundary (the in-Partition ordering guarantee itself still holds).
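To see why the mapping shifts, the sketch below compares hash(key) mod 3 with hash(key) mod 12 for a few sample keys. As an illustration only, it uses `cksum` (CRC-32) as a stand-in for Kafka's murmur2 hash, so the concrete partition numbers differ from a real cluster; the function name `partition_for` and the sample keys are hypothetical.

```shell
#!/bin/sh
# ASSUMPTION: cksum stands in for Kafka's murmur2 key hash.
# partition_for KEY PARTITION_COUNT -> prints hash(KEY) mod PARTITION_COUNT
partition_for() {
  crc=$(printf '%s' "$1" | cksum | cut -d' ' -f1)
  echo $(( crc % $2 ))
}

# Compare each key's Partition before (3) and after (12) the increase.
for key in user-1 user-2 user-3 user-7 user-42; do
  before=$(partition_for "$key" 3)
  after=$(partition_for "$key" 12)
  if [ "$before" -eq "$after" ]; then status=same; else status=moved; fi
  echo "$key: partition $before -> $after ($status)"
done
```

Records already written stay where they are; only new records for a "moved" key land on the new Partition, which is why per-key ordering does not hold across the change.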
Retention is controlled by retention.ms and retention.bytes, and the cleanup strategy by cleanup.policy=delete or compact. Compaction is ideal when you want to keep only the latest value per key; when you need a full event history, use the delete policy with a sufficiently long retention. Reassignments and maintenance move Replicas around and consume I/O, so plan them within maintenance windows and roll them out in stages.
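Because retention.ms is expressed in milliseconds, it helps to derive the value rather than type it by hand. A minimal sketch, with a hypothetical helper name `days_to_ms`:

```shell
#!/bin/sh
# days_to_ms DAYS -> prints the equivalent number of milliseconds
days_to_ms() { echo $(( $1 * 24 * 60 * 60 * 1000 )); }

days_to_ms 7   # prints: 604800000
```

Seven days therefore corresponds to retention.ms=604800000.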
Common operational commands
# Increase the Partition count
kafka-topics --bootstrap-server broker1:9092 \
--alter --topic orders --partitions 12
# Adjust Topic retention
kafka-configs --bootstrap-server broker1:9092 --entity-type topics \
--entity-name orders --alter --add-config retention.ms=604800000
# Start a reassignment (example; the plan is provided as JSON)
kafka-reassign-partitions --bootstrap-server broker1:9092 \
--execute --reassignment-json-file plan.json
CCDAK
Question 1
You need to process high-traffic order events while preserving per-user ordering and also boosting overall throughput. Which design is best?
Correct answer: A
Using the user ID as the key keeps records for the same user in the same Partition, preserving order. Increasing the Partition count raises both parallelism and throughput. B keeps Partition=1 so parallelism cannot grow, C caps throughput, and D severely weakens durability.
Can I reduce the partition count later?
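No, Kafka does not support decreasing the Partition count of an existing Topic. If you must, create a new Topic with the desired count and migrate the data with a mirroring tool or a stream-processing job. You can add Partitions, but be aware that the key-to-Partition mapping changes for future writes.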
How are records distributed when sent without a key?
It depends on the client implementation (round-robin, sticky partitioning, etc.). The key point for exams and design is to attach a key to any unit where order matters so the same key always lands on the same Partition.
How strong is Kafka's ordering guarantee?
Kafka only guarantees order within a single Partition. There is no total ordering across an entire Topic. If you need strict total ordering, use a single Partition or implement ordering at the application level.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.