
Confluent Certified Administrator (CCAAK) Guide: Scope, Weighting, and Operational Essentials

2026-04-19
NicheeLab Editorial Team

This article condenses the operational essentials in the form CCAAK tends to ask about, grounded in the documented behavior and stable concepts of Kafka and the Confluent Platform.

Confluent does not publish detailed exam weights, so this guide prioritizes the operational areas where mistakes are most costly: architecture, operations, security, and monitoring.

Exam Overview and How to Think About Weighting (When the Details Aren't Public)

CCAAK is primarily multiple choice and tests, across the board, the design decisions, security configuration, monitoring, and troubleshooting required to administer and operate Kafka and the Confluent Platform. The official syllabus can change, so always cross-check terminology and feature differences against the Confluent certification page and docs.confluent.io.

Because detailed weighting is not published, the emphasis below is our own estimate. From an operational standpoint, we expect cluster architecture (replication, ISR, rack awareness), security (TLS/SASL, ACL, RBAC), operations (rolling upgrades, rebalancing, monitoring metrics, throttling), and data movement (MirrorMaker 2 / Cluster Linking) to matter most, while familiarity with the surrounding tooling (Control Center, Cruise Control, the CLI suite, JMX/monitoring) is what tends to separate strong candidates from average ones.

  • Study prioritization: architecture/security/monitoring/troubleshooting → high, memorizing tool details → medium, breadth of the surrounding ecosystem → low to medium
  • Hedging against unpublished weights: go deep on the places production incidents tend to happen (min.insync.replicas, acks=all, quotas, under-replicated partitions, authentication failures)
  • Question-format prep: be able to instantly compare options on the basis of setting names, their effects, trade-offs, and the direction of the defaults

Architecture Essentials: Replication, ISR, Rack Awareness, and the Controller

Kafka's availability design is defined by the combination of a topic's replication factor, its min.insync.replicas setting, and the producer's acks setting. In both production and on the exam, replication.factor=3, min.insync.replicas=2, and acks=all is the baseline answer. When the ISR shrinks below min.insync.replicas, producers using acks=all fail with NOT_ENOUGH_REPLICAS until the ISR recovers.
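As a rough illustration of that interaction, the sketch below models whether a write is accepted for a given ISR size. The function name `produce_outcome` and its arguments are hypothetical, and the logic is a simplified model of the documented rule rather than actual broker code.

```shell
# produce_outcome ISR_SIZE MIN_ISR ACKS   (hypothetical helper)
# Simplified model: the min.insync.replicas check applies only to acks=all
# (also written acks=-1); acks=0 and acks=1 do not wait for followers.
produce_outcome() (
  isr_size=$1; min_isr=$2; acks=$3
  case "$acks" in
    all|-1)
      if [ "$isr_size" -lt "$min_isr" ]; then
        echo "NOT_ENOUGH_REPLICAS"
      else
        echo "accepted"
      fi
      ;;
    *) echo "accepted" ;;
  esac
)

produce_outcome 3 2 all   # full ISR: accepted
produce_outcome 1 2 all   # ISR shrunk to the leader only: NOT_ENOUGH_REPLICAS
produce_outcome 1 2 1     # acks=1 skips the check: accepted
```

The last call shows why acks=1 looks healthier during an incident while actually offering weaker durability.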

Setting rack awareness (broker.rack) and spreading replicas across multiple racks via the topic assignment improves RPO/RTO when a rack fails. The controller handles leader election and metadata management; operationally, controller stability (metadata propagation lag and the health of ZooKeeper/KRaft) is also a monitoring target. Because the ZooKeeper-based and KRaft (self-managed metadata) modes coexist for now, be ready to answer questions about either.

  • Rule of thumb: avoiding data loss = acks=all AND min.insync.replicas >= 2 AND replication.factor >= min.insync.replicas + 1
  • UnderReplicatedPartitions > 0 should alert immediately. Investigate what is keeping followers out of the ISR (replica.lag.time.max.ms, network and disk I/O).
  • unclean.leader.election.enable=false is the safe default. Setting it true in emergencies risks data inconsistency.
  • Rack awareness: set broker.rack and verify that the resulting assignment actually distributes each partition's replicas across racks.
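The rack check in the last bullet can be sketched as a small helper. This is a minimal sketch under stated assumptions: `racks_spanned` is a hypothetical function, the replica list would be copied from the `Replicas:` field of `kafka-topics.sh --describe`, and the broker-to-rack map is maintained by hand here.

```shell
# racks_spanned REPLICAS RACK_MAP   (hypothetical helper)
#   REPLICAS: comma-separated broker ids, e.g. "1,2,3"
#   RACK_MAP: space-separated id=rack pairs, e.g. "1=A 2=B 3=C"
# Prints the number of distinct racks hosting the given replicas.
racks_spanned() (
  replicas=$1; rack_map=$2
  for id in $(printf '%s' "$replicas" | tr ',' ' '); do
    for pair in $rack_map; do
      case "$pair" in
        "$id"=*) printf '%s\n' "${pair#*=}" ;;  # emit the rack of this broker
      esac
    done
  done | sort -u | wc -l | tr -d ' '
)

racks_spanned "1,2,3" "1=A 2=B 3=C"   # prints 3
racks_spanned "1,2" "1=A 2=A 3=B"     # prints 1: both replicas share rack A
```

A result of 1 for a partition with several replicas means a single rack failure would take every copy offline at once.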

Figure: Logical Kafka cluster layout and replication

Producers → Brokers (Controller/Metadata) → Consumers

  Broker 1 (rack A): P0(L) P1(F) P2(R)
  Broker 2 (rack B): P0(R) P1(L) P2(F)
  Broker 3 (rack C): P0(F) P1(R) P2(L)

Legend: L=Leader, F=Follower, R=Replica

Topic Configuration and Quotas: Balancing Throughput and Durability

Topic design comes down to understanding the trade-offs between retention strategy (retention.ms / retention.bytes / cleanup.policy), segment sizing (segment.bytes / segment.ms), and compaction thresholds (min.cleanable.dirty.ratio). Log compaction keeps the latest value per key while saving space, but consider the impact on read patterns and latency.

Quotas (scoped by user, client-id, or a user/client-id pair) are the standard defense against noisy neighbors. Apply controls on throughput (bytes per second), request rate, and connection creation rate, and correlate throttling events with monitoring when they fire.

  • Retention strategy: cleanup.policy=delete, compact, or compact,delete
  • Durability: lock down the combination of min.insync.replicas and acks at design time
  • Throughput: do not make segment.bytes and the flush settings too small (it causes excessive I/O)
  • Quota types: producer_byte_rate, consumer_byte_rate, request_percentage, controller_mutation_rate, and so on

Representative topic creation and quota configuration (Apache Kafka CLI)

kafka-topics.sh --bootstrap-server broker:9092 \
  --create --topic orders \
  --partitions 12 --replication-factor 3 \
  --config min.insync.replicas=2 \
  --config cleanup.policy=compact,delete \
  --config retention.ms=259200000 \
  --config segment.bytes=1073741824

# Quota: cap produce throughput per client ID (1 MiB/s = 1048576 bytes/s)
kafka-configs.sh --bootstrap-server broker:9092 \
  --alter --add-config 'producer_byte_rate=1048576' \
  --entity-type clients --entity-name app-producer

# Cap consume throughput per user (e.g., SASL/SCRAM username) (2 MiB/s = 2097152 bytes/s)
kafka-configs.sh --bootstrap-server broker:9092 \
  --alter --add-config 'consumer_byte_rate=2097152' \
  --entity-type users --entity-name alice
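The numeric values in these commands are plain unit conversions. The helper names below are hypothetical and exist only to make the arithmetic explicit.

```shell
# Hypothetical conversion helpers for Kafka config values.
days_to_ms()   { echo $(( $1 * 24 * 60 * 60 * 1000 )); }  # for retention.ms
mib_to_bytes() { echo $(( $1 * 1024 * 1024 )); }          # for *_byte_rate quotas
gib_to_bytes() { echo $(( $1 * 1024 * 1024 * 1024 )); }   # for segment.bytes

days_to_ms 3      # 259200000  (retention.ms above: three days)
mib_to_bytes 1    # 1048576    (producer_byte_rate above)
mib_to_bytes 2    # 2097152    (consumer_byte_rate above)
gib_to_bytes 1    # 1073741824 (segment.bytes above)
```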

Security Essentials: TLS/SASL, ACL, and RBAC

Kafka traffic is encrypted with TLS and authenticated via SASL or mTLS. SASL mechanisms include PLAIN, SCRAM, OAUTHBEARER, and GSSAPI. On the broker side, you must align listeners, listener.name.<...>.ssl.*, sasl.enabled.mechanisms, and sasl.mechanism.inter.broker.protocol consistently.

Authorization is built on ACLs. You combine resource type (Topic, Group, Cluster, TransactionalId), pattern type (LITERAL/PREFIXED), and operation (READ/WRITE/DESCRIBE/IDEMPOTENT_WRITE, etc.), and grant least privilege. Confluent Platform and Cloud also support consolidating this into role-based assignment via RBAC, which improves operational consistency.

  • With mTLS, certificate management is the crux: make the CA trust relationship explicit on both server and client.
  • With SASL/SCRAM, standardize user creation and secret protection (automation and rotation).
  • For ACLs, define principal naming and when to use PREFIXED. Watch for missing READ permissions on consumer groups.
  • RBAC (Confluent) makes central management and auditing easy. Understand the evaluation order and scope when it coexists with ACLs.
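To make the LITERAL versus PREFIXED distinction concrete, here is a minimal sketch of how a resource name is compared against an ACL pattern. `acl_matches` is a hypothetical helper that models only the name-matching step; it does not evaluate principals, operations, or deny rules.

```shell
# acl_matches PATTERN_TYPE PATTERN_NAME RESOURCE_NAME   (hypothetical helper)
# Exit status 0 means the pattern covers the resource, 1 means it does not.
acl_matches() (
  pattern_type=$1; pattern_name=$2; resource=$3
  case "$pattern_type" in
    LITERAL)
      # Exact name match, or the wildcard resource name "*"
      [ "$pattern_name" = "$resource" ] || [ "$pattern_name" = "*" ]
      ;;
    PREFIXED)
      # Any resource whose name starts with the pattern name
      case "$resource" in
        "$pattern_name"*) true ;;
        *) false ;;
      esac
      ;;
    *) false ;;
  esac
)

acl_matches PREFIXED orders- orders-eu && echo "PREFIXED orders- covers orders-eu"
acl_matches LITERAL orders orders-eu || echo "LITERAL orders does not cover orders-eu"
```

In the real CLI, the pattern type is selected with the `--resource-pattern-type` option of kafka-acls.sh.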

Monitoring and Operations: Key Metrics, Rolling Upgrades, and Comparing Data-Movement Tools

For monitoring, use JMX, logs, and Control Center together, watching latency, throttling, under-replicated partitions, disk usage, and network thread idle ratio from multiple angles. Design alerts so that each one leads directly from symptom to likely cause to remediation.

MirrorMaker 2 (MM2) and Cluster Linking take fundamentally different approaches to data movement and replication. The exam tends to ask about the differences in characteristics, RPO/RTO, dependencies, and how ordering is handled.

  • Key JMX example: kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions
  • Queues and latency: RequestQueueTimeMs, Fetch/Produce purgatory size
  • Network and threads: NetworkProcessorAvgIdlePercent
  • Traffic: BytesInPerSec / BytesOutPerSec, and detecting saturation of replication bandwidth
  • Rolling upgrades: check the compatibility matrix and inter-version support, and follow the prescribed controller/broker order

| Aspect | MirrorMaker 2 | Cluster Linking | What to nail down for the exam |
|---|---|---|---|
| Approach / dependencies | Built on Kafka Connect; needs a Connect (or dedicated MM2) cluster, typically run close to the target | Native broker-to-broker link (Confluent feature) | Deployment requirements and whether extra components are needed |
| Granularity / target | Per-topic replication; bidirectional setups are possible | Topics are mirrored across clusters (read-only mirror topics) | Which option supports which configuration (bidirectional, handling of existing topics) |
| Ordering / latency | Per-partition ordering is preserved; extra latency comes from going through the connector | Links lean toward lower latency | Advantages in RPO/RTO and latency estimation |
| Operations / management | Requires monitoring and rebalancing connectors | Easy to manage centrally through Control Center | Different things to monitor (tasks vs. links) |
| Use cases | Flexible for hybrid setups and bridging version differences | Simplifies DR and cross-region replication | Choosing between them in DR design |

Common Troubleshooting Patterns and CLI Procedures

Produce failures typically appear when acks=all is combined with a min.insync.replicas value the current ISR cannot satisfy, when the ISR shrinks, or when TLS settings between brokers are misaligned. Consumer lag gets worse with excessive group rebalances, throttling, and poorly tuned consumer settings (max.poll.interval.ms, fetch.min.bytes).

The priority response is: scope the blast radius (specific topic / partition / rack), isolate the limiting factor (network, disk, threads, quota), and verify first-order hypotheses on both configuration and code (acks, batching, compression, GC / page cache).

  • UnderReplicatedPartitions > 0: inspect network latency, disk bandwidth, and replica.fetch.wait.max.ms. Once the ISR has recovered, trigger a preferred leader election for the affected partitions if leadership is unbalanced.
  • Throttling: check client throttle-time metrics (for example produce-throttle-time-avg and fetch-throttle-time-avg) and the broker's quota metrics, then re-tune the quotas.
  • Authentication failures: check for mismatched SASL/TLS mechanisms between server and client, and certificate CN/SAN mismatches.
  • Growing lag: correlate the consumer group's commit interval, max.poll.records, and fetch-related settings with broker bandwidth.
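When lag is the symptom, a single total per group is often easier to alert on than per-partition rows. The sketch below sums the LAG column from `kafka-consumer-groups.sh --describe` output. `total_lag` is a hypothetical helper, and the sample rows are illustrative, not real output; the column layout can differ between Kafka versions, so the function locates the LAG column by its header instead of assuming a position.

```shell
# total_lag: read `kafka-consumer-groups.sh --describe` output on stdin
# and print the sum of the LAG column (hypothetical helper).
total_lag() {
  awk '
    col == 0 { for (i = 1; i <= NF; i++) if ($i == "LAG") col = i; next }
    $col ~ /^[0-9]+$/ { sum += $col }   # skip "-" (no committed offset)
    END { print sum + 0 }
  '
}

# Illustrative sample rows (not captured from a real cluster)
printf '%s\n' \
  'GROUP     TOPIC  PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST      CLIENT-ID' \
  'app-group orders 0         100            150            50  c-1         /10.0.0.1 app' \
  'app-group orders 1         200            230            30  c-2         /10.0.0.2 app' \
  'app-group orders 2         -              40             -   -           -         -' \
  | total_lag   # prints 80
```

Against a live cluster the same function would be fed by piping the describe command into it.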

Diagnostic commands you can use immediately in the field

# Check lag (per consumer group)
kafka-consumer-groups.sh --bootstrap-server broker:9092 \
  --group app-group --describe

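# Check under-replicated partitions and leader state (use alongside the JMX metric)
kafka-topics.sh --bootstrap-server broker:9092 \
  --describe --under-replicated-partitions

# Sample recent quota/throttling-related lines from the broker log, if any are logged
grep -Ei 'throttl|quota' /var/log/kafka/server.log | tail -n 50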

# Topic details (retention / ISR / assignment)
kafka-topics.sh --bootstrap-server broker:9092 \
  --describe --topic orders

# Cluster ACLs
kafka-acls.sh --bootstrap-server broker:9092 --list

# Quick connectivity test (also useful to isolate TLS/SASL issues)
kcat -b broker:9093 -X security.protocol=SASL_SSL \
  -X sasl.mechanisms=SCRAM-SHA-512 \
  -X sasl.username=alice -X sasl.password=secret \
  -t orders -P -l <<< 'smoke-test'

# Safer leader election when needed (during planned maintenance); --topic requires --partition
# kafka-leader-election.sh --bootstrap-server broker:9092 --election-type PREFERRED --topic orders --partition 0
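If you want to post-process the describe output yourself, for example to list which partitions are short of replicas, a minimal sketch follows. `under_replicated` is a hypothetical helper, the sample lines are illustrative, and the parsing assumes the usual `Replicas:` and `Isr:` fields; for quick checks, the `--under-replicated-partitions` option of kafka-topics.sh gives the same answer directly.

```shell
# under_replicated: read `kafka-topics.sh --describe` output on stdin and
# print topic-partition for every partition whose ISR is smaller than its
# replica list (hypothetical helper; simplified parsing).
under_replicated() {
  awk '
    {
      topic = ""; part = ""; replicas = ""; isr = ""
      for (i = 1; i < NF; i++) {
        if ($i == "Topic:")     topic = $(i + 1)
        if ($i == "Partition:") part = $(i + 1)
        if ($i == "Replicas:")  replicas = $(i + 1)
        if ($i == "Isr:")       isr = $(i + 1)
      }
      if (replicas == "") next            # topic summary line, not a partition
      if (split(isr, a, ",") < split(replicas, b, ","))
        print topic "-" part
    }
  '
}

# Illustrative sample lines (not captured from a real cluster)
printf '%s\n' \
  'Topic: orders Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3' \
  'Topic: orders Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3' \
  | under_replicated   # prints orders-1
```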

Check Your Understanding


Question 1

You want producer writes to survive a single-broker failure without data loss. Which combination is appropriate?

  A. replication.factor=2, min.insync.replicas=1, producer acks=1
  B. replication.factor=3, min.insync.replicas=1, producer acks=all
  C. replication.factor=3, min.insync.replicas=2, producer acks=all
  D. replication.factor=2, min.insync.replicas=2, producer acks=0

Correct answer: C

The textbook recipe for avoiding data loss is to set min.insync.replicas >= 2, keep replication.factor at least one higher than min.insync.replicas so that writes continue with one broker down, and have the producer wait for commit with acks=all. Option C satisfies this. With A, a write is acknowledged by the leader alone and is lost if that leader fails before followers catch up. With B, the ISR may shrink to the leader alone and still acknowledge writes. With D, acks=0 never waits for any acknowledgment.
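The same reasoning can be checked mechanically against the four options. This is a minimal sketch of the rule of thumb only (acks=all, min.insync.replicas >= 2, replication.factor >= min.insync.replicas + 1); `no_loss_config` is a hypothetical helper name, not a Kafka tool.

```shell
# no_loss_config RF MIN_ISR ACKS   (hypothetical helper)
# Prints "ok" when the rule of thumb is satisfied, otherwise "risky".
no_loss_config() (
  rf=$1; min_isr=$2; acks=$3
  if { [ "$acks" = "all" ] || [ "$acks" = "-1" ]; } \
     && [ "$min_isr" -ge 2 ] \
     && [ "$rf" -ge $(( min_isr + 1 )) ]; then
    echo "ok"
  else
    echo "risky"
  fi
)

# Each entry: option label, replication.factor, min.insync.replicas, acks
for opt in "A 2 1 1" "B 3 1 all" "C 3 2 all" "D 2 2 0"; do
  set -- $opt
  echo "$1: $(no_loss_config "$2" "$3" "$4")"
done
# A: risky
# B: risky
# C: ok
# D: risky
```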

Frequently Asked Questions

Will the exam ask about ZooKeeper or KRaft?

Because the ecosystem is mid-migration, it is safest to be ready for either term or concept to appear. Focus on the controller's role, metadata availability, and how rolling upgrade procedures differ — pinning down what the two share and where they diverge.

Do I need to know Confluent Cloud?

The foundation is general Kafka operations, but Confluent-specific features (RBAC, Control Center, Cluster Linking, etc.) can show up. Rather than memorizing Cloud-only features, understand what each feature actually does and the operational benefit it provides.

Are the exact weights and passing score published?

Detailed scoring and per-domain weighting are not officially published. Check the official certification page for the current exam outline (question count, time limit, scope) and concentrate on architecture, security, monitoring, and troubleshooting.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

