Kafka Certifications Overview: Key Points of CCDAK, CCAAK, and Confluent Cloud

2026-04-19
NicheeLab Editorial Team

Kafka certifications split broadly into the application-development-oriented CCDAK and the operations-oriented CCAAK. On top of that, design and operations perspectives from the managed Confluent Cloud environment reinforce understanding of both exams.

This article organizes the exam domains by tying them back to real-world decisions. It sticks to stable features and packages takeaways at the level of CLI commands, configuration values, and design principles.

Positioning the Certifications and the Comparison Axes to Lock In First

CCDAK asks whether you can correctly move and process data with Kafka and manage its schemas. CCAAK asks whether you understand the configuration needed to design, secure, and operate clusters while meeting SLAs. Confluent Cloud demands the ability to project that knowledge onto a managed environment.

The foundation shared by both exams covers partitioning and replication, delivery guarantees (at-least-once / at-most-once / exactly-once), consumer groups, schema compatibility, and security (TLS / SASL / ACL). On Confluent Cloud, those same concepts gain management elements like RBAC, networking (private connectivity), and quotas.

  • Availability fundamentals: replication factor and ISR, plus the relationship between min.insync.replicas and acks=all, show up frequently
  • Ordering is guaranteed at the partition level. Learn key design and partitioner behavior as a single unit
  • Make schema compatibility mode an operational rule (defaulting to backward compatibility is the safe choice)
  • On Cloud, organize your understanding of RBAC, API keys / service accounts, and network connectivity options (public / peering / Private Link)

Item | CCDAK (Development) | CCAAK (Operations)
Target role | Application / data engineer | SRE / platform engineer
Main topics | Producer / Consumer / Streams, Schema Registry, delivery guarantees | Broker configuration, replication and disaster recovery, security, monitoring and tuning
Environment | Client perspective across both self-hosted and Confluent Cloud | Centered on self-hosted clusters, plus Cloud operations perspective (RBAC / networking)
Key concepts / settings | acks, idempotence, transactional.id, offset management, compatibility modes | min.insync.replicas, quotas, retention / compaction, SASL/TLS, ACL/RBAC
Common points of confusion | Exactly-once needs to be designed across the entire processing pipeline | acks=all is only part of the availability condition; it must be paired with ISR

Kafka's basic data flow (the unit of granularity tested on the exam)

[Figure: Producers (key → partition) write to topic my-topic (partitions P0 and P1, RF=2). Broker 1 hosts the P0 leader and a P1 replica; Broker 2 hosts a P0 replica and the P1 leader. Consumer Group A reads and commits offsets; Consumer Group B tracks its own lag independently. Ordering is per partition; commits are per group.]
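
The key → partition step in the data flow above can be sketched as follows. This is a simplified model, not the Kafka client's partitioner: the real default partitioner hashes the serialized key with murmur2, while this sketch substitutes Arrays.hashCode so that it runs with only the JDK. The class name KeyPartitionSketch and the method partitionFor are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified model of key-based partition selection (assumption: Arrays.hashCode
// stands in for the murmur2 hash used by the real Kafka partitioner).
final class KeyPartitionSketch {

    // Maps a record key to a partition index in [0, numPartitions).
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // floorMod keeps the result non-negative even when the hash is negative.
        return Math.floorMod(Arrays.hashCode(keyBytes), numPartitions);
    }

    public static void main(String[] args) {
        // The same key always maps to the same partition while the count is fixed.
        System.out.println(partitionFor("customer-42", 6) == partitionFor("customer-42", 6)); // true
        // Changing the partition count can move the key to a different partition.
        System.out.println(partitionFor("customer-42", 6) + " vs " + partitionFor("customer-42", 12));
    }
}
```

Because the partition is derived from the key and the partition count, per-key ordering holds only as long as the partition count of the topic stays unchanged.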

Topic creation and availability basics (acks and min.insync.replicas)

kafka-topics.sh --create --topic orders --partitions 6 --replication-factor 3 --bootstrap-server broker:9092
# Producer side
acks=all
enable.idempotence=true
retries=2147483647
max.in.flight.requests.per.connection=5
# Broker / topic side (limits data loss, assuming the ISR is maintained)
min.insync.replicas=2
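
How acks and min.insync.replicas interact can be sketched with a small model. It is a deliberate simplification: a real broker tracks ISR membership dynamically and signals the rejection with a NotEnoughReplicas error, whereas here the ISR is reduced to a count. The class AcksModel and its produce method are hypothetical names.

```java
// Simplified model of how a partition leader treats a produce request
// (assumption: real brokers track the ISR dynamically; here it is just a count).
final class AcksModel {

    enum Result { ACKED, REJECTED_NOT_ENOUGH_REPLICAS }

    // With acks=all, the leader rejects the write when the ISR has shrunk below
    // min.insync.replicas; with acks=1 or acks=0 that check is not applied.
    static Result produce(String acks, int inSyncReplicas, int minInsyncReplicas) {
        if (inSyncReplicas < 1) {
            throw new IllegalArgumentException("the leader itself always counts as in sync");
        }
        boolean enforceMinIsr = acks.equals("all") || acks.equals("-1");
        if (enforceMinIsr && inSyncReplicas < minInsyncReplicas) {
            return Result.REJECTED_NOT_ENOUGH_REPLICAS;
        }
        return Result.ACKED;
    }

    public static void main(String[] args) {
        // RF=3, min.insync.replicas=2: one replica down is tolerated, two are not.
        System.out.println(produce("all", 3, 2)); // ACKED
        System.out.println(produce("all", 2, 2)); // ACKED
        System.out.println(produce("all", 1, 2)); // REJECTED_NOT_ENOUGH_REPLICAS
        // acks=1 is still acknowledged with a lone leader, which is where loss can occur.
        System.out.println(produce("1", 1, 2));   // ACKED
    }
}
```

The last case is the one worth remembering: min.insync.replicas protects data only for producers that use acks=all.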

Delivery Guarantees and Offset Management: Designing to Avoid the Pitfalls

At-least-once is easy to implement by default but requires a design that tolerates duplicate processing (idempotent sinks or deduplication). At-most-once is achieved by committing before processing, but allows the possibility of loss. Exactly-once hinges on a pipeline design that combines producer idempotence and transactions with consistency on the sink side.

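Consumer offsets are stored in Kafka per consumer group, in the internal __consumer_offsets topic. Auto-commit is simple, but it commits on a timer rather than when processing completes, so records can be reprocessed after a rebalance (or skipped, if processing is handed off asynchronously). When high reliability is required, disable auto-commit and commit manually, with an explicit order of poll, process, then commit.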

  • Ordering guarantees assume "same key to same partition." Without keys, depending on the version, the sticky partitioner is used and tends to send to the same partition in bursts over short windows
  • Producer idempotence alone is not enough to prevent duplicates. Combine it with upserts or unique key constraints on the sink side
  • Pause time during rebalances depends on the assignment strategy and on per-poll processing time. Split long-running processing into smaller batches, and route records that repeatedly fail to a DLQ so they do not stall the group
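
The point about idempotent sinks can be illustrated with a minimal sketch. It assumes each record carries a unique business key (an order ID here) and models the sink as an in-memory map. The class IdempotentSinkSketch and its methods are hypothetical, and a real sink would typically be a database upsert or a unique key constraint.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of an idempotent sink for at-least-once delivery
// (assumption: each record carries a unique business key such as an order ID).
final class IdempotentSinkSketch {

    private final Map<String, Long> amountByOrderId = new HashMap<>();

    // Upsert keyed by order ID: applying the same record twice leaves the same state.
    void upsert(String orderId, long amount) {
        amountByOrderId.put(orderId, amount);
    }

    // Sum over all stored orders; a redelivered order is counted once.
    long total() {
        long sum = 0;
        for (long amount : amountByOrderId.values()) {
            sum += amount;
        }
        return sum;
    }

    public static void main(String[] args) {
        IdempotentSinkSketch sink = new IdempotentSinkSketch();
        sink.upsert("o-1", 100);
        sink.upsert("o-2", 50);
        // "o-1" is redelivered, as can happen when a rebalance precedes the commit.
        sink.upsert("o-1", 100);
        System.out.println(sink.total()); // prints 150, not 250
    }
}
```

An append-only sink would have counted the redelivered record twice, which is why at-least-once delivery is usually paired with keyed writes.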

Minimum producer EOS and consumer manual-commit setup

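# Producer (Java client properties)
enable.idempotence=true
acks=all
transactional.id=orders-tx-01
# Setting transactional.id alone is not enough: the producer code must also call
# initTransactions(), beginTransaction(), and commitTransaction().

# Consumer (properties for manual commit)
enable.auto.commit=false

// Consumer (Java sketch; c is a KafkaConsumer<K, V>, process() is application code)
while (running) {
  ConsumerRecords<K, V> rs = c.poll(Duration.ofSeconds(1));
  for (ConsumerRecord<K, V> r : rs) { process(r); }
  c.commitSync(); // commit only after the whole batch is processed (at-least-once)
}
// at-most-once variant (not recommended): call commitSync() before process()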

Schema Management and Compatibility: Make Backward Compatibility the Operational Rule

Schema Registry controls schema evolution through compatibility modes. The common operational default is backward compatibility (BACKWARD, which is also the Schema Registry default, or BACKWARD_TRANSITIVE): consumers upgraded to the new schema can still read data written with the previous schema, so upgrade consumers before producers. If you need evolution in both directions, consider the FULL family, but expect higher operational overhead.

Subject naming strategies (TopicNameStrategy, RecordNameStrategy, etc.) directly drive multi-event topic design. Use RecordNameStrategy when you want to allow multiple types within a single topic; TopicNameStrategy is simpler when each topic carries one event type.

  • Under backward compatibility, you can add fields with default values and delete fields, but adding a field without a default is rejected (Avro rules)
  • Choose auto-registration by balancing safety and auditing (for example, auto in staging only, manual in production)
  • Apply Serdes configuration to both producer and consumer so compatibility violations are caught early
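
The rule behind backward compatibility can be sketched as a small check. This is a heavily simplified model, not Schema Registry's implementation: a schema is reduced to a map from field name to "has a default value", and type changes, aliases, and nested records are ignored. The class BackwardCompatSketch and its method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model of a BACKWARD compatibility check (assumption: a schema is
// reduced to field name -> "has a default value"; type changes are ignored).
final class BackwardCompatSketch {

    // BACKWARD: a reader on newSchema must be able to read data written with
    // oldSchema. Fields that newSchema adds are absent from old data, so each
    // one needs a default. Fields that newSchema drops are ignored by the reader.
    static boolean isBackwardCompatible(Map<String, Boolean> oldSchema,
                                        Map<String, Boolean> newSchema) {
        for (Map.Entry<String, Boolean> field : newSchema.entrySet()) {
            boolean addedField = !oldSchema.containsKey(field.getKey());
            if (addedField && !field.getValue()) {
                return false; // new field without a default: old records cannot be read
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = new HashMap<>();
        v1.put("id", false);
        Map<String, Boolean> v2 = new HashMap<>(v1);
        v2.put("note", true);    // added with a default
        Map<String, Boolean> v3 = new HashMap<>(v1);
        v3.put("amount", false); // added without a default
        System.out.println(isBackwardCompatible(v1, v2)); // true
        System.out.println(isBackwardCompatible(v1, v3)); // false
    }
}
```

Reading the check as "can the new reader decode old data?" also explains the upgrade order: consumers move to the new schema first, producers second.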

Configuring compatibility mode and the basics of schema registration

# Set backward compatibility globally
curl -s -X PUT -H 'Content-Type: application/json' \
  --data '{"compatibility": "BACKWARD"}' \
  http://schema-registry:8081/config

# Override per subject (FULL_TRANSITIVE for orders-value)
curl -s -X PUT -H 'Content-Type: application/json' \
  --data '{"compatibility": "FULL_TRANSITIVE"}' \
  http://schema-registry:8081/config/orders-value

# Example: register an Avro schema (value side)
curl -s -X POST -H 'Content-Type: application/vnd.schemaregistry.v1+json' \
  --data '{"schema": "{\"type\":\"record\",\"name\":\"Order\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}"}' \
  http://schema-registry:8081/subjects/orders-value/versions

Stream Processing: Design Decisions for Kafka Streams and ksqlDB

Kafka Streams is a library you embed in your application, giving fine-grained control over state management (State Store), repartitioning, and processing guarantees (at_least_once / exactly_once_v2). ksqlDB lets you write stream and table transformations declaratively and runs them continuously as persistent queries.

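Repartitioning occurs when a key-changing operation (such as selectKey, map, or groupBy) is followed by a key-based operation such as an aggregation or join, and internal repartition topics are created for it. At scale, do not forget the partition count, retention, and monitoring for these internal topics.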

  • processing.guarantee=exactly_once_v2 fits well when both source and sink are Kafka
  • State Stores are made fault-tolerant via changelog topics. Plan local storage sizing and cleanup
  • Design partition counts for ksqlDB persistent queries up front, in line with your scale-out plans
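
How a changelog makes a state store fault-tolerant can be sketched with a toy model. It assumes the changelog is an in-memory list instead of a compacted Kafka topic and covers only a count store. The class ChangelogStoreSketch and its methods are hypothetical and are not part of the Kafka Streams API.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of a count state store backed by a changelog
// (assumption: the changelog is an in-memory list, not a compacted topic).
final class ChangelogStoreSketch {

    private final Map<String, Long> store = new HashMap<>();
    private final List<Map.Entry<String, Long>> changelog;

    ChangelogStoreSketch(List<Map.Entry<String, Long>> changelog) {
        this.changelog = changelog;
    }

    // Increments the count for a key and appends the new value to the changelog.
    long increment(String key) {
        Long updated = store.merge(key, 1L, Long::sum);
        changelog.add(new AbstractMap.SimpleImmutableEntry<String, Long>(key, updated));
        return updated;
    }

    // Rebuilds local state after a failure by replaying the changelog in order;
    // the last value written for each key wins.
    static ChangelogStoreSketch restore(List<Map.Entry<String, Long>> changelog) {
        ChangelogStoreSketch restored = new ChangelogStoreSketch(changelog);
        for (Map.Entry<String, Long> entry : changelog) {
            restored.store.put(entry.getKey(), entry.getValue());
        }
        return restored;
    }

    long count(String key) {
        return store.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> log = new ArrayList<>();
        ChangelogStoreSketch live = new ChangelogStoreSketch(log);
        live.increment("a");
        live.increment("a");
        live.increment("b");
        // Simulate a replacement instance that starts with an empty local store.
        ChangelogStoreSketch recovered = restore(log);
        System.out.println(recovered.count("a") + " " + recovered.count("b")); // prints "2 1"
    }
}
```

Restore time grows with the amount of changelog to replay, which is one reason local storage sizing and cleanup belong in the plan.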

Minimum Kafka Streams topology (skeleton for an aggregation)

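import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.*;
import org.apache.kafka.streams.kstream.*;

Properties p = new Properties();
p.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-agg");
p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
p.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, "exactly_once_v2");
StreamsBuilder b = new StreamsBuilder();
// Order and orderSerde are application-defined (for example, an Avro serde backed by Schema Registry)
KStream<String, Order> s = b.stream("orders", Consumed.with(Serdes.String(), orderSerde));
KTable<String, Long> t = s.groupByKey().count(); // count per key, backed by a state store and changelog topic
t.toStream().to("orders-count", Produced.with(Serdes.String(), Serdes.Long()));
KafkaStreams app = new KafkaStreams(b.build(), p);
Runtime.getRuntime().addShutdownHook(new Thread(app::close)); // close cleanly on shutdown
app.start();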

Confluent Cloud Perspective: How Responsibilities and Settings Change Under Managed Services

On Confluent Cloud, broker placement and patching become the service's responsibility, while the user focuses on logical design (topic count and partition design), access control (RBAC / ACL), and connectivity (internet vs. private connectivity).

Monitoring uses built-in metrics to visualize latency, throughput, and consumer lag. Quotas and limits (API calls, partition counts, and so on) depend on your plan, so verify them before designing.

  • Authentication uses API key plus secret. Issue them per service account and apply RBAC to enforce least privilege
  • Choose your network connectivity from public endpoint, VPC Peering, Private Link, and similar options
  • Dedicated clusters offer predictable isolation and throughput. Basic / Standard prioritize ease of use

Minimum Confluent CLI flow (environment → cluster → topic → ACL)

# The standalone ccloud CLI has been retired; its commands now live in the confluent CLI
confluent environment create dev
confluent environment use <env-id>
confluent kafka cluster create dev-cluster --cloud aws --region ap-northeast-1 --type standard
confluent kafka cluster use <lkc-id>
confluent api-key create --resource <lkc-id>
confluent kafka topic create orders --partitions 6
# ACL example (write access for the producer, read access to the topic and the consumer group)
# Flag spelling varies by CLI version (older releases use --operation); confirm with --help
confluent kafka acl create --allow --service-account sa-123 --operations write --topic orders
confluent kafka acl create --allow --service-account sa-123 --operations read --topic orders
confluent kafka acl create --allow --service-account sa-123 --operations read --consumer-group orders-app

Study Roadmap: Passing the Exam and Applying It at Work, Together

First, complete one local cycle of "build → produce → consume → break → fix." Next, layer in Schema Registry and Streams/ksqlDB, and finally repeat the same flow on Confluent Cloud, articulating the differences in words. At every step, aim for the ability to explain each configuration value with a reason.

For mock practice, write at least one of your own questions each on availability, delivery guarantees, schema evolution, and security. Being able to explain the design reasoning in 30 seconds is a good benchmark for passing readiness.

  • Day 1: Producer / Consumer on a local single broker. Experience manual offset commits
  • Day 2: Multiple brokers plus replication. Verify the behavior of acks and min.insync.replicas
  • Day 3: Try break-and-repair scenarios with Schema Registry and backward compatibility
  • Day 4: Aggregation and repartitioning with Streams / ksqlDB. Confirm monitoring of internal topics
  • Day 5: Repeat the same flow on Confluent Cloud. Audit RBAC, connectivity, and quotas

Minimum Docker Compose (skeleton for study)

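services:
  broker:
    image: confluentinc/cp-kafka:7.6.1
    depends_on: [zookeeper]
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092
      # Clients must be able to resolve "broker" (for example, run them inside the Compose network)
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker:9092
      # Single broker: internal topics cannot use their default replication factor of 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
    ports: ["9092:9092"]
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  schema-registry:
    image: confluentinc/cp-schema-registry:7.6.1
    depends_on: [broker]
    environment:
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: PLAINTEXT://broker:9092
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
    ports: ["8081:8081"]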

Check Your Understanding with a Question

CCDAK / CCAAK

Question 1

You want to prioritize high availability while avoiding data loss on writes. For a topic with replication.factor=3 and min.insync.replicas=2, which producer configuration is the most appropriate?

  1. acks=all, enable.idempotence=true
  2. acks=1, enable.idempotence=true
  3. acks=0, retries=0
  4. acks=all, enable.idempotence=false

Correct answer: 1

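With acks=all, the leader acknowledges a write only after every replica currently in the ISR has it, and min.insync.replicas=2 makes the broker reject writes when fewer than two replicas are in sync. Together with replication.factor=3, this means acknowledged data survives the loss of one broker and writes stay available with one replica down. enable.idempotence=true additionally prevents duplicates caused by producer retries, which is why option 1 is preferable to option 4. acks=1 acknowledges as soon as the leader alone has the record, and acks=0 does not wait for any acknowledgment, so both leave room for loss during failures.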

Frequently Asked Questions

Should I take CCDAK or CCAAK first?

Start with CCDAK if your work centers on application development or data pipeline implementation. Start with CCAAK if you focus on platform operations, security, and SLAs. If you plan to earn both, building a solid sense of correct data flow design through CCDAK first makes the configuration rationale behind CCAAK much easier to internalize.

Can I pass by studying only with Confluent Cloud?

Core concepts are the same whether managed or self-hosted, but CCAAK tests broker configuration and low-level operations, so you also need hands-on time with a local or self-hosted environment. Treat Confluent Cloud's RBAC and networking as a delta to study on top.

How deeply do I need to learn exactly-once?

If you can articulate these four points, you are covered for both the exam and real work: producer idempotence and transactions, Streams' processing.guarantee=exactly_once_v2, why it shines when both source and sink are Kafka, and the fact that external sinks ultimately require their own consistency design, such as idempotent writes or a two-phase commit.
