Event Sourcing is a design approach where you do not store an entity's current state directly. Instead, the stream of events that produced that state is the single source of truth. Kafka excels at appending, distributing, and replaying events, and it ships with the features Event Sourcing needs: per-partition ordering, flexible retention strategies, and transactions.
This article walks through the design considerations along the Confluent Certified Developer for Apache Kafka (CCDAK) exam domains, with a practical lens. We stick to stable concepts and features and assume the behavior documented in the official Kafka documentation.
In Event Sourcing, events are the single source of truth and state is treated as a derivable artifact that can be reconstructed at any time. Kafka's append-only topic log, in-partition ordering guarantees, and retention and compaction policies let you keep events long-term and replay them as needed.
Topics CCDAK loves to ask about include the difference between retention (delete) and log compaction (compact), in-partition ordering and key design, the meaning of transactional and idempotent producers, and the consumer isolation.level setting.
| Aspect | Event Sourcing + Kafka | CRUD-centric (state-based) | CDC (log-based change capture) |
|---|---|---|---|
| Source of truth | Event log (append-only) | Current state (overwritten) | Database is primary; changes captured after the fact |
| Replayability | Full-history rebuild by design | Difficult; requires a separate audit trail | Possible to a degree, depending on retention |
| Ordering | Strict within a key (within a partition) | Depends on the application or database | DB commit order may not match business order |
| Schema evolution | Managed via Schema Registry compatibility rules | Driven by database schema migrations | Constrained by the CDC tool's compatibility |
| Audit and traceability | Events double as the audit log | Needs additional design work | Usable as a change history |
Ordering is guaranteed only within a partition. The default move is to pick a business aggregate (such as accountId or orderId) as the key. Partition count drives throughput and parallelism, so size it with the cardinality and distribution of your keys in mind.
Retention (delete) prunes the log by time or size. Compaction (compact) keeps only the latest record per key and removes prior records for that key. Using both together (cleanup.policy=compact,delete) preserves the latest view while still letting you sweep out extremely old segments.
Kafka's idempotent producer prevents duplicate writes even when the network retries. Set transactional.id on top of that, and you can write to multiple topics and partitions atomically; consumers running with isolation.level=read_committed will then read only committed records. Together, these give you exactly-once semantics for processing that lives entirely inside Kafka.
Strict end-to-end EOS that extends to external systems such as databases sits outside Kafka's guarantees. The pragmatic approach for external integrations is to lean on patterns like the Outbox or a two-step pipeline and settle for at-least-once delivery plus idempotent processing.
Minimal Transactional Producer configuration (Java)
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, org.apache.kafka.common.serialization.ByteArraySerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
props.put(ProducerConfig.ACKS_CONFIG, "all");
props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "5"); // 再順序化を避ける上限
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-tx-1");
KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
producer.beginTransaction();
// バッチでイベントを書き込む
for (Event e : events) {
ProducerRecord<String, byte[]> rec = new ProducerRecord<>("orders-events", e.key(), e.payload());
producer.send(rec);
}
producer.commitTransaction();
} catch (org.apache.kafka.common.errors.ProducerFencedException fenced) {
// 同一 transactional.id の別インスタンス起動などでフェンスされた
producer.close();
throw fenced;
} catch (Exception ex) {
producer.abortTransaction();
}Because events are replayed over long horizons, you have to manage schema evolution with backward and forward compatibility in mind. Use Confluent Schema Registry and set the right compatibility level. Adding fields is usually backward-compatible, while removing required fields or changing types tends to break compatibility.
Your subject naming strategy (TopicNameStrategy, RecordNameStrategy, and so on) changes the granularity and sharing of schemas. For Event Sourcing, keep a clean schema per event type and have the application side carry an upcaster — a routine that lifts old events into the latest shape — as a safety net.
Ingest events as a KStream, aggregate by key, and materialize them into a KTable to get a local or remote store of the latest state. Compacted topics are a natural fit for KTables, and you can represent deletes with tombstones.
When you need a replay, reset the consumer offset to the beginning (for example, with auto.offset.reset=earliest or a manual seek) or read with a fresh consumer group. To shorten replay time, it is also effective to periodically publish a state-snapshot topic so you can resume from a snapshot.
Event Sourcing data flow (conceptual diagram)
In production, watch latency and lag, retention size, compaction progress, tombstone backlog, schema compatibility violations, and the impact of rebalances. Because Event Sourcing assumes data cannot simply be deleted, cost estimation and an archive or snapshot strategy are critical.
For exam prep, expect frequent questions on retention vs. compaction, keys and ordering, idempotent and transactional producer settings, read_committed, the difference between KTable and KStream, schema compatibility levels, and the basics of ACLs and security.
CCDAK
問題 1
You manage user account balances with Event Sourcing. You want to keep the latest state on Kafka without an external database, while still being able to rebuild from the full history when needed. Which design is the best fit?
正解: A
To keep the latest view inside Kafka while still being able to replay history, the canonical answer is to keep per-key ordering and materialize a KTable with compaction. B keeps too little history to replay; C does not scale and throws away key-based ordering; D invites breaking changes over a long horizon and is the wrong answer on the exam as well.
What is the difference between Event Sourcing and CDC, and when should you use each?
With Event Sourcing, events are the single source of truth and state is derived by replay. Events sit at the center of the design, and the write path itself is event-driven. CDC, by contrast, captures changes from an existing database after the fact, which makes it well suited to legacy integration and phased migrations. Choose Event Sourcing for greenfield designs that prioritize history, audit, and replay; choose CDC when you need to stream data out of an existing database.
What do you do if a full replay is needed but some history has been deleted past the retention window?
The safest approach is to combine the latest-state view of a compacted topic with periodic snapshots. If data has already been removed, restore it from an archive (a mirror cluster or external storage), or replay only the delta from the latest snapshot. Going forward, revisit your retention and storage plan and use compaction and snapshots together.
How do you guarantee ordering within a single aggregate, and what about ordering across aggregates?
Set the aggregate key (e.g. orderId) as the message key so that records for the same key always land in the same partition. Kafka only guarantees ordering within a partition. Ordering across different keys or across aggregates is not guaranteed, so model causality explicitly in the event schema, or handle it through a separate workflow topic or an orchestrator.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...