The Saga pattern is a standard way to maintain business consistency without relying on 2PC, but a design and implementation that hold up in production demand a solid grasp of Kafka's built-in features.
Aligned with the CCDAK (Confluent Certified Developer for Apache Kafka) exam scope, this article summarizes Kafka's transactional API, Exactly-Once Semantics, topic/key design, and compensation design from a hands-on engineering perspective.
Saga achieves eventual consistency by combining a "series of local transactions" with "compensating actions on failure." There are two control styles: orchestration (a central coordinator directs the flow) and choreography (each service subscribes to events and autonomously triggers the next step). Kafka offers durability, per-partition ordering, replay, and horizontal scaling, which supports robust implementations of either style.
From the CCDAK perspective, frequently tested topics include topic partitioning and ordering guarantees by key, producer idempotence and transactions, the producer's sendOffsetsToTransaction call (which commits consumer offsets inside a transaction), and Kafka Streams' exactly_once_v2. Baking these into the design from the start simplifies failure reprocessing and auditing.
| Approach | Consistency / Availability | Implementation / Operational Notes |
|---|---|---|
| Saga - Orchestration (Kafka) | Eventual consistency. High observability and control. | Orchestrator scaling/redundancy, centralized timeout and compensation management, separation of commands and events. |
| Saga - Choreography (Kafka) | Eventual consistency. Loosely coupled and easy to extend. | Event/schema evolution management, prevention of cycles and duplicate triggers, compensation implemented per service. |
| 2PC / Distributed TX | Strong consistency, but availability and latency tend to suffer. | Risks of coordinator failure and blocking. Poor fit for microservices. |
| No compensation (anti-pattern) | Inconsistency remains when failures occur. | Cannot guarantee business consistency. Do not adopt. |
Example of Saga on Kafka (choreography)
The basis of Saga is to record "events of facts" and decide the next action from them. In Kafka, separate events and commands logically, and use keys to secure ordering and locality. Fix the key to the same business entity (e.g., orderId) to gain ordering within a single partition.
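The key-to-partition idea above can be sketched as follows. This is a stand-in for illustration only: Kafka's default partitioner uses a murmur2 hash of the serialized key, whereas the sketch uses `Arrays.hashCode`. The class name, the partition count, and the key `order-1001` are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative key-based partitioner (NOT Kafka's murmur2 implementation). */
final class KeyPartitioningSketch {

    /** Maps a record key to a partition index in [0, numPartitions). */
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        int partitions = 6; // hypothetical partition count
        String[] events = {"OrderCreated", "PaymentAuthorized", "StockReserved"};
        for (String event : events) {
            // Every event keyed by the same orderId lands on the same partition,
            // so consumers see them in the order they were written.
            System.out.println(event + " -> partition " + partitionFor("order-1001", partitions));
        }
    }
}
```

Because the mapping depends on the partition count, adding partitions to a topic later can move a key to a different partition, so ordering for in-flight Sagas should be considered before resizing.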
Default to backward compatibility for schemas, and avoid breaking changes. With Schema Registry, compatibility rules are checked when a new schema version is registered, which makes schema evolution enforceable and aligns with what CCDAK tests (schema compatibility modes). Topics configured with log compaction (cleanup.policy=compact) are well-suited to snapshot semantics, and can be used to project Saga state.
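The snapshot semantics of a compacted topic can be illustrated with a small in-memory replay. This is only a model of the idea (latest value per key wins, a null value acts as a tombstone); the class and record names are hypothetical, and real compaction runs asynchronously on the broker.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** In-memory model of reading a compacted topic as a "latest state per key" view. */
final class CompactedViewSketch {

    /** One log record; a null value models a tombstone (delete marker). */
    static final class LogRecord {
        final String key;
        final String value;
        LogRecord(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Replays the log in order, keeping only the latest value for each key. */
    static Map<String, String> latestByKey(List<LogRecord> log) {
        Map<String, String> view = new LinkedHashMap<>();
        for (LogRecord r : log) {
            if (r.value == null) {
                view.remove(r.key);       // tombstone: the key disappears from the view
            } else {
                view.put(r.key, r.value); // newer value replaces the older one
            }
        }
        return view;
    }
}
```

Replaying Saga state changes keyed by orderId this way yields the current state of every open Saga without reading the full history.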
Use the Outbox pattern to guarantee consistency between in-service DB updates and Kafka sends. The app commits the business row and the Outbox row in the same local transaction, and a component polling the Outbox (or CDC/Connect) delivers them to Kafka. This eliminates the two-phase problem between the DB and Kafka.
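The Outbox flow described above can be sketched with an in-memory model. Everything here is an assumption for illustration: the two collections stand in for database tables, the synchronized method stands in for one local SQL transaction, and the `publisher` callback stands in for the Kafka send.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** In-memory model of the Outbox pattern; a real service would use one SQL transaction. */
final class OutboxSketch {

    static final class OutboxRow {
        final long id;
        final String key;
        final String payload;
        boolean sent;
        OutboxRow(long id, String key, String payload) {
            this.id = id;
            this.key = key;
            this.payload = payload;
        }
    }

    private final Map<String, String> orders = new HashMap<>(); // business table
    private final List<OutboxRow> outbox = new ArrayList<>();   // outbox table
    private long nextId = 1;

    /** Models one local transaction: validation runs first, then both rows are written. */
    synchronized void placeOrder(String orderId, String status) {
        if (orders.containsKey(orderId)) {
            throw new IllegalStateException("duplicate order " + orderId);
        }
        orders.put(orderId, status);
        outbox.add(new OutboxRow(nextId++, orderId, "OrderCreated:" + orderId));
    }

    /** Relay step: publishes unsent rows in insertion order; returns how many were published. */
    synchronized int relay(Consumer<OutboxRow> publisher) {
        int published = 0;
        for (OutboxRow row : outbox) {
            if (row.sent) {
                continue;
            }
            publisher.accept(row); // if this throws, the row stays unsent and is retried later
            row.sent = true;       // marked only after the publish call returned
            published++;
        }
        return published;
    }
}
```

A relay built this way delivers at least once: if it crashes after publishing but before marking the row, the row is published again, so downstream consumers still need to be idempotent.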
Enable idempotence on the Kafka producer, and use transactions where needed. With a transaction, you can atomically group writes across multiple topics/partitions with consumer offset commits. When you set exactly_once_v2 on Kafka Streams, it manages transactions internally and suppresses duplicates and losses across input processing and output.
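As a rough sketch, the client settings behind this paragraph look like the following. The configuration keys are the standard Kafka client property names; the helper class, the String serializers, and whatever values are passed for the broker address, transactional ID, and group ID are assumptions. The resulting `Properties` would be handed to `KafkaProducer` and `KafkaConsumer`.

```java
import java.util.Properties;

/** Builds client settings for a transactional producer and a read_committed consumer. */
final class SagaClientConfigSketch {

    static Properties transactionalProducerProps(String bootstrapServers, String transactionalId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrapServers);
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("enable.idempotence", "true");        // broker de-duplicates producer retries
        p.put("acks", "all");                       // required when idempotence is enabled
        p.put("transactional.id", transactionalId); // turns on the transactional API
        return p;
    }

    static Properties readCommittedConsumerProps(String bootstrapServers, String groupId) {
        Properties p = new Properties();
        p.put("bootstrap.servers", bootstrapServers);
        p.put("group.id", groupId);
        p.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        p.put("isolation.level", "read_committed"); // hide records from aborted transactions
        p.put("enable.auto.commit", "false");       // offsets are committed via the transaction
        return p;
    }
}
```

The consumer side matters as much as the producer side: with the default `read_uncommitted` isolation level, downstream services would also see records from aborted transactions.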
Saga assumes failures will happen. Set timeouts on each step, and record failures and expirations as factual events. Design compensation as business-reversible actions and make them idempotent (running the same compensation any number of times must be safe).
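An idempotent compensation can be sketched as a handler that remembers which idempotency keys it has already applied. The refund scenario, the class name, and the in-memory collections are assumptions; in a real service the key set and the balance would be updated in the same local database transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Idempotent "refund payment" compensation, modeled in memory. */
final class RefundCompensationSketch {

    private final Map<String, Long> balances = new HashMap<>();       // customerId -> cents
    private final Set<String> appliedCompensations = new HashSet<>(); // keys already handled

    /** Applies the refund at most once per idempotency key; returns true if applied now. */
    synchronized boolean refund(String idempotencyKey, String customerId, long amountCents) {
        if (!appliedCompensations.add(idempotencyKey)) {
            return false; // duplicate delivery or retry: safe no-op
        }
        balances.merge(customerId, amountCents, Long::sum);
        return true;
    }

    synchronized long balanceOf(String customerId) {
        return balances.getOrDefault(customerId, 0L);
    }
}
```

Deriving the idempotency key from the Saga itself (for example the orderId plus the step name) means a replayed compensation command maps to the same key and is ignored.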
Make retries operable with exponential backoff plus a dead-letter queue (DLQ). When ordering matters, design for sequential processing within the same key, while ensuring an error-induced stop does not block the entire system.
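A minimal sketch of bounded retries with exponential backoff and a DLQ hand-off follows. The base delay, cap, and names are assumptions; the DLQ is a plain list standing in for a dead-letter topic, and the `pause` callback lets the caller decide whether to sleep, schedule, or just record the delay.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongConsumer;

/** Bounded retry with exponential backoff; exhausted messages go to a DLQ. */
final class RetryWithDlqSketch {

    static final long BASE_DELAY_MS = 100;  // hypothetical base delay
    static final long MAX_DELAY_MS = 5_000; // hypothetical cap

    /** Delay before the next try after failed attempt N (1-based): base * 2^(N-1), capped. */
    static long backoffMs(int attempt, long baseMs, long maxDelayMs) {
        long delay = baseMs << Math.min(attempt - 1, 30);
        return Math.min(delay, maxDelayMs);
    }

    /** Runs the handler up to maxAttempts times; returns false and parks the message on failure. */
    static boolean process(String message, Consumer<String> handler, int maxAttempts,
                           LongConsumer pause, List<String> dlq) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                handler.accept(message);
                return true;
            } catch (RuntimeException e) {
                if (attempt < maxAttempts) {
                    pause.accept(backoffMs(attempt, BASE_DELAY_MS, MAX_DELAY_MS));
                }
            }
        }
        dlq.add(message); // retries exhausted: hand over for manual intervention
        return false;
    }
}
```

Parking a message in the DLQ lets later messages for the same key proceed, so when strict per-key ordering matters the consumer may also need to hold back that key until the parked message is resolved.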
In the orchestrator style, you keep the Saga state machine in a state store, and when you receive a result event for each step, you emit the next command. With Kafka Streams, you can manage routing between topics and state transitions within a single topology, and exactly_once_v2 suppresses duplicates and partial failures.
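Independent of Kafka Streams, the state machine described above can be sketched as a pure transition function. The state, event, and command names are hypothetical (only RESERVE_STOCK matches a command type used in this article), and a single refund stands in for the compensation path.

```java
/** Saga state machine for payment -> inventory -> shipping, as a pure function. */
final class SagaStateMachineSketch {

    enum State { AWAITING_PAYMENT, AWAITING_STOCK, AWAITING_SHIPMENT, COMPENSATING, COMPLETED, FAILED }

    enum Event { PAYMENT_AUTHORIZED, PAYMENT_DECLINED, STOCK_RESERVED, STOCK_UNAVAILABLE,
                 SHIPMENT_CREATED, PAYMENT_REFUNDED }

    enum Command { RESERVE_STOCK, CREATE_SHIPMENT, REFUND_PAYMENT, NONE }

    /** Result of one transition: the new state and the command to emit (or NONE). */
    static final class Step {
        final State next;
        final Command command;
        Step(State next, Command command) {
            this.next = next;
            this.command = command;
        }
    }

    static Step transition(State current, Event event) {
        switch (current) {
            case AWAITING_PAYMENT:
                if (event == Event.PAYMENT_AUTHORIZED) return new Step(State.AWAITING_STOCK, Command.RESERVE_STOCK);
                if (event == Event.PAYMENT_DECLINED) return new Step(State.FAILED, Command.NONE);
                break;
            case AWAITING_STOCK:
                if (event == Event.STOCK_RESERVED) return new Step(State.AWAITING_SHIPMENT, Command.CREATE_SHIPMENT);
                if (event == Event.STOCK_UNAVAILABLE) return new Step(State.COMPENSATING, Command.REFUND_PAYMENT);
                break;
            case AWAITING_SHIPMENT:
                if (event == Event.SHIPMENT_CREATED) return new Step(State.COMPLETED, Command.NONE);
                break;
            case COMPENSATING:
                if (event == Event.PAYMENT_REFUNDED) return new Step(State.FAILED, Command.NONE);
                break;
            default:
                break; // COMPLETED and FAILED are terminal
        }
        return new Step(current, Command.NONE); // duplicate or unexpected event: ignore
    }
}
```

Keeping the transition logic free of I/O makes it easy to unit test, and ignoring events that do not fit the current state makes redelivered events harmless.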
Below is a minimal example that takes an order event and sequentially progresses through payment → inventory → shipping. Production deployments add timeout management, compensation transitions, audit logs, DLQ, and more.
A minimal orchestrator with Kafka Streams (Java)
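```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Branched;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;
import org.apache.kafka.streams.state.StoreBuilder;
import org.apache.kafka.streams.state.Stores;

// OrderEvent, PaymentEvent, Command, CommandType, SagaStepResult, SagaOrchestrator and the
// *Serde() helpers are application types that are not shown here.
Properties p = new Properties();
p.put(StreamsConfig.APPLICATION_ID_CONFIG, "saga-orchestrator");
p.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
p.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

StreamsBuilder b = new StreamsBuilder();

// Persistent store holding the Saga state per key (orderId)
StoreBuilder<KeyValueStore<String, String>> store = Stores.keyValueStoreBuilder(
    Stores.persistentKeyValueStore("saga-store"),
    Serdes.String(), Serdes.String());
b.addStateStore(store);

KStream<String, OrderEvent> orders = b.stream("order-events",
    Consumed.with(Serdes.String(), orderEventSerde()));

// Note: transformValues is deprecated in recent Kafka Streams releases in favor of processValues.
KStream<String, Command> cmds = orders
    .transformValues(() -> new SagaOrchestrator("saga-store"), "saga-store")
    .flatMapValues((SagaStepResult r) -> r.outgoingCommands());

// Route each command type to its own topic. branch() takes a Branched, and the
// branch is written to its topic inside the consumer passed to withConsumer.
cmds.split()
    .branch((k, c) -> c.type() == CommandType.AUTH_PAYMENT,
        Branched.withConsumer(ks -> ks.to("payment-commands",
            Produced.with(Serdes.String(), commandSerde()))))
    .branch((k, c) -> c.type() == CommandType.RESERVE_STOCK,
        Branched.withConsumer(ks -> ks.to("inventory-commands",
            Produced.with(Serdes.String(), commandSerde()))))
    .noDefaultBranch();

// Receive payment results and similar events on a separate stream and advance the state
KStream<String, PaymentEvent> pay = b.stream("payment-events",
    Consumed.with(Serdes.String(), paymentEventSerde()));
pay.process(() -> new Processor<String, PaymentEvent, Void, Void>() {
    private KeyValueStore<String, String> kv;

    @Override public void init(ProcessorContext<Void, Void> ctx) {
        kv = ctx.getStateStore("saga-store");
    }

    @Override public void process(Record<String, PaymentEvent> rec) {
        // Update the state and build the next command (omitted)
    }
}, "saga-store");

KafkaStreams s = new KafkaStreams(b.build(), p);
s.start();
// Note: Streams manages transactions internally to provide EOS v2
```

Saga succeeds or fails based on "whether you can observe it." Leave events for the start, success, failure, and compensation of each step, and make them traceable via correlation IDs. Continuously monitor metrics: consumer latency, retry rates, DLQ counts, and transaction abort rates.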
For CCDAK prep, lock in transaction boundaries, offset-commit consistency, EOS prerequisites, ordering guarantees with keys and partitions, compaction and retention policies, and schema compatibility.
CCDAK
Question 1
In a Kafka-based Saga implementation, you want to process input events, write to multiple topics, and atomically commit offsets in the same processing unit. Which is the correct implementation?
Correct answer: A
Kafka transactions can commit writes across multiple topics/partitions and consumer offsets within a single atomic boundary. After a one-time initTransactions call, the correct order is beginTransaction → send records → sendOffsetsToTransaction (passing the consumer's group metadata; older client versions took the group ID string) → commitTransaction. enable.auto.commit alone does not guarantee atomicity, and committing offsets in a call separate from the writes can still produce inconsistencies on failure.
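The atomic boundary in this answer can be pictured with a toy in-memory model. This is not the Kafka client API: the method names only mirror the producer's transactional calls, the signatures are simplified, and the maps stand in for topics and for the committed offsets of the consumer group.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a transactional boundary; NOT the Kafka client API. */
final class AtomicBoundaryModel {

    final Map<String, List<String>> committedRecords = new HashMap<>(); // topic -> visible records
    final Map<String, Long> committedOffsets = new HashMap<>();         // input partition -> next offset

    private final List<String[]> pendingRecords = new ArrayList<>();    // {topic, value} pairs
    private final Map<String, Long> pendingOffsets = new HashMap<>();
    private boolean open;

    void beginTransaction() {
        if (open) throw new IllegalStateException("transaction already open");
        open = true;
    }

    void send(String topic, String value) {
        requireOpen();
        pendingRecords.add(new String[] {topic, value});
    }

    void sendOffsetsToTransaction(String inputPartition, long nextOffset) {
        requireOpen();
        pendingOffsets.put(inputPartition, nextOffset);
    }

    /** Makes buffered records and offsets visible together. */
    void commitTransaction() {
        requireOpen();
        for (String[] r : pendingRecords) {
            committedRecords.computeIfAbsent(r[0], t -> new ArrayList<>()).add(r[1]);
        }
        committedOffsets.putAll(pendingOffsets);
        clearPending();
    }

    /** Discards buffered records and offsets together, so the input is reprocessed. */
    void abortTransaction() {
        requireOpen();
        clearPending();
    }

    private void clearPending() {
        pendingRecords.clear();
        pendingOffsets.clear();
        open = false;
    }

    private void requireOpen() {
        if (!open) throw new IllegalStateException("no open transaction");
    }
}
```

The point of the model is that outputs and offsets share one fate: after an abort neither is visible, so reprocessing the same input cannot leave a command written without its offset, or an offset advanced without its command.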
Are Kafka transactions a replacement for distributed transactions (2PC)?
Kafka transactions provide atomicity for writes across multiple Kafka partitions/topics and consumer offset commits. They do not bundle an external database and Kafka into a single distributed transaction. For external systems, combine patterns such as Outbox or CDC and rely on business-level compensation to achieve eventual consistency.
Does Exactly-Once Semantics (EOS) really guarantee "once and only once"?
Kafka Streams' exactly_once_v2 and producer transactions provide consistency between processing, output, and offset commits within the Kafka boundary. However, if the sink (external DB/API) is not idempotent, you must prevent duplicates on the outside. EOS suppresses duplicates and losses inside Kafka — it does not unconditionally guarantee "absolutely once" across the entire system.
Compensation logic is complex and hard to design. Where should I start?
First, enumerate the business invariants and define the "smallest reversible action" for the case where each step fails. Treat compensation like the normal flow: turn it into events, make it observable, and make it safe to re-run with an idempotency key. Defining timeouts and maximum retry counts, and setting up a DLQ path for manual intervention, will move the design forward.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.