Transactional Outbox Pattern with Kafka (2026)

Writing to an RDB from your application and then publishing the same change to Kafka is a dual write: if either side fails, consistency breaks. Kafka does not support distributed transactions (XA) with external databases, so you need to design the dual write away from the start.

The Outbox pattern persists business data and the event in a single RDB transaction, then reliably delivers the event to Kafka afterwards. It is also a recurring topic on CCDAK, so we walk through the fundamentals from a practical, operational angle.

Why You Need the Outbox Pattern: The Dual-Write Problem and Consistency

If the application updates the RDB and sends to Kafka separately, partial failures or retries lead to inconsistency (DB succeeded but Kafka did not, or vice versa). Kafka does not run two-phase commit with external systems (per the official spec), so the application alone cannot make both sides succeed or fail atomically.

In the Outbox pattern, the business table update and the insert into the outbox table are committed in the same RDB transaction. That guarantees the existence of the event atomically on the DB side. Delivery to Kafka is handled by a downstream CDC (Change Data Capture) job or poller, with at-least-once delivery and idempotent consumers absorbing any duplicates.

Kafka does not provide distributed transactions with external DBs, so atomicity must live on the DB side
Record the event as a row in the Outbox table; a CDC job or poller asynchronously ships it to Kafka
At-least-once is the baseline. Absorb duplicates with key design, compaction, and idempotent consumers

Outbox Table Design: Minimum Columns and Metadata

The Outbox is a chronological log of update facts. It is append-only by default, and rows are written inside the application's transaction alongside the business tables. Typical columns are listed below.

id (event ID, UUID recommended), aggregate_type (e.g. Order), aggregate_id (e.g. order_id), event_type (e.g. OrderCreated), payload (JSON / Avro binary, etc.), headers (optional: schemaVersion, messageId, etc.), occurred_at / created_at (monotonically increasing timestamps for ordering), and partition_hint (the value used as the Kafka key). A processed flag is generally unnecessary because the CDC job or poller tracks its own consumption position.

Key design: use aggregate_id as the Kafka key so each aggregate's events stay ordered within a partition
Ordering signal: rely on commit order, supplemented by occurred_at or a sequence number
Serialization: with a Schema Registry, store the schema-ID reference in payload or encode on the CDC side
Cleanup: partition the table or delete by retention window (after CDC). Be careful with physical deletes that bypass CDC
Keep triggers and update logic to a minimum; sticking to plain INSERTs reduces side effects

Implementation Options: CDC, Poller, and Direct Publish Compared

There are roughly three delivery paths. In practice, CDC + Outbox routing (Kafka Connect) is the easiest to operate and tends to lead to a calm production setup. An in-app poller gives you fine-grained control, but you end up implementing deduplication and concurrency control yourself. Direct dual-writes to DB and Kafka have a high consistency risk and are not recommended.

Combining a Kafka Connect CDC connector with an Outbox Event Router (such as Debezium's EventRouter SMT) routes rows from the outbox table to different topics based on aggregate type or event_type. Kafka Connect sources are generally at-least-once; duplicates are absorbed by message IDs or compaction (availability depends on the connector and platform features).

Recommended: CDC + Outbox routing (clean separation of concerns; easy monitoring and replay)
Alternative: in-app poller (more control, but you build operations, retries, and locking yourself)
Not recommended: direct dual-write from the app to DB and Kafka (a breeding ground for inconsistency)

Approach	Delivery guarantee	Idempotence / dedup	Ordering
CDC + Outbox routing (Kafka Connect)	At-least-once (some connectors offer EOS depending on the implementation)	Use messageId as the key + compacted topic; dedup on the consumer side	Use key=aggregate_id to preserve per-partition ordering
In-app poller + transactional producer	At-least-once (idempotent producer suppresses duplicates)	Unique ID per Outbox row, idempotent producer, consumer-side dedup	Control keys in the app; watch out for concurrent polling and locking
Direct dual-write to DB and Kafka (anti-pattern)	Non-atomic; partial failures cause inconsistency	Risk of both duplicates and lost events	Ordering can be broken

Example: Kafka Connect (Debezium Postgres + Outbox Event Router)

{
  "name": "pg-outbox-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "*****",
    "database.dbname": "appdb",
    "topic.prefix": "pg",
    "table.include.list": "public.outbox",
    "tombstones.on.delete": "false",
    "transforms": "outbox",
    "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter",
    "transforms.outbox.route.by.field": "aggregate_type",
    "transforms.outbox.route.topic.replacement": "${routedBy}",
    "transforms.outbox.table.fields.additional.placement": "event_type:header:eventType,aggregate_id:header:aggregateId",
    "transforms.outbox.table.expand.json.payload": "true",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}

Delivery Guarantees, Ordering, and Dedup: Following Kafka's Principles

Kafka's idempotent producer suppresses duplicates caused by producer retries, but it does not guarantee exactly-once between DB and Kafka. Outbox handles atomicity on the DB side, and you make the Kafka path idempotent on the assumption of at-least-once delivery.

Ordering is guaranteed only per partition. Use the same aggregate_id as the key so that events for one aggregate land in the same partition. As a rule, do not design for global ordering across different aggregates.

In Kafka Streams, processing.guarantee=exactly_once_v2 wraps reads, writes, and offset commits inside Kafka in a transaction. Remember, this is about processing segments inside Kafka and is a separate problem from atomicity with an external DB.

Producer: enable.idempotence=true, acks=all when needed, and appropriate retries
Keys: pin to aggregate_id so the same aggregate stays ordered
Dedup: key on messageId, use a compacted topic, or maintain a last-seen table on the consumer
Streams / Sink: secure Kafka-internal atomicity with EOSv2 or a transactional sink

Schema Design and Evolution: Envelope Strategy and Schema Registry

Design the Outbox payload around schema manageability. Wrapping it in an envelope (type, version, data) lets consumers branch parsing on event_type and version. Using a Schema Registry (Avro / Protobuf / JSON Schema) stabilizes compatibility checks and schema evolution.

Evolve schemas backward-compatibly by default and add fields with default values. When a breaking change is unavoidable, bump the event_type or version and split the topic (e.g. order.v2) or branch in the consumer.

Envelope: shape it as {type, version, data, metadata} to keep it extensible
Compatibility: prefer backward. Use defaults and nullables to smooth migrations
Routing: consider splitting topics by event_type or aggregate_type
Compaction: a compacted topic works well for upsert flows (latest value per key)

Operations and Monitoring: Cleanup, Replay, and Latency

For cleanup, either delete by retention window after confirming CDC has ingested rows, or rotate via partitioned tables. It is safer for the reader (CDC / poller) to track its own offset and preserve replayability than for the application to flip a "processed" flag.

Designing replay and backfill so that scanning the entire Outbox and resending is enough makes you resilient to operational incidents. When you assume duplicates from the start, the psychological cost of resending drops.

Use connector task health, source latency (DB commit time → Kafka publish time), consumer lag, and DLT (Dead Letter Topic) count as your core monitoring metrics.

Cleanup: retention-based deletes or partition rotation. Mind the order in which rows leave CDC scope
Replay: assume the whole Outbox may be resent and stay duplicate-safe
Monitoring: CDC task health, latency, lag, DLT count, and throughput
Capacity planning: size the Outbox so it can absorb peak backlogs, and do not forget index tuning

Check Your Understanding

CCDAK

問題 1

You want to keep RDB transactions and Kafka event delivery consistent. Assuming Kafka does not provide two-phase commit with an external DB, which architecture is the most robust?

Commit business data and the Outbox row in a single RDB transaction, then ship to Kafka via CDC (Outbox routing)
Start a Kafka transaction inside the app's DB transaction and commit them together
Skip the DB write, write only to Kafka, and batch the changes into the DB later
Dual-write directly from the app to both DB and Kafka and rely on retries on failure

正解: A

Kafka does not provide distributed transactions (2PC / XA) with external DBs. With the Outbox pattern, you atomically commit business data and the event inside the RDB, then reliably deliver to Kafka via CDC. B is unsupported, C makes the source of truth easy to drift, and D leads to dual-write inconsistency.

Frequently Asked Questions

Can Kafka alone deliver "exactly once" with a database?

No. Kafka transactions and idempotence only cover writes within Kafka and offset commits. They do not provide atomicity with an external database, so you need an Outbox that commits atomically on the DB side.

How should I evolve the Outbox schema?

Include a version in the envelope and set a backward-compatibility policy in the Schema Registry. For breaking changes, run a new event_type or a new topic (e.g. order.v2) in parallel and migrate consumers one by one.

How do I deal with table bloat and latency?

Partition the Outbox by date and optimize the (aggregate_id, created_at) index. Monitor CDC throughput and scale out when latency exceeds your threshold. Keep size in check by time-based deletes of ranges that have already been ingested.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

Outbox Pattern: Keeping RDB and Kafka Consistent (CCDAK Guide)