Joins in Kafka Streams largely define how expressive your real-time processing can be. They show up frequently on the CCDAK exam, and in production the choices of which join to use and how to align keys and partitions ultimately determine both performance and correctness.
This article compares Stream-Stream, Stream-Table, and GlobalKTable joins, organizing the gotchas around windows, timestamp propagation, repartitioning, and tombstones based on the documented behavior.
Kafka Streams offers three core join types: Stream-Stream matches events within a time window, Stream-Table enriches a stream with the latest state, and GlobalKTable lets you look up data without worrying about distribution. The decision axes are: do you need time alignment, is the latest single row enough, and can you co-partition the inputs?
From a CCDAK perspective, expect questions on which joins require a window, which join modes (inner/left/outer) are supported, where the result timestamp comes from, when repartitioning is needed, and the trade-offs of GlobalKTable.
| Join Type | Key Requirement | Window Required? | Supported Modes |
|---|---|---|---|
| Stream-Stream (KStream-KStream) | Same key on both streams | Required (before/after widths) | inner / left / outer |
| Stream-Table (KStream-KTable) | Stream key = table key | Not required (latest row lookup) | inner / left |
| Stream-GlobalKTable (KStream-GlobalKTable) | Stream key mapped to a lookup key | Not required | inner / left |
Basic skeleton (shared Serdes and builder)
StreamsBuilder builder = new StreamsBuilder();
// 共通の Serde 設定はプロパティ or StreamJoined/Joined で明示
// 以降のセクションで具体的な join を示します。A Stream-Stream join buffers both inputs in a window and matches pairs that share the same key and overlap in time. The JoinWindow has before/after widths, and incoming records match against the other side stored in the state store. Out-of-order arrivals still join as long as they fall within the window retention and grace period.
The result timestamp is the max of the two inputs. You can choose inner / left / outer: left outer emits a record for every left-side event even when no right-side match exists, and full outer emits for either side. Internally, RocksDB state stores are created for both sides, and retention must cover the window width (before + after) plus the grace period.
Conceptual diagram of a windowed Stream-Stream join
time --->
left (orders): --- o@t5 ----- o@t8 -------- o@t14 -----
[<-- before 2m -->|<-- after 3m -->]
right(payments): ---- p@t6 ---- p@t10 -- p@t13 -------------
例: before=2m, after=3m の場合
- o@t8 は [t6..t11] の支払いと突合 → p@t6, p@t10 が候補
- o@t14 は [t12..t17] の支払いと突合 → p@t13 が候補
結果レコード ts = max(o.ts, p.ts)Example of a windowed KStream-KStream join (Java DSL)
KStream<String, Order> orders = builder.stream("orders");
KStream<String, Payment> payments = builder.stream("payments");
// 5分窓、前2分・後3分(メソッド名はバージョンで異なる場合あり)
JoinWindows win = JoinWindows.of(Duration.ofMinutes(5))
.before(Duration.ofMinutes(2))
.after(Duration.ofMinutes(3));
KStream<String, Enriched> joined = orders.join(
payments,
(o, p) -> Enriched.from(o, p),
win,
StreamJoined.with(Serdes.String(), orderSerde, paymentSerde)
);
// left/outer も同様に leftJoin / outerJoin を使用Stream-Table is the classic event + latest state enrichment. No window is needed; it looks up the latest table value at join time. Only inner and left are available (no outer). If the table value is null (a tombstone), inner drops the record and left emits with a null right value.
The result timestamp is inherited from the stream side. At runtime, the join consults a local store backed by the table's changelog. Because partitions must align, Kafka Streams may add an internal repartition on the stream side if needed.
KStream-KTable example (enriching with user attributes)
KStream<String, PageView> views = builder.stream("pageviews");
KTable<String, UserAttr> users = builder.table("users-changelog");
KStream<String, EnrichedView> enriched = views.leftJoin(
users,
(view, user) -> EnrichedView.of(view, user) // user が null の場合も考慮
);
enriched.to("pageviews-with-user");GlobalKTable replicates every partition of the table topic to each task so any record can be looked up locally and instantly. KStream-GlobalKTable accepts a KeyValueMapper to derive the lookup key from the stream record, which makes foreign-key lookups easy (for example, looking up a product category from an order's productId).
The upside is that no repartitioning is required and partition counts do not have to match. The downside is that every task holds the entire dataset, which increases memory and local storage cost. For large dimensions, plan size budgets and RocksDB/changelog operations carefully.
KStream-GlobalKTable example (foreign-key lookup)
KStream<String, Order> orders = builder.stream("orders");
GlobalKTable<String, Product> products = builder.globalTable("products-changelog");
KStream<String, EnrichedOrder> enriched = orders.leftJoin(
products,
// ストリームレコードから GlobalKTable の参照キーを導出
(orderKey, order) -> order.getProductId(),
(order, product) -> EnrichedOrder.of(order, product) // product が null もあり得る
);
enriched.to("orders-enriched");Most joins assume aligned keys. Kafka Streams will create an internal repartition topic when needed and reshuffle the data by key. That costs extra network and storage, so whenever possible align the inputs at the source topic level (same key function, same partition count).
For Stream-Stream joins, state stores exist on both sides, and their retention depends on the window width and the grace period. Too short, and late events fail to join; too long, and storage balloons. For Stream-Table and GlobalKTable joins, the table side is continuously brought up to date through its changelog, and deletes propagate via tombstones.
Explicit example of key shaping and repartitioning
KStream<String, Event> in = builder.stream("input");
// 外部キー join を見据えて、参照キーで再キー化
KStream<String, Event> byRefKey = in.selectKey((k, v) -> v.getRefId());
// 以降の join は byRefKey を使うと内部再パーティションを減らせる場合があるThe exam regularly asks about the applicability of each join, the result timestamp, the supported join modes, the impact of tombstones, and the characteristics of GlobalKTable. API names vary across versions, so focus on the concepts and behavior first.
Stabilizing types and keys with explicit Serdes / Joined
KStream<String, L> left = builder.stream("left");
KStream<String, R> right = builder.stream("right");
KStream<String, Out> out = left.join(
right,
(l, r) -> combine(l, r),
JoinWindows.of(Duration.ofMinutes(3)),
StreamJoined.with(Serdes.String(), lSerde, rSerde) // 既定Serdeの誤適用を防ぐ
);CCDAK
問題 1
You want to join order events with payment events, matching within a total 5-minute window around the order. You also want to surface orders that have no payment, and the result timestamp should be the latest event time. Which design is most appropriate?
正解: A
A time window is required and orders without a payment must still be emitted, so a Stream-Stream left join fits the requirements. The result timestamp on a Stream-Stream join is the max of the two inputs. Stream-Table and GlobalKTable do latest-value lookups rather than windowed matching, so they are not suitable for surfacing orders whose payments have not yet arrived.
How much late arrival does a Stream-Stream join tolerate?
Anything that fits within the join window plus the grace period (configurable in supported versions). Records arriving later than that will not match. Make sure the state store retention is large enough to cover both.
Does KStream-KTable still work when partition counts do not match?
It works, but Kafka Streams may insert an internal repartition on the stream side. That adds network and storage cost, so co-partitioning the inputs in advance is recommended whenever possible.
Can GlobalKTable be used for very large dimension tables?
Because every task replicates the full dataset, large tables put heavy pressure on memory and local storage. For big dimensions, prefer a co-partitioned KTable design over GlobalKTable.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...