Kafka

Kafka Streams Joins Deep Dive: Stream-Stream / Stream-Table / GlobalKTable

2026-04-19
NicheeLab Editorial Team

Joins in Kafka Streams largely define how expressive your real-time processing can be. They show up frequently on the CCDAK exam, and in production the choices of which join to use and how to align keys and partitions ultimately determine both performance and correctness.

This article compares Stream-Stream, Stream-Table, and GlobalKTable joins, organizing the gotchas around windows, timestamp propagation, repartitioning, and tombstones based on the documented behavior.

Join Types and How to Choose

Kafka Streams offers three core join types: Stream-Stream matches events within a time window, Stream-Table enriches a stream with the latest state, and GlobalKTable lets you look up data without worrying about distribution. The decision axes are: do you need time alignment, is the latest single row enough, and can you co-partition the inputs?

From a CCDAK perspective, expect questions on which joins require a window, which join modes (inner/left/outer) are supported, where the result timestamp comes from, when repartitioning is needed, and the trade-offs of GlobalKTable.

  • Stream-Stream requires a time window and supports inner / left / outer.
  • Stream-Table looks up the latest state — inner / left only, no window required.
  • GlobalKTable replicates fully to every node — ideal for foreign-key lookups but expensive in memory and storage.
Join TypeKey RequirementWindow Required?Supported Modes
Stream-Stream (KStream-KStream)Same key on both streamsRequired (before/after widths)inner / left / outer
Stream-Table (KStream-KTable)Stream key = table keyNot required (latest row lookup)inner / left
Stream-GlobalKTable (KStream-GlobalKTable)Stream key mapped to a lookup keyNot requiredinner / left

Basic skeleton (shared Serdes and builder)

StreamsBuilder builder = new StreamsBuilder();
// 共通の Serde 設定はプロパティ or StreamJoined/Joined で明示
// 以降のセクションで具体的な join を示します。

Stream-Stream Join: Production Essentials

A Stream-Stream join buffers both inputs in a window and matches pairs that share the same key and overlap in time. The JoinWindow has before/after widths, and incoming records match against the other side stored in the state store. Out-of-order arrivals still join as long as they fall within the window retention and grace period.

The result timestamp is the max of the two inputs. You can choose inner / left / outer: left outer emits a record for every left-side event even when no right-side match exists, and full outer emits for either side. Internally, RocksDB state stores are created for both sides, and retention must cover the window width (before + after) plus the grace period.

  • A window is mandatory; size it to the business SLA plus expected lateness.
  • Result timestamp = max(left.ts, right.ts).
  • Records with a null key are not joinable (and may be dropped during repartitioning).
  • State store retention must cover the window width plus the grace period.
  • inner / left / outer are all supported.

Conceptual diagram of a windowed Stream-Stream join

time --->

left (orders):   --- o@t5 ----- o@t8 -------- o@t14 -----
                               [<-- before 2m -->|<-- after 3m -->]
right(payments): ---- p@t6 ---- p@t10 -- p@t13 -------------

例: before=2m, after=3m の場合
- o@t8 は [t6..t11] の支払いと突合 → p@t6, p@t10 が候補
- o@t14 は [t12..t17] の支払いと突合 → p@t13 が候補
結果レコード ts = max(o.ts, p.ts)

Example of a windowed KStream-KStream join (Java DSL)

KStream<String, Order> orders = builder.stream("orders");
KStream<String, Payment> payments = builder.stream("payments");

// 5分窓、前2分・後3分(メソッド名はバージョンで異なる場合あり)
JoinWindows win = JoinWindows.of(Duration.ofMinutes(5))
                              .before(Duration.ofMinutes(2))
                              .after(Duration.ofMinutes(3));

KStream<String, Enriched> joined = orders.join(
    payments,
    (o, p) -> Enriched.from(o, p),
    win,
    StreamJoined.with(Serdes.String(), orderSerde, paymentSerde)
);

// left/outer も同様に leftJoin / outerJoin を使用

Stream-Table Join (KStream-KTable)

Stream-Table is the classic event + latest state enrichment. No window is needed; it looks up the latest table value at join time. Only inner and left are available (no outer). If the table value is null (a tombstone), inner drops the record and left emits with a null right value.

The result timestamp is inherited from the stream side. At runtime, the join consults a local store backed by the table's changelog. Because partitions must align, Kafka Streams may add an internal repartition on the stream side if needed.

  • Supported modes: inner / left (no outer).
  • A null on the table side is a tombstone: inner drops; left emits with a null right value.
  • Result timestamp = stream-side ts.
  • Keys must align — the stream side may be repartitioned if necessary.
  • The KTable source is best backed by a compacted topic.

KStream-KTable example (enriching with user attributes)

KStream<String, PageView> views = builder.stream("pageviews");
KTable<String, UserAttr> users = builder.table("users-changelog");

KStream<String, EnrichedView> enriched = views.leftJoin(
    users,
    (view, user) -> EnrichedView.of(view, user) // user が null の場合も考慮
);

enriched.to("pageviews-with-user");

GlobalKTable Joins and Foreign-Key Lookups

GlobalKTable replicates every partition of the table topic to each task so any record can be looked up locally and instantly. KStream-GlobalKTable accepts a KeyValueMapper to derive the lookup key from the stream record, which makes foreign-key lookups easy (for example, looking up a product category from an order's productId).

The upside is that no repartitioning is required and partition counts do not have to match. The downside is that every task holds the entire dataset, which increases memory and local storage cost. For large dimensions, plan size budgets and RocksDB/changelog operations carefully.

  • Supported modes: inner / left.
  • The lookup key is derived through a KeyValueMapper (it does not have to match the stream key).
  • Replication raises memory and storage cost — factor in a hard size limit.
  • Result timestamp = stream-side ts.
  • A compacted topic is preferable so the latest state is retained.

KStream-GlobalKTable example (foreign-key lookup)

KStream<String, Order> orders = builder.stream("orders");
GlobalKTable<String, Product> products = builder.globalTable("products-changelog");

KStream<String, EnrichedOrder> enriched = orders.leftJoin(
    products,
    // ストリームレコードから GlobalKTable の参照キーを導出
    (orderKey, order) -> order.getProductId(),
    (order, product) -> EnrichedOrder.of(order, product) // product が null もあり得る
);

enriched.to("orders-enriched");

Repartitioning, Key Design, and State Retention

Most joins assume aligned keys. Kafka Streams will create an internal repartition topic when needed and reshuffle the data by key. That costs extra network and storage, so whenever possible align the inputs at the source topic level (same key function, same partition count).

For Stream-Stream joins, state stores exist on both sides, and their retention depends on the window width and the grace period. Too short, and late events fail to join; too long, and storage balloons. For Stream-Table and GlobalKTable joins, the table side is continuously brought up to date through its changelog, and deletes propagate via tombstones.

  • Explicitly shaping the key with selectKey often helps avoid unnecessary repartitions.
  • Watch the compaction and retention settings on internal repartition topics.
  • Window length and grace (configurable on some versions) are operationally critical.
  • For very large tables, prefer a co-partitioned KTable over GlobalKTable.

Explicit example of key shaping and repartitioning

KStream<String, Event> in = builder.stream("input");
// 外部キー join を見据えて、参照キーで再キー化
KStream<String, Event> byRefKey = in.selectKey((k, v) -> v.getRefId());
// 以降の join は byRefKey を使うと内部再パーティションを減らせる場合がある

CCDAK Prep: Common Topics and Pitfalls

The exam regularly asks about the applicability of each join, the result timestamp, the supported join modes, the impact of tombstones, and the characteristics of GlobalKTable. API names vary across versions, so focus on the concepts and behavior first.

  • Only Stream-Stream needs a window — Stream-Table and GlobalKTable do not.
  • Result timestamp: Stream-Stream uses max(left, right); the others use the stream-side ts.
  • KStream-KTable supports inner / left only — no outer.
  • GlobalKTable skips repartitioning but consumes memory and local storage.
  • A null on the KTable side is a tombstone: inner produces nothing, left emits with a null right value.
  • Records with a null key cannot be joined.

Stabilizing types and keys with explicit Serdes / Joined

KStream<String, L> left = builder.stream("left");
KStream<String, R> right = builder.stream("right");
KStream<String, Out> out = left.join(
    right,
    (l, r) -> combine(l, r),
    JoinWindows.of(Duration.ofMinutes(3)),
    StreamJoined.with(Serdes.String(), lSerde, rSerde) // 既定Serdeの誤適用を防ぐ
);

Check Your Understanding

CCDAK

問題 1

You want to join order events with payment events, matching within a total 5-minute window around the order. You also want to surface orders that have no payment, and the result timestamp should be the latest event time. Which design is most appropriate?

  1. Use a KStream-KStream left join with a 2-minute before / 3-minute after window
  2. Use a KStream-KTable left join (payments as a KTable for latest-value lookup)
  3. Use a KStream-GlobalKTable inner join (payments as a GlobalKTable)
  4. Use a KStream-KStream inner join with a fixed 3-minute window

正解: A

A time window is required and orders without a payment must still be emitted, so a Stream-Stream left join fits the requirements. The result timestamp on a Stream-Stream join is the max of the two inputs. Stream-Table and GlobalKTable do latest-value lookups rather than windowed matching, so they are not suitable for surfacing orders whose payments have not yet arrived.

Frequently Asked Questions

How much late arrival does a Stream-Stream join tolerate?

Anything that fits within the join window plus the grace period (configurable in supported versions). Records arriving later than that will not match. Make sure the state store retention is large enough to cover both.

Does KStream-KTable still work when partition counts do not match?

It works, but Kafka Streams may insert an internal repartition on the stream side. That adds network and storage cost, so co-partitioning the inputs in advance is recommended whenever possible.

Can GlobalKTable be used for very large dimension tables?

Because every task replicates the full dataset, large tables put heavy pressure on memory and local storage. For big dimensions, prefer a co-partitioned KTable design over GlobalKTable.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
Kafka

Kafka Topics & Partitions: Distribution Fundamentals (2026)

How Kafka topics and partitions enable scale — ordering guar...

Kafka

CCDAK Exam Guide: Confluent Certified Developer (2026)

Complete prep for the CCDAK exam — Producer/Consumer API, St...

Kafka

CCAAK Exam Guide: Confluent Certified Administrator (2026)

Pass the CCAAK exam — cluster management, partitions, securi...

Kafka

Kafka Replicas & ISR: Fault Tolerance Explained (2026)

Replica placement, in-sync replicas (ISR), leader election. ...

Kafka

Kafka Offsets: Commit Modes & Consumer Position (2026)

Offset semantics — auto vs. manual commit, __consumer_offset...

Browse all Kafka articles (101)
© 2026 NicheeLab All rights reserved.