Windowing is the time-axis design decision: which records belong together, and where do you draw the boundary. In Kafka, you almost always reason about this in terms of event time.
The first step is distinguishing Tumbling (fixed-length, non-overlapping), Hopping (fixed-length, overlapping), and Session (variable-length, mergeable) — and picking the right one for the job.
Kafka's windowing slices a continuous stream by time so you can aggregate or join records inside each slice. Both the Kafka Streams DSL and ksqlDB default to event time (the record's timestamp). Stream time advances to the maximum event timestamp observed so far.
To handle late events (records that arrive out of order), you configure how long after the window ends you are still willing to accept them — the Grace period. A window closes when stream-time reaches window-end + grace, and records belonging to that window after that point are discarded as late.
To handle time correctly, configure TimestampExtractor (in Kafka Streams) or the TIMESTAMP column (in ksqlDB) so each record carries the event time you actually want to window on.
The three window types differ in how their boundaries are formed and whether they overlap. Pick one based on your requirements: whether duplicate counting is acceptable, whether you need to detect user-activity clusters, and the reporting granularity you want. CCDAK repeatedly tests the definitions and the right use case for each.
Session windows are variable-length and split by an inactivity gap. The key fact — both for production and the exam — is that a late event can cause two previously separate sessions to merge.
| Type | Boundary / Length | Overlap | Grace (Late Tolerance) |
|---|---|---|---|
| Tumbling | Fixed length. Example: every minute [00:00, 00:01) | None (non-overlapping) | Accepts records up to Grace after window-end |
| Hopping | Fixed length plus an advance step. Example: 5-minute size with a 1-minute advance | Yes (the same event lands in multiple windows) | Accepts records up to Grace after window-end (per window) |
| Session | Variable length. Split by an inactivity gap and extended by new activity | Mergeable (late events can join sessions together) | Accepts records up to Grace after the session ends |
A window stays open until stream-time reaches window-end + Grace. Because stream-time advances to the maximum observed timestamp, a single future-dated record can suddenly close several older windows at once.
The diagram below uses Tumbling windows [0,5) and [5,10) with Grace=2. A late event with ts=4 that arrives while stream-time is 6 is still admitted, because window [0,5) stays open until 5+2=7.
Visualizing window-end and Grace (Tumbling, Grace=2)
time --->
0 2 4 6 8 10
|----|----|----|----|----|
[ W1:0-5 ) [ W2:5-10 )
|-------------------------|-------------------------
イベント到着:
e1: ts=3, arrive@3 -> 属する: W1(オンタイム)
e2: ts=6, arrive@6 -> 属する: W2(オンタイム)=> stream-time = 6
e3: ts=4, arrive@6 -> 遅延だが W1 に取り込み可能(stream-time 6 < 5+Grace 7)
e4: ts=1, arrive@9 -> W1は既に閉鎖(stream-time>=7)=> 遅着として破棄In the Kafka Streams DSL, use TimeWindows.ofSizeAndGrace and SessionWindows.ofInactivityGapAndGrace to declare the window size and Grace explicitly. For Hopping, set the advance step with advanceBy. Session windows require a merge function (the Aggregator Merger).
In ksqlDB, the WINDOW clause selects TUMBLING / HOPPING / SESSION, paired with SIZE, ADVANCE BY, and GRACE PERIOD. Add EMIT FINAL when you only want results once the window closes (suppressing intermediate updates). Exact clause placement and support vary by version, so check the documentation for the version you run before going to production.
Sample code for Kafka Streams DSL and ksqlDB
// Kafka Streams (Java)
KStream<String, String> source = builder.stream("input");
// 1) Tumbling 1分 + Grace 5分(イベント時刻ベース)
KTable<Windowed<String>, Long> tumbling = source
.groupByKey()
.windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(1), Duration.ofMinutes(5)))
.count();
// 必要ならウィンドウ閉鎖後のみ出力(中間を抑制)
KTable<Windowed<String>, Long> tumblingFinal = tumbling
.suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()));
// 2) Hopping(長さ5分・1分スライド)+ Grace 5分
KTable<Windowed<String>, Long> hopping = source
.groupByKey()
.windowedBy(
TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(5))
.advanceBy(Duration.ofMinutes(1))
)
.count();
// 3) Session(非活動ギャップ5分・Grace 10分)
KTable<Windowed<String>, Long> sessions = source
.groupByKey()
.windowedBy(SessionWindows.ofInactivityGapAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(10)))
.aggregate(
() -> 0L,
(key, value, agg) -> agg + 1,
(agg1, agg2) -> agg1 + agg2, // セッションマージ時の結合関数
Materialized.with(Serdes.String(), Serdes.Long())
);
// ksqlDB
-- ストリーム定義(イベント時刻としてtsを使用)
CREATE STREAM clicks (user_id VARCHAR, ts BIGINT)
WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='JSON', TIMESTAMP='ts');
-- Tumbling 1分 + Grace 5分、最終結果のみ
CREATE TABLE clicks_tumbling AS
SELECT user_id, COUNT(*) AS c
FROM clicks
WINDOW TUMBLING (SIZE 1 MINUTE, GRACE PERIOD 5 MINUTES)
GROUP BY user_id
EMIT FINAL;
-- Hopping(長さ5分・1分スライド)+ Grace 5分
CREATE TABLE clicks_hopping AS
SELECT user_id, COUNT(*) AS c
FROM clicks
WINDOW HOPPING (SIZE 5 MINUTES, ADVANCE BY 1 MINUTE, GRACE PERIOD 5 MINUTES)
GROUP BY user_id
EMIT FINAL;
-- Session(ギャップ60秒・Grace 5分)
CREATE TABLE clicks_session AS
SELECT user_id, COUNT(*) AS c
FROM clicks
WINDOW SESSION (60 SECONDS, GRACE PERIOD 5 MINUTES)
GROUP BY user_id
EMIT FINAL;
-- 注: 実運用ではバージョンによりWINDOW/GRACE句の詳細が異なる場合があります。Windowed aggregations keep internal state (the window store). Retention must be at least window size + Grace. When using ofSizeAndGrace or ofInactivityGapAndGrace in Kafka Streams, configuring retention to satisfy that constraint is the standard practice in production.
Suppression buffers intermediate results until the window closes, then emits only the final value downstream. untilWindowCloses is clean conceptually, but the buffer grows and adds memory pressure, so evaluate it against your load and latency requirements. EMIT FINAL plays a similar role in ksqlDB.
Recurring exam topics include window-type definitions, event time vs. processing time, late tolerance via Grace, and session merge behavior. Memorize two facts in particular: the merge condition for sessions (a late event spanning the inactivity gap joins two sessions) and the window close condition (stream-time >= window-end + grace).
Hopping inherently double-counts, Suppression and EMIT FINAL each have specific use cases, and insufficient retention silently breaks late-event handling. These are misunderstood in production and weaponized as trick questions on the exam.
CCDAK
問題 1
You want a 1-minute Tumbling aggregation in Kafka Streams that uses event time and accepts late events up to 5 minutes after window-end. Which configuration is most appropriate?
正解: A
Late tolerance is controlled by Grace, so ofSizeAndGrace(1 minute, 5 minutes) is correct. advanceBy sets the advance step for Hopping, not late tolerance. Suppression controls output timing — separate from late acceptance. Retention must be large enough, but with Grace set to 0, late events are still rejected.
What happens to late events that arrive after Grace expires? Can they still be captured?
Once the window closes (stream-time >= window-end + grace), any records that arrive afterward are dropped (Kafka Streams counts them in a metric). If you need to capture them, implement your own lateness check upstream of the window (for example, compare current stream-time to the record's timestamp) and route late records to a separate topic. ksqlDB does not provide a built-in late-record output.
When do session windows merge, and what gets recomputed?
A merge happens when two sessions for the same key — previously separated by an inactivity gap — get bridged by a late-arriving event. On merge, the start and end timestamps are updated and the aggregate is recomputed via the merge function (the aggregator merger).
Should I use event time or processing time?
Both Kafka Streams and ksqlDB are designed around event time. Configure TimestampExtractor or the TIMESTAMP column properly so windowing uses the time embedded in the data itself. Processing time is fine for latency measurements or rough dashboards, but it is not appropriate for accurate aggregations.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...