ksqlDB tables are materialized views that hold continuous-query results in a RocksDB state store and use a Kafka changelog topic for fault tolerance. This enables both low-latency key lookups (Pull Query) and continuous delivery (Push Query).
This article walks through how tables are built and what powers them internally, when to use Pull vs Push, key design and partitioning, typical query patterns, and operational and exam-prep takeaways aligned with CCDAK focus areas.
In ksqlDB, a Stream is an "immutable sequence of events" and a Table is an "upsert representation of the latest state." A persistent query like CREATE TABLE AS SELECT maintains computed results in a local state store and persists the change history (changelog) to a Kafka topic behind the scenes. That is effectively a materialized view.
Tables are indexed by primary key and support instant lookups via Pull Query. Push Query, on the other hand, streams results out as new updates arrive. CCDAK frequently asks you to pick the right combination of Stream vs Table and Pull vs Push for a given use case.
ksqlDB persistent queries run as Kafka Streams applications, with a RocksDB state store for each partition. Updates are applied locally and replicated to a changelog topic at the same time. On node failure, state is restored from the changelog.
During a rebalance, partition ownership moves and the new owning node catches up from the changelog. This lets the materialized view stay fault tolerant while still serving consistent results. In production, verify that the changelog topic is properly protected (appropriate replication.factor and cleanup.policy).
Pull Query is ideal for instant lookups by primary key. ksqlDB routes the request to the right node based on the key and serves it immediately from the local materialized view. Push Query, in contrast, fits continuous update notifications and is effective for monitoring dashboards and downstream streaming integrations.
Achieving the same with a vanilla Kafka consumer means building your own state management and lookup capabilities in the application. ksqlDB tables provide this as a built-in feature.
| Aspect | Pull Query (table) | Push Query (table/stream) | Plain Kafka consumer |
|---|---|---|---|
| Use case | Instant lookup by key | Sequential delivery of updates | Event consumption and custom processing |
| Response | Synchronous request, single response | Continuous stream of incremental responses | Depends on the application implementation |
| Requirements | Primary key condition (equality) | Subscribe to a query | None (free-form) |
| Latency | Milliseconds to sub-second (local state) | Immediate, gated by event arrival | Implementation and store dependent |
| Scale | Horizontal scale across partitions and nodes | Scale across broker → ksqlDB → client | Scale with consumer groups |
| Consistency | Returns the latest value for a given key | Delivers updates in arrival order | Implementation dependent (DIY state management) |
Pull Query assumes lookup by primary key. That means an equality condition on the GROUP BY key or the primary key in the table definition is required. Proper key design and partitioning decide whether low-latency lookups succeed.
When JOINs or repartitioning is involved, ksqlDB automatically creates repartition topics as needed. For joins, key and partition-count alignment matters; perform explicit repartitioning with PARTITION BY when needed.
Materialized view data flow (conceptual)
Tables pre-compute aggregations and dimension lookups and make them queryable instantly. The following shows typical definitions and usage.
For exam prep, remember these mappings: CTAS to create a table = materialize, Pull = key lookup, Push = continuous delivery, and JOINs require key alignment and possibly repartitioning.
ksqlDB definition examples (materialize + lookup + join)
/* ソースストリーム定義 */
CREATE STREAM pageviews (
user_id VARCHAR,
page VARCHAR,
ts BIGINT
) WITH (
KAFKA_TOPIC='pageviews',
VALUE_FORMAT='JSON'
);
/* ユーザー属性のディメンションテーブル(参照用) */
CREATE TABLE users (
user_id VARCHAR PRIMARY KEY,
plan VARCHAR
) WITH (
KAFKA_TOPIC='users',
VALUE_FORMAT='JSON'
);
/* 集計テーブル(マテリアライズド・ビュー) */
CREATE TABLE pv_count_by_user AS
SELECT user_id, COUNT(*) AS view_cnt
FROM pageviews
GROUP BY user_id
EMIT CHANGES;
/* Pull Query:単発のキー参照(最新の集計を即時取得) */
SELECT view_cnt FROM pv_count_by_user WHERE user_id='u-123';
/* Push Query:更新があるたびにストリーミング */
SELECT user_id, view_cnt FROM pv_count_by_user EMIT CHANGES;
/* 参照結合:ストリームをディメンション情報でエンリッチ */
CREATE STREAM enriched_pageviews AS
SELECT p.user_id, p.page, u.plan
FROM pageviews p
LEFT JOIN users u
ON p.user_id = u.user_id
EMIT CHANGES;To rebuild a materialized view, the source topic must retain the necessary history (sufficient retention period or compaction). Recreate the query, and ksqlDB reads from the beginning and rebuilds the state.
On failure or rebalance, local state is restored from the changelog. Scale depends on partition count and node count, and hot keys cause skew. Always think carefully about key distribution and partition design.
CCDAK
問題 1
You want to instantly fetch per-user pageview counts and return the latest value as a one-off response when an API is called with a user_id. Which architecture is most appropriate?
正解: A
For a one-off key lookup, a table indexed by primary key (a materialized view) plus a Pull Query is the right fit. Push Query is for continuous delivery and is not suited to one-off lookups. Scanning the raw topic on every request, or relying on JOINs alone, makes it hard to meet the instant-key-lookup requirement.
Is log compaction required for the topic behind a table?
Tables fit an upsert model, and the changelog topic uses compaction to retain the latest value per key. The cleanup.policy on the sink topic depends on your environment, but enabling compact is standard when you want to retain the latest state.
How does a Pull Query scale?
The key determines the partition, and the request is served immediately from the local state on the owning node. You scale horizontally by increasing partition count and node count, and ksqlDB routes requests to the owning node based on the key.
How do I redesign a pipeline with a different key?
Create a new pipeline (table) re-partitioned with PARTITION BY or GROUP BY. A repartition topic is generated internally, after which Pull/Push queries can use the new key.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...