Kafka Streams Materialized Views: KTable as Cache (2026)

ksqlDB tables are materialized views that hold continuous-query results in a RocksDB state store and use a Kafka changelog topic for fault tolerance. This enables both low-latency key lookups (Pull Query) and continuous delivery (Push Query).

This article walks through how tables are built and what powers them internally, when to use Pull vs Push, key design and partitioning, typical query patterns, and operational and exam-prep takeaways aligned with CCDAK focus areas.

Tables and Materialized View Basics

In ksqlDB, a Stream is an "immutable sequence of events" and a Table is an "upsert representation of the latest state." A persistent query like CREATE TABLE AS SELECT maintains computed results in a local state store and persists the change history (changelog) to a Kafka topic behind the scenes. That is effectively a materialized view.

Tables are indexed by primary key and support instant lookups via Pull Query. Push Query, on the other hand, streams results out as new updates arrive. CCDAK frequently asks you to pick the right combination of Stream vs Table and Pull vs Push for a given use case.

Table = upsert (the latest value for a given key wins)
Materialized view = a table maintained by a persistent query (results pre-computed and held)
Pull Query = specify a key for an immediate read (synchronous request)
Push Query = continuously stream result updates (subscription style)

Under the Hood: Kafka Topics and State Stores

ksqlDB persistent queries run as Kafka Streams applications, with a RocksDB state store for each partition. Updates are applied locally and replicated to a changelog topic at the same time. On node failure, state is restored from the changelog.

During a rebalance, partition ownership moves and the new owning node catches up from the changelog. This lets the materialized view stay fault tolerant while still serving consistent results. In production, verify that the changelog topic is properly protected (appropriate replication.factor and cleanup.policy).

Local state: RocksDB (per partition)
Fault tolerance: updates persisted to a changelog topic
Scale-out: parallel execution per partition
Restore: changelog replay on ownership change or failure
Consistency: upsert ordering for a given key follows partition order

When to Use Pull vs Push (with Comparison Table)

Pull Query is ideal for instant lookups by primary key. ksqlDB routes the request to the right node based on the key and serves it immediately from the local materialized view. Push Query, in contrast, fits continuous update notifications and is effective for monitoring dashboards and downstream streaming integrations.

Achieving the same with a vanilla Kafka consumer means building your own state management and lookup capabilities in the application. ksqlDB tables provide this as a built-in feature.

One-off key lookup = Pull Query
Continuously stream the latest changes = Push Query
No DIY state management needed = the upside of ksqlDB tables

Aspect	Pull Query (table)	Push Query (table/stream)	Plain Kafka consumer
Use case	Instant lookup by key	Sequential delivery of updates	Event consumption and custom processing
Response	Synchronous request, single response	Continuous stream of incremental responses	Depends on the application implementation
Requirements	Primary key condition (equality)	Subscribe to a query	None (free-form)
Latency	Milliseconds to sub-second (local state)	Immediate, gated by event arrival	Implementation and store dependent
Scale	Horizontal scale across partitions and nodes	Scale across broker → ksqlDB → client	Scale with consumer groups
Consistency	Returns the latest value for a given key	Delivers updates in arrival order	Implementation dependent (DIY state management)

Key Design and Partitioning: Building a Queryable View

Pull Query assumes lookup by primary key. That means an equality condition on the GROUP BY key or the primary key in the table definition is required. Proper key design and partitioning decide whether low-latency lookups succeed.

When JOINs or repartitioning is involved, ksqlDB automatically creates repartition topics as needed. For joins, key and partition-count alignment matters; perform explicit repartitioning with PARTITION BY when needed.

Pull Query is fundamentally an equality lookup on the primary key (range queries depend on use case and implementation)
GROUP BY determines the view's key. Changing the key triggers repartitioning
JOIN assumes co-partitioning. Verify alignment of keys and partition counts
High-cardinality keys help with scale, but watch out for locality and hot keys

Materialized view data flow (conceptual)

Common Query Patterns: Aggregation, Lookup, Join

Tables pre-compute aggregations and dimension lookups and make them queryable instantly. The following shows typical definitions and usage.

For exam prep, remember these mappings: CTAS to create a table = materialize, Pull = key lookup, Push = continuous delivery, and JOINs require key alignment and possibly repartitioning.

Aggregation table: upsert COUNT/SUM via GROUP BY
Pull Query: specify the primary key for an instant lookup (synchronous)
Push Query: continuously deliver updates (monitoring, downstream integration)
Lookup join: enrich by JOINing a Stream with a Table

ksqlDB definition examples (materialize + lookup + join)

/* ソースストリーム定義 */
CREATE STREAM pageviews (
  user_id VARCHAR,
  page VARCHAR,
  ts BIGINT
) WITH (
  KAFKA_TOPIC='pageviews',
  VALUE_FORMAT='JSON'
);

/* ユーザー属性のディメンションテーブル（参照用） */
CREATE TABLE users (
  user_id VARCHAR PRIMARY KEY,
  plan VARCHAR
) WITH (
  KAFKA_TOPIC='users',
  VALUE_FORMAT='JSON'
);

/* 集計テーブル（マテリアライズド・ビュー） */
CREATE TABLE pv_count_by_user AS
SELECT user_id, COUNT(*) AS view_cnt
FROM pageviews
GROUP BY user_id
EMIT CHANGES;

/* Pull Query：単発のキー参照（最新の集計を即時取得） */
SELECT view_cnt FROM pv_count_by_user WHERE user_id='u-123';

/* Push Query：更新があるたびにストリーミング */
SELECT user_id, view_cnt FROM pv_count_by_user EMIT CHANGES;

/* 参照結合：ストリームをディメンション情報でエンリッチ */
CREATE STREAM enriched_pageviews AS
SELECT p.user_id, p.page, u.plan
FROM pageviews p
LEFT JOIN users u
ON p.user_id = u.user_id
EMIT CHANGES;

Operations and Exam Focus: Rebuilds, Failure Handling, Design Wisdom

To rebuild a materialized view, the source topic must retain the necessary history (sufficient retention period or compaction). Recreate the query, and ksqlDB reads from the beginning and rebuilds the state.

On failure or rebalance, local state is restored from the changelog. Scale depends on partition count and node count, and hot keys cause skew. Always think carefully about key distribution and partition design.

Rebuild = recreate the query + secure source history (consider retention period and compact)
Repartition topics are created when JOINs or GROUP BY change the key
Pull requires a key; Push streams continuously. Choose by use case
CCDAK frequently tests Stream vs Table semantics, keys and partitions, and the relationship between table materialization and the changelog

Check Your Understanding

CCDAK

問題 1

You want to instantly fetch per-user pageview counts and return the latest value as a one-off response when an API is called with a user_id. Which architecture is most appropriate?

Create a ksqlDB table grouped by user_id and read from it with a Pull Query
Have the application scan the raw pageviews topic from beginning to end on every request to aggregate
Use a ksqlDB Push Query and have the API buffer the incoming stream of updates
Do a Stream-Stream JOIN and search the result with a plain consumer

正解: A

For a one-off key lookup, a table indexed by primary key (a materialized view) plus a Pull Query is the right fit. Push Query is for continuous delivery and is not suited to one-off lookups. Scanning the raw topic on every request, or relying on JOINs alone, makes it hard to meet the instant-key-lookup requirement.

Frequently Asked Questions

Is log compaction required for the topic behind a table?

Tables fit an upsert model, and the changelog topic uses compaction to retain the latest value per key. The cleanup.policy on the sink topic depends on your environment, but enabling compact is standard when you want to retain the latest state.

How does a Pull Query scale?

The key determines the partition, and the request is served immediately from the local state on the owning node. You scale horizontally by increasing partition count and node count, and ksqlDB routes requests to the owning node based on the key.

How do I redesign a pipeline with a different key?

Create a new pipeline (table) re-partitioned with PARTITION BY or GROUP BY. A repartition topic is generated internally, after which Pull/Push queries can use the new key.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

ksqlDB Materialized Views for CCDAK: Tables and Query Patterns