Kafka

ksqlDB Materialized Views for CCDAK: Tables and Query Patterns

2026-04-19
NicheeLab Editorial Team

ksqlDB tables are materialized views that hold continuous-query results in a RocksDB state store and use a Kafka changelog topic for fault tolerance. This enables both low-latency key lookups (Pull Query) and continuous delivery (Push Query).

This article walks through how tables are built and what powers them internally, when to use Pull vs Push, key design and partitioning, typical query patterns, and operational and exam-prep takeaways aligned with CCDAK focus areas.

Tables and Materialized View Basics

In ksqlDB, a Stream is an "immutable sequence of events" and a Table is an "upsert representation of the latest state." A persistent query like CREATE TABLE AS SELECT maintains computed results in a local state store and persists the change history (changelog) to a Kafka topic behind the scenes. That is effectively a materialized view.

Tables are indexed by primary key and support instant lookups via Pull Query. Push Query, on the other hand, streams results out as new updates arrive. CCDAK frequently asks you to pick the right combination of Stream vs Table and Pull vs Push for a given use case.

  • Table = upsert (the latest value for a given key wins)
  • Materialized view = a table maintained by a persistent query (results pre-computed and held)
  • Pull Query = specify a key for an immediate read (synchronous request)
  • Push Query = continuously stream result updates (subscription style)

Under the Hood: Kafka Topics and State Stores

ksqlDB persistent queries run as Kafka Streams applications, with a RocksDB state store for each partition. Updates are applied locally and replicated to a changelog topic at the same time. On node failure, state is restored from the changelog.

During a rebalance, partition ownership moves and the new owning node catches up from the changelog. This lets the materialized view stay fault tolerant while still serving consistent results. In production, verify that the changelog topic is properly protected (appropriate replication.factor and cleanup.policy).

  • Local state: RocksDB (per partition)
  • Fault tolerance: updates persisted to a changelog topic
  • Scale-out: parallel execution per partition
  • Restore: changelog replay on ownership change or failure
  • Consistency: upsert ordering for a given key follows partition order

When to Use Pull vs Push (with Comparison Table)

Pull Query is ideal for instant lookups by primary key. ksqlDB routes the request to the right node based on the key and serves it immediately from the local materialized view. Push Query, in contrast, fits continuous update notifications and is effective for monitoring dashboards and downstream streaming integrations.

Achieving the same with a vanilla Kafka consumer means building your own state management and lookup capabilities in the application. ksqlDB tables provide this as a built-in feature.

  • One-off key lookup = Pull Query
  • Continuously stream the latest changes = Push Query
  • No DIY state management needed = the upside of ksqlDB tables
AspectPull Query (table)Push Query (table/stream)Plain Kafka consumer
Use caseInstant lookup by keySequential delivery of updatesEvent consumption and custom processing
ResponseSynchronous request, single responseContinuous stream of incremental responsesDepends on the application implementation
RequirementsPrimary key condition (equality)Subscribe to a queryNone (free-form)
LatencyMilliseconds to sub-second (local state)Immediate, gated by event arrivalImplementation and store dependent
ScaleHorizontal scale across partitions and nodesScale across broker → ksqlDB → clientScale with consumer groups
ConsistencyReturns the latest value for a given keyDelivers updates in arrival orderImplementation dependent (DIY state management)

Key Design and Partitioning: Building a Queryable View

Pull Query assumes lookup by primary key. That means an equality condition on the GROUP BY key or the primary key in the table definition is required. Proper key design and partitioning decide whether low-latency lookups succeed.

When JOINs or repartitioning is involved, ksqlDB automatically creates repartition topics as needed. For joins, key and partition-count alignment matters; perform explicit repartitioning with PARTITION BY when needed.

  • Pull Query is fundamentally an equality lookup on the primary key (range queries depend on use case and implementation)
  • GROUP BY determines the view's key. Changing the key triggers repartitioning
  • JOIN assumes co-partitioning. Verify alignment of keys and partition counts
  • High-cardinality keys help with scale, but watch out for locality and hot keys

Materialized view data flow (conceptual)

consume (persistent query)upsertchangerestorelocal readSource Topic(events)ksqlDB Persistent Query(CTAS/CSAS)Changelog Topic(compacted, Kafka)State Store(RocksDB/part)Pull Query (by key)ksqlDB routes to the owning node

Common Query Patterns: Aggregation, Lookup, Join

Tables pre-compute aggregations and dimension lookups and make them queryable instantly. The following shows typical definitions and usage.

For exam prep, remember these mappings: CTAS to create a table = materialize, Pull = key lookup, Push = continuous delivery, and JOINs require key alignment and possibly repartitioning.

  • Aggregation table: upsert COUNT/SUM via GROUP BY
  • Pull Query: specify the primary key for an instant lookup (synchronous)
  • Push Query: continuously deliver updates (monitoring, downstream integration)
  • Lookup join: enrich by JOINing a Stream with a Table

ksqlDB definition examples (materialize + lookup + join)

/* ソースストリーム定義 */
CREATE STREAM pageviews (
  user_id VARCHAR,
  page VARCHAR,
  ts BIGINT
) WITH (
  KAFKA_TOPIC='pageviews',
  VALUE_FORMAT='JSON'
);

/* ユーザー属性のディメンションテーブル(参照用) */
CREATE TABLE users (
  user_id VARCHAR PRIMARY KEY,
  plan VARCHAR
) WITH (
  KAFKA_TOPIC='users',
  VALUE_FORMAT='JSON'
);

/* 集計テーブル(マテリアライズド・ビュー) */
CREATE TABLE pv_count_by_user AS
SELECT user_id, COUNT(*) AS view_cnt
FROM pageviews
GROUP BY user_id
EMIT CHANGES;

/* Pull Query:単発のキー参照(最新の集計を即時取得) */
SELECT view_cnt FROM pv_count_by_user WHERE user_id='u-123';

/* Push Query:更新があるたびにストリーミング */
SELECT user_id, view_cnt FROM pv_count_by_user EMIT CHANGES;

/* 参照結合:ストリームをディメンション情報でエンリッチ */
CREATE STREAM enriched_pageviews AS
SELECT p.user_id, p.page, u.plan
FROM pageviews p
LEFT JOIN users u
ON p.user_id = u.user_id
EMIT CHANGES;

Operations and Exam Focus: Rebuilds, Failure Handling, Design Wisdom

To rebuild a materialized view, the source topic must retain the necessary history (sufficient retention period or compaction). Recreate the query, and ksqlDB reads from the beginning and rebuilds the state.

On failure or rebalance, local state is restored from the changelog. Scale depends on partition count and node count, and hot keys cause skew. Always think carefully about key distribution and partition design.

  • Rebuild = recreate the query + secure source history (consider retention period and compact)
  • Repartition topics are created when JOINs or GROUP BY change the key
  • Pull requires a key; Push streams continuously. Choose by use case
  • CCDAK frequently tests Stream vs Table semantics, keys and partitions, and the relationship between table materialization and the changelog

Check Your Understanding

CCDAK

問題 1

You want to instantly fetch per-user pageview counts and return the latest value as a one-off response when an API is called with a user_id. Which architecture is most appropriate?

  1. Create a ksqlDB table grouped by user_id and read from it with a Pull Query
  2. Have the application scan the raw pageviews topic from beginning to end on every request to aggregate
  3. Use a ksqlDB Push Query and have the API buffer the incoming stream of updates
  4. Do a Stream-Stream JOIN and search the result with a plain consumer

正解: A

For a one-off key lookup, a table indexed by primary key (a materialized view) plus a Pull Query is the right fit. Push Query is for continuous delivery and is not suited to one-off lookups. Scanning the raw topic on every request, or relying on JOINs alone, makes it hard to meet the instant-key-lookup requirement.

Frequently Asked Questions

Is log compaction required for the topic behind a table?

Tables fit an upsert model, and the changelog topic uses compaction to retain the latest value per key. The cleanup.policy on the sink topic depends on your environment, but enabling compact is standard when you want to retain the latest state.

How does a Pull Query scale?

The key determines the partition, and the request is served immediately from the local state on the owning node. You scale horizontally by increasing partition count and node count, and ksqlDB routes requests to the owning node based on the key.

How do I redesign a pipeline with a different key?

Create a new pipeline (table) re-partitioned with PARTITION BY or GROUP BY. A repartition topic is generated internally, after which Pull/Push queries can use the new key.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
Kafka

Kafka Topics & Partitions: Distribution Fundamentals (2026)

How Kafka topics and partitions enable scale — ordering guar...

Kafka

CCDAK Exam Guide: Confluent Certified Developer (2026)

Complete prep for the CCDAK exam — Producer/Consumer API, St...

Kafka

CCAAK Exam Guide: Confluent Certified Administrator (2026)

Pass the CCAAK exam — cluster management, partitions, securi...

Kafka

Kafka Replicas & ISR: Fault Tolerance Explained (2026)

Replica placement, in-sync replicas (ISR), leader election. ...

Kafka

Kafka Offsets: Commit Modes & Consumer Position (2026)

Offset semantics — auto vs. manual commit, __consumer_offset...

Browse all Kafka articles (101)
© 2026 NicheeLab All rights reserved.