ksqlDB is a component that lets you run SQL-like stream processing on top of Kafka, and it offers two ways to retrieve results: Push Query and Pull Query. The names sound similar, but the design philosophy and use cases are quite different.
This article walks through the differences between Push (a continuous query that keeps streaming results) and Pull (a one-shot lookup that returns the current value when you need it), covering internal behavior, performance, consistency, and the CCDAK exam patterns you should know.
A Push Query is a continuous query whose results are streamed without interruption as new data arrives. Because every change is delivered to the client as an event, it fits subscription-style UI updates, alerts, and real-time metrics.
A Pull Query is a one-shot lookup that returns the current aggregate or latest state once per request. It suits per-key current-value lookups and synchronous API responses.
ksqlDB consumes Kafka topics and materializes intermediate results into topics and state stores via persistent queries (CSAS/CTAS). A Push Query streams the updates of those streams or tables to clients incrementally, while a Pull Query reads the current value of a materialized table by key.
Pull Queries require a table. Internally they perform a key lookup against the state store (commonly backed by RocksDB) and return the currently joined/aggregated value. Push emits an event downstream every time a new record arrives or a recomputation happens.
Data flow and where each query type fits
The decision comes down to whether you want to receive changes continuously (Push) or only need the current settled value (Pull). Pick based on how the data is consumed: API design, UIs, alerts, batch post-processing, and so on.
The other axis is whether a materialized table is available. Pull strictly requires a table; you cannot Pull directly against a stream.
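The table requirement can be sketched as follows. This is a minimal sketch, assuming a stream named `orders` with `user_id` and `amount` columns already exists; the table name `last_order_by_user` is hypothetical and introduced only for illustration.

```sql
-- Assumption: a stream `orders` with user_id and amount columns already exists.
-- Step 1: materialize the stream into a table with a persistent query.
-- LATEST_BY_OFFSET keeps the most recent amount seen for each user_id.
CREATE TABLE last_order_by_user AS
  SELECT user_id,
         LATEST_BY_OFFSET(amount) AS last_amount
  FROM orders
  GROUP BY user_id
  EMIT CHANGES;

-- Step 2: Pull the current value by key from the materialized table.
-- There is no EMIT CHANGES, so the query returns once and ends.
SELECT last_amount
FROM last_order_by_user
WHERE user_id = 'u_123';
```

The key in the WHERE clause is the table's GROUP BY column, which is what allows the lookup to be answered from the state store.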
| Aspect | Push Query | Pull Query |
|---|---|---|
| Result shape | Streams change events incrementally | Returns a snapshot of the current value |
| Target | Stream or Table | Table only |
| Query predicate | Arbitrary predicates (stream/table) | Mainly key equality (table) |
EMIT CHANGES is the keyword for Push. Turning it into a persistent query (CREATE STREAM/TABLE AS SELECT ... EMIT CHANGES) continuously outputs results to topics and tables. To try it transiently, run SELECT ... EMIT CHANGES interactively.
A Pull Query reads the current value from a materialized table, normally with a key predicate in the WHERE clause. In the model this article assumes, it cannot be issued against a stream, and full-table-scan style Pulls are not part of normal operations.
Concrete Push and Pull examples (ksqlDB CLI/REST)
---- Sample data definition ----
CREATE STREAM orders (
order_id VARCHAR KEY,
user_id VARCHAR,
amount DECIMAL(9,2),
ts BIGINT
) WITH (
KAFKA_TOPIC='orders',
VALUE_FORMAT='JSON'
);
-- Materialize the running total per user (persistent query: creates a Table)
CREATE TABLE spend_by_user AS
SELECT user_id,
SUM(amount) AS total_amount
FROM orders
GROUP BY user_id
EMIT CHANGES;
---- Push Query (transient; delivers threshold crossings immediately) ----
-- Run at the ksql> prompt (stop with Ctrl+C or LIMIT)
SELECT user_id, total_amount
FROM spend_by_user
WHERE total_amount >= 1000
EMIT CHANGES;
-- To terminate early
SELECT user_id, total_amount
FROM spend_by_user
EMIT CHANGES LIMIT 50;
---- Pull Query (one-shot lookup of the current value) ----
-- Run at the ksql> prompt
SELECT total_amount FROM spend_by_user WHERE user_id='u_123';
# REST call example (Pull uses the /query endpoint)
# ksqlDB string literals take single quotes; double quotes denote identifiers.
curl -s -X POST http://localhost:8088/query \
-H 'Content-Type: application/vnd.ksql.v1+json; charset=utf-8' \
-d "{\"ksql\": \"SELECT total_amount FROM spend_by_user WHERE user_id='u_123';\"}"
# To receive Push results over REST (for example via /query-stream)
curl -N -s -X POST http://localhost:8088/query-stream \
-H 'Content-Type: application/vnd.ksql.v1+json; charset=utf-8' \
-d '{"sql": "SELECT user_id, total_amount FROM spend_by_user EMIT CHANGES;"}'
Because Push delivers continuously, the number of clients and the filter predicates directly affect throughput. Project only the columns you need and use predicates to reduce downstream load. If there is a stop condition, use LIMIT to make the disconnection explicit.
Pull is low-latency because it is a key lookup against the state store. However, what you read is the materialized current value, and there is no guarantee that the most recent record on the upstream topic has been reflected. Consistency depends on the propagation delay of the table update.
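The propagation delay can be observed from the CLI. The following is a hedged sketch that assumes the `orders` stream and `spend_by_user` table from the examples above; the order ID, amount, and timestamp are made-up values used only for illustration.

```sql
-- Write a new order to the upstream topic through the orders stream.
INSERT INTO orders (order_id, user_id, amount, ts)
  VALUES ('o_9001', 'u_123', 250.00, 1700000000000);

-- Pull immediately afterwards. The persistent query that maintains
-- spend_by_user processes the new record asynchronously, so this lookup
-- may still return the total from before the insert.
SELECT total_amount FROM spend_by_user WHERE user_id = 'u_123';
```

Re-running the Pull a moment later typically shows the updated total, which is the behavior to keep in mind when a synchronous API sits on top of a Pull Query.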
On the availability side, Pull is served by the node that owns the relevant partition. If failover causes a standby to serve the request, you should design with the understanding that the most recent update may not have been reflected yet.
On CCDAK, the targets of Push/Pull, termination conditions, the need for a table, and the presence of a key predicate come up frequently. Do not rely on the names alone; picture the internal behavior and use it to eliminate the wrong options.
CCDAK
Question 1
Which statement about Push Query and Pull Query in ksqlDB is most correct?
Correct answer: A
Push is continuous delivery via EMIT CHANGES and can be either transient or persistent. Pull requires a table and fetches the current value in the state store by key in a single shot. B reverses the targets, C wrongly claims no table is needed, and D incorrectly denies the existence of transient Push.
Does a Pull Query always return a value that reflects the very latest event?
No. A Pull Query returns the current value that has already been reflected in the materialized table. Because there is propagation delay from the latest event on the upstream topic to the table update, it does not guarantee the value right after the most recent event.
How can you use a Push Query temporarily and stop it safely?
Add a LIMIT clause, as in SELECT ... EMIT CHANGES LIMIT N, to terminate it explicitly, or close the connection from the client side. Persistent queries are stopped with TERMINATE <query_id>, using the ID reported by SHOW QUERIES; ksqlDB has no DROP QUERY statement.
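A minimal sketch of stopping a persistent query from the CLI, using `SHOW QUERIES` to find the ID and `TERMINATE` to stop it. The query ID shown is hypothetical; actual IDs are generated by ksqlDB and will differ.

```sql
-- List the running queries and note the ID of the persistent one.
SHOW QUERIES;

-- Stop it by ID. CTAS_SPEND_BY_USER_0 is a hypothetical ID used for
-- illustration; substitute the ID that SHOW QUERIES reports.
TERMINATE CTAS_SPEND_BY_USER_0;
```

A transient Push Query never appears as something to terminate this way; it ends when LIMIT is reached or the client disconnects.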
Can a Pull Query perform aggregations or joins on the fly?
The standard pattern is to materialize a table ahead of time with a persistent query and then Pull from that table. Running heavy aggregations or joins ad hoc via Pull is not recommended and may be disallowed depending on the operational configuration.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.