ksqlDB Push vs Pull Queries (2026)

ksqlDB is a component that lets you run SQL-like stream processing on top of Kafka, and it offers two ways to retrieve results: Push Query and Pull Query. The names sound similar, but the design philosophy and use cases are quite different.

This article walks through the differences between Push (a continuous query that keeps streaming results) and Pull (a one-shot lookup that returns the current value when you need it), covering internal behavior, performance, consistency, and the CCDAK exam patterns you should know.

Why Push and Pull Are Distinguished

A Push Query is a continuous query whose results are streamed without interruption as new data arrives. Because every change is delivered to the client as an event, it fits subscription-style UI updates, alerts, and real-time metrics.

A Pull Query is a one-shot lookup that returns the current aggregate or latest state once per request. It suits per-key current-value lookups and synchronous API responses.

Push = continuous delivery (EMIT CHANGES is required). Applicable to both streams and tables.
Pull = a single fetch of the current value. The classic key-lookup form works only on tables (a materialized state is required).
Exam angle: Pull is a key-oriented current-value lookup; Push is incremental delivery of changes. Do not mix these up.
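To make the contrast concrete, here is a minimal Python sketch (not ksqlDB code). It treats a table's changelog as an ordered list of (key, value) updates. The Push-style function hands back every matching change event. The Pull-style function collapses the log to the latest value for one key. Both function names are hypothetical and exist only for illustration.

```python
from typing import Iterator, Optional

# A changelog is an ordered list of (key, value) updates, like the
# records a table emits as it changes. Illustrative model only.
Changelog = list[tuple[str, float]]


def push_events(changelog: Changelog, key: str) -> Iterator[tuple[str, float]]:
    """Push-style: yield every change for `key`, in arrival order."""
    for k, v in changelog:
        if k == key:
            yield (k, v)


def pull_value(changelog: Changelog, key: str) -> Optional[float]:
    """Pull-style: return only the latest value for `key`, or None if absent."""
    latest: Optional[float] = None
    for k, v in changelog:
        if k == key:
            latest = v
    return latest


if __name__ == "__main__":
    log = [("u_123", 100.0), ("u_456", 50.0), ("u_123", 250.0)]
    print(list(push_events(log, "u_123")))  # both changes for u_123
    print(pull_value(log, "u_123"))         # only the current value: 250.0
```

The same underlying data gives two shapes of answer: a sequence of events for Push and a single snapshot for Pull.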

Execution Model and Internals

ksqlDB consumes Kafka topics and materializes intermediate results into topics and state stores via persistent queries (CSAS/CTAS). A Push Query streams the updates of those streams or tables to clients incrementally, while a Pull Query reads the current value of a materialized table by key.

Pull Queries require a table. Internally they perform a key lookup against the state store (commonly backed by RocksDB) and return the currently joined/aggregated value. Push emits an event downstream every time a new record arrives or a recomputation happens.

Persistent queries build the table, and the table becomes the source for Pull lookups.
Push keeps the result stream open and pushes from the server to the client.
Pull is RPC-like: it completes as request/response and typically requires a key equality predicate.
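The division of labor described above can be modeled in a few lines of Python. In this sketch a persistent query (the CTAS `SUM(amount) ... GROUP BY user_id` used later in this article) updates an in-memory dict that stands in for the state store. Open Push subscriptions receive each resulting change, and Pull is a dictionary lookup by key. The class and method names are hypothetical. The sketch does not cover partitioning, RocksDB, or changelog topics.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Callable, Optional

# A Push subscriber receives (user_id, new_total) for every update.
Listener = Callable[[str, Decimal], None]


class SpendByUser:
    """Toy model of a CTAS persistent query materializing SUM(amount) per user."""

    def __init__(self) -> None:
        self._store: dict[str, Decimal] = defaultdict(Decimal)  # stands in for the state store
        self._listeners: list[Listener] = []                     # open Push subscriptions

    def subscribe(self, listener: Listener) -> None:
        """Push: register a client that receives every subsequent change."""
        self._listeners.append(listener)

    def on_order(self, user_id: str, amount: Decimal) -> None:
        """Persistent query: consume one input record, update state, emit the change."""
        self._store[user_id] += amount
        for listener in self._listeners:
            listener(user_id, self._store[user_id])

    def pull(self, user_id: str) -> Optional[Decimal]:
        """Pull: one-shot key lookup against the materialized state."""
        return self._store.get(user_id)


if __name__ == "__main__":
    table = SpendByUser()
    table.subscribe(lambda u, t: print("push:", u, t))
    table.on_order("u_123", Decimal("600.00"))
    table.on_order("u_123", Decimal("500.00"))
    print("pull:", table.pull("u_123"))
```

The subscriber sees both intermediate totals. A Pull issued afterwards sees only the settled value.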

[Figure: Data flow and where each query type fits]

Use Cases and Selection Criteria

The decision comes down to whether you want to receive changes continuously (Push) or only need the current settled value (Pull). Pick based on how the data is consumed: API design, UIs, alerts, batch post-processing, and so on.

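The other axis is whether a materialized table is available. A key-lookup Pull strictly requires a table. Recent ksqlDB versions also accept Pull queries on streams, but those scan the underlying topic instead of reading a state store, so they are not a substitute for a materialized table.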

Real-time screen updates, notifications, threshold monitoring → Push
Synchronous service-to-service lookups, on-demand dashboard queries → Pull
High-volume, low-latency delivery of changes → Push; single responses with a clear SLA → Pull
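The two selection axes (continuous vs. current value, and whether a table exists) can be written down as a tiny decision function. This is a hypothetical rule of thumb that restates the criteria above. It is not part of any ksqlDB API.

```python
from enum import Enum


class Source(Enum):
    STREAM = "stream"
    TABLE = "table"


def choose_query(need_continuous_updates: bool, source: Source) -> str:
    """Hypothetical helper encoding this article's selection criteria."""
    if need_continuous_updates:
        # Push works against streams and tables alike.
        return "PUSH (SELECT ... EMIT CHANGES)"
    if source is Source.TABLE:
        # A materialized table exists, so a key lookup can answer directly.
        return "PULL (SELECT ... WHERE key = ...)"
    # Only a current value is needed, but there is no table yet: materialize first.
    return "CREATE TABLE ... AS SELECT ... first, then PULL"


if __name__ == "__main__":
    print(choose_query(True, Source.STREAM))
    print(choose_query(False, Source.STREAM))
```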

Aspect	Push Query	Pull Query
Result shape	Streams change events incrementally	Returns a snapshot of the current value
Query predicate	Arbitrary predicates	Mainly key equality
Target	Stream or Table	Table only

Syntax and Typical Patterns (ksqlDB)

EMIT CHANGES is the keyword for Push. Turning it into a persistent query (CREATE STREAM/TABLE AS SELECT ... EMIT CHANGES) continuously outputs results to topics and tables. To try it transiently, run SELECT ... EMIT CHANGES interactively.

Pull queries the current value of a table using a key predicate; it is not a way to look up a key in a stream. Pulls that scan the whole table are not the normal operating pattern.

To end a transient Push, attach a LIMIT or close the connection on the client side.
Pull is designed to return with low latency when you specify a key, but how fresh the answer is depends on when the table was last updated.
Designing the table names and schemas produced by persistent queries up front keeps Pull queries concise.
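The role of LIMIT can be illustrated with a Python generator. The source is unbounded, and taking the first N matching rows stops consumption, yet the query is still a Push. This is an analogy under the assumption that `EMIT CHANGES LIMIT N` behaves like "stop after N emitted rows". The generator names are hypothetical.

```python
import itertools
from typing import Callable, Iterator


def push_query(events: Iterator[dict], predicate: Callable[[dict], bool]) -> Iterator[dict]:
    """Toy Push: lazily yield matching rows for as long as events keep arriving."""
    for row in events:
        if predicate(row):
            yield row


def endless_updates() -> Iterator[dict]:
    """Hypothetical unbounded source: one user's total grows by 300 per event."""
    total = 0
    while True:
        total += 300
        yield {"user_id": "u_1", "total_amount": total}


# Analogous to: SELECT ... WHERE total_amount >= 1000 EMIT CHANGES LIMIT 3;
# Without islice, this Push would never finish on its own.
rows = list(itertools.islice(
    push_query(endless_updates(), lambda r: r["total_amount"] >= 1000), 3))

if __name__ == "__main__":
    print([r["total_amount"] for r in rows])  # [1200, 1500, 1800]
```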

Concrete Push and Pull examples (ksqlDB CLI/REST)

---- Sample data definition ----
CREATE STREAM orders (
  order_id VARCHAR KEY,
  user_id  VARCHAR,
  amount   DECIMAL(9,2),
  ts       BIGINT
) WITH (
  KAFKA_TOPIC='orders',
  VALUE_FORMAT='JSON'
);

-- Materialize per-user running totals (persistent query: creates a table)
CREATE TABLE spend_by_user AS
  SELECT user_id,
         SUM(amount) AS total_amount
  FROM orders
  GROUP BY user_id
  EMIT CHANGES;

---- Push Query (transient; delivers rows over the threshold as soon as they change) ----
-- Run at the ksql> prompt (stop with Ctrl+C or LIMIT)
SELECT user_id, total_amount
FROM spend_by_user
WHERE total_amount >= 1000
EMIT CHANGES;

-- To terminate early
SELECT user_id, total_amount
FROM spend_by_user
EMIT CHANGES LIMIT 50;

---- Pull Query (fetch the current value in a single request/response) ----
-- Run at the ksql> prompt
SELECT total_amount FROM spend_by_user WHERE user_id='u_123';

# REST example (Pull via the /query endpoint; the string literal takes single quotes)
curl -s -X POST http://localhost:8088/query \
  -H 'Content-Type: application/vnd.ksql.v1+json; charset=utf-8' \
  -d "{\"ksql\": \"SELECT total_amount FROM spend_by_user WHERE user_id='u_123';\"}"

# To receive a Push over REST (for example, via /query-stream)
curl -N -s -X POST http://localhost:8088/query-stream \
  -H 'Content-Type: application/vnd.ksql.v1+json; charset=utf-8' \
  -d '{"sql": "SELECT user_id, total_amount FROM spend_by_user EMIT CHANGES;"}'
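Hand-built JSON in the shell is where quoting mistakes creep in. In ksqlDB, double quotes delimit identifiers, and string literals take single quotes. The sketch below uses only the Python standard library to build, but not send, the same two requests as the curl examples. The URL, header, and body keys are copied from those examples. The helper names are hypothetical. The quote-doubling escape assumes ksqlDB's standard SQL-style string literals.

```python
import json
import urllib.request

KSQL_URL = "http://localhost:8088"  # assumption: same server as the curl examples
CONTENT_TYPE = "application/vnd.ksql.v1+json; charset=utf-8"


def sql_string_literal(value: str) -> str:
    """Quote a value as a ksqlDB string literal: single quotes, '' escapes a quote."""
    return "'" + value.replace("'", "''") + "'"


def build_pull_request(user_id: str) -> urllib.request.Request:
    """POST /query carrying a key-equality Pull against spend_by_user."""
    statement = (
        "SELECT total_amount FROM spend_by_user "
        f"WHERE user_id={sql_string_literal(user_id)};"
    )
    body = json.dumps({"ksql": statement}).encode("utf-8")
    return urllib.request.Request(
        f"{KSQL_URL}/query", data=body,
        headers={"Content-Type": CONTENT_TYPE}, method="POST")


def build_push_request() -> urllib.request.Request:
    """POST /query-stream carrying a transient Push; the response stays open."""
    statement = "SELECT user_id, total_amount FROM spend_by_user EMIT CHANGES;"
    body = json.dumps({"sql": statement}).encode("utf-8")
    return urllib.request.Request(
        f"{KSQL_URL}/query-stream", data=body,
        headers={"Content-Type": CONTENT_TYPE}, method="POST")


if __name__ == "__main__":
    print(build_pull_request("u_123").data.decode("utf-8"))
```

To actually send a request, pass it to `urllib.request.urlopen`. For the Push request, read the response incrementally, because the server keeps it open until the query ends.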

Latency, Availability, and Consistency in Practice

Because Push delivers continuously, the number of clients and the filter predicates directly affect throughput. Project only the columns you need and use predicates to reduce downstream load. If there is a stop condition, use LIMIT so the query terminates explicitly.

Pull is low-latency because it is a key lookup against the state store. However, what you read is the materialized current value, and there is no guarantee that the most recent record on the upstream topic has been reflected. Consistency depends on the propagation delay of the table update.

On the availability side, Pull is served by the node that owns the relevant partition. If failover causes a standby to serve the request, you should design with the understanding that the most recent update may not have been reflected yet.
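The freshness and failover points above can be modeled as follows. Each replica has only applied input up to some offset, so a Pull answers "as of" that offset. A standby is acceptable only if it is not too far behind. The `Replica` type, `route_pull`, and `max_lag` are hypothetical. ksqlDB has its own configuration for whether standbys may serve Pull queries and how much lag is tolerated; check the ksqlDB documentation for the exact settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """Hypothetical replica of one table partition."""
    name: str
    alive: bool
    applied_offset: int      # last input offset reflected in this replica's state
    state: dict[str, float]  # materialized values as of applied_offset


def route_pull(active: Replica, standby: Replica,
               topic_end_offset: int, max_lag: int) -> Optional[Replica]:
    """Choose a replica for a Pull. Neither replica reflects records past its
    applied_offset; the standby is used only if its lag is within max_lag."""
    if active.alive:
        return active
    if standby.alive and topic_end_offset - standby.applied_offset <= max_lag:
        return standby
    return None  # refuse rather than serve data that is too stale


if __name__ == "__main__":
    a = Replica("node-a", alive=False, applied_offset=100, state={"u_123": 1100.0})
    s = Replica("node-b", alive=True, applied_offset=95, state={"u_123": 600.0})
    chosen = route_pull(a, s, topic_end_offset=100, max_lag=10)
    print(chosen.name if chosen else "no replica", chosen.state if chosen else None)
```

In this example the standby answers 600.0 even though the active replica had already reached 1100.0. This is the "latest materialized state, not latest topic record" caveat in miniature.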

Push: minimize column projection, filter early with predicates, avoid unnecessary joins.
Pull: always specify a key. Windowed tables also require specifying window bounds.
Exactly-once processing is a property of the stream processing, not a guarantee of strong consistency on Pull reads.

Exam Tips and Pitfalls (CCDAK)

On CCDAK, questions frequently test which objects Push and Pull can target, how each one terminates, whether a table is required, and whether a key predicate is present. Do not rely on the names alone; picture the internal behavior and use it to rule out options.

Pull is a current-value fetch against a table. Streams are not supported.
Push requires EMIT CHANGES. Note that a Push with a LIMIT is still a Push (just with a termination condition).
Pull normally assumes a key equality predicate. Designs that perform full-table scans via Pull tend to appear as wrong options on the exam.
The materialized table created by a persistent query is the foundation for Pull. Without a table in place, a Pull cannot work.
Watch how consistency is phrased: Pull returns the latest materialized state, not the latest topic record.
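As a study aid, the exam rules above can be turned into a small checker that lists objections to a described query. It is hypothetical and follows the classic model, in which a key-lookup Pull runs on tables. Newer ksqlDB releases relax some of these rules, so treat it as a quiz helper, not a validator for real statements.

```python
def check_query(kind: str, source: str, emit_changes: bool,
                key_equality: bool) -> list[str]:
    """Return exam-style objections (hypothetical helper).
    kind: "push" or "pull"; source: "stream" or "table"."""
    problems: list[str] = []
    if kind == "push" and not emit_changes:
        problems.append("Push requires EMIT CHANGES")
    if kind == "pull":
        if emit_changes:
            problems.append("EMIT CHANGES makes it a Push, not a Pull")
        if source != "table":
            problems.append("Pull lookups need a materialized table")
        elif not key_equality:
            problems.append("Pull without a key predicate is a table scan")
    return problems


if __name__ == "__main__":
    print(check_query("pull", "stream", emit_changes=False, key_equality=True))
```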

Check Your Understanding

CCDAK

Question 1

Which statement about Push Query and Pull Query in ksqlDB is most correct?

A. Push streams results continuously via EMIT CHANGES, and Pull returns the current per-key value from a materialized table once.
B. Push works only on streams, while Pull works on both streams and tables.
C. Pull always reads the latest Kafka records directly and does not require a materialized table.
D. Push must always be created as a persistent query and cannot be run transiently.

Correct answer: A

Push is continuous delivery via EMIT CHANGES and can be either transient or persistent. Pull requires a table and fetches the current value in the state store by key in a single shot. B reverses the targets, C wrongly claims no table is needed, and D incorrectly denies the existence of transient Push.

Frequently Asked Questions

Does a Pull Query always return a value that reflects the very latest event?

No. A Pull Query returns the current value that has already been reflected in the materialized table. Because there is propagation delay from the latest event on the upstream topic to the table update, it does not guarantee the value right after the most recent event.

How can you use a Push Query temporarily and stop it safely?

Add a LIMIT clause, as in SELECT ... EMIT CHANGES LIMIT N, to terminate it explicitly, or close the connection from the client side. A persistent query is stopped with TERMINATE <query_id>; (use SHOW QUERIES to find the ID).

Can a Pull Query perform aggregations or joins on the fly?

The standard pattern is to materialize a table ahead of time with a persistent query and then Pull from that table. Running heavy aggregations or joins ad hoc via Pull is not recommended and may be disallowed depending on the operational configuration.


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
