Single Message Transforms (SMTs) are lightweight, per-record transformations that run inside Kafka Connect. Heavy processing belongs in a different layer; SMTs handle minimal shaping, masking, and routing close to the connector definition.
This article focuses on stable features from the official documentation, the points most often tested on the CCDAK exam, and the operational pitfalls that catch teams off guard.
SMTs are stateless, per-record transformations applied to SourceRecord and SinkRecord inside Kafka Connect. They can be chained and are evaluated in the configured order. On the source side, the connector emits Connect's internal data/schema, SMTs modify it, and the converter serializes the result to Kafka. On the sink side, records are deserialized from Kafka, SMTs run next, and the connector writes to the external system.
SMTs are strictly for lightweight transformations. Aggregations, joins, stateful logic, and heavyweight enrichment belong in a separate layer such as Kafka Streams or ksqlDB.
How SMTs flow through Kafka Connect (source and sink sides)
The official SMTs cover field extraction, renaming, masking, flattening, timestamp conversion, and topic rewriting. Because they map directly to operational requirements, make sure you understand each transform's behavior and prerequisites — schema requirements, key/value targeting, and how missing fields are handled.
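As a hedged sketch of two transforms from that list, the fragment below flattens a nested value and converts a timestamp field to a formatted string. The aliases flat and ts and the field name created_at are hypothetical. The class names and the delimiter, field, target.type, and format properties follow the Apache Kafka built-in transforms, but confirm them against the documentation for your Connect version.

```properties
transforms=flat,ts
# Flatten nested structures: a nested address.city field becomes address_city
transforms.flat.type=org.apache.kafka.connect.transforms.Flatten$Value
transforms.flat.delimiter=_
# Convert the (hypothetical) top-level created_at field to a formatted string
transforms.ts.type=org.apache.kafka.connect.transforms.TimestampConverter$Value
transforms.ts.field=created_at
transforms.ts.target.type=string
# format is a SimpleDateFormat pattern and is needed when target.type=string
transforms.ts.format=yyyy-MM-dd'T'HH:mm:ss.SSS
```

Flatten runs first here, so any timestamp field that lived inside a nested structure would have to be referenced by its flattened name in the second transform.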
Typical patterns include setting the message key on the source side (ValueToKey), dropping unwanted fields (ReplaceField), adding audit metadata to headers or fields (InsertField), simple PII masking (MaskField), and environment- or version-based topic renaming (RegexRouter).
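The masking pattern is the one not covered by the chain example later in this article, so here is a minimal sketch of it. The alias maskPii and the field names email and phone_number are assumptions for illustration; the replacement option is only available in newer Connect versions, so treat that line as optional.

```properties
transforms=maskPii
# Replace the listed fields with a type-appropriate empty value
# (empty string, 0, false, and so on). Field names are hypothetical.
transforms.maskPii.type=org.apache.kafka.connect.transforms.MaskField$Value
transforms.maskPii.fields=email,phone_number
# Optional: a fixed replacement for string or numeric fields (newer Connect versions)
# transforms.maskPii.replacement=REDACTED
```

This is simple masking only. Anything that needs tokenization, hashing with a managed secret, or format-preserving encryption is outside what this transform does.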
Pick the right tool based on transform weight, statefulness, and deployment cost. CCDAK regularly asks where the boundary between SMTs and Kafka Streams or ksqlDB lies.
The general rule: lightweight shaping goes to SMTs, aggregation/joins/windowing go to Streams or ksqlDB, and heavy enrichment that depends on external APIs belongs in a separate process (or in connector-side features).
| Option | Primary use | Stateful? | Ops / performance characteristics |
|---|---|---|---|
| SMT (Kafka Connect) | Lightweight shaping, masking, topic renaming | No (per-record) | Low overhead, configuration-only, order-sensitive |
| Kafka Streams | Joins, aggregations, windowing, state machines | Yes (uses state stores) | Deployed as an application; requires its own scaling and monitoring |
| ksqlDB | SQL-like stream processing | Yes (managed internally) | Closer to a managed experience; interactive operations |
| Custom connector / plugin | In-house, connector-specific extensions or heavy transforms | Depends on the case | Requires its own distribution and compatibility management |
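List SMTs by alias under transforms, then assign each alias a type and its properties. For transforms that ship separate key and value variants, select the target with the $Key or $Value suffix on the class name; transforms that act on the whole record, such as RegexRouter and ValueToKey, take no suffix. Multiple SMTs are applied in the order of the comma-separated list.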
Predicates enable a given SMT only for records that match a condition. You can match on topic name, the presence of a header, and similar attributes. Availability and class names can depend on your Kafka Connect version, so check the documentation for your environment.
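A minimal sketch of a predicate in use, assuming a Connect version that ships the predicate classes: the Filter transform drops records, and the RecordIsTombstone predicate limits it to records with a null value. The aliases dropTombstones, isTombstone, and hasFlag and the header name x-skip are hypothetical.

```properties
transforms=dropTombstones
# Filter drops every record it is applied to, so it is normally paired with a predicate
transforms.dropTombstones.type=org.apache.kafka.connect.transforms.Filter
transforms.dropTombstones.predicate=isTombstone
# Set negate=true to apply the transform only when the predicate does NOT match
# transforms.dropTombstones.negate=true
predicates=isTombstone
predicates.isTombstone.type=org.apache.kafka.connect.transforms.predicates.RecordIsTombstone
# Header-based alternative (hypothetical header name):
# predicates.hasFlag.type=org.apache.kafka.connect.transforms.predicates.HasHeaderKey
# predicates.hasFlag.name=x-skip
```

A predicate is attached per transform alias, so other transforms in the same chain still see every record unless they reference a predicate of their own.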
Example: chaining multiple SMTs (ReplaceField, InsertField, RegexRouter, ValueToKey)
name=jdbc-source-users
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:postgresql://db/app
mode=incrementing
incrementing.column.name=id
topic.prefix=src_
# SMT chain
transforms=rf,meta,route,v2k
# Remove unwanted fields and rename
transforms.rf.type=org.apache.kafka.connect.transforms.ReplaceField$Value
# "exclude" replaced the deprecated "blacklist" name in Kafka 2.7; older workers need blacklist
transforms.rf.exclude=password,sensitive_token
transforms.rf.renames=full_name:fullName
# Add metadata to the value
transforms.meta.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.meta.static.field=source
transforms.meta.static.value=postgres
# Rewrite the topic name for the environment
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=^src_(.*)$
transforms.route.replacement=dev.$1
# Copy the value's id into the key
transforms.v2k.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.v2k.fields=id
# Conditional application (example: apply rf only to one topic)
# (availability and class names depend on your Connect version)
# rf runs before route, so the predicate still sees the original src_ topic name
# transforms.rf.predicate=topicMatch
# predicates=topicMatch
# predicates.topicMatch.type=org.apache.kafka.connect.transforms.predicates.TopicNameMatches
# predicates.topicMatch.pattern=^src_users$
SMTs run against Connect's internal data. Whether the data is schemaless (Map-style) or schema-aware (Struct) changes which SMTs are applicable and how fields resolve. Transforms that operate on the schema itself, such as SetSchemaMetadata, require a schema, while others, for example ReplaceField and Flatten, also work with schemaless data.
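On the sink side, the converter decides which of the two shapes the SMTs receive (on the source side, the connector itself produces the data and schema). The fragment below is a sketch assuming JSON-encoded values; the converter classes and the schemas.enable switch are standard Connect settings, but the choice of JSON here is only an illustration.

```properties
# Sink-side sketch: JSON values without an embedded schema reach SMTs as schemaless Maps
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
# With schemas.enable=true, JsonConverter expects a {"schema": ..., "payload": ...}
# envelope and hands SMTs a Struct with a schema instead
# value.converter.schemas.enable=true
key.converter=org.apache.kafka.connect.storage.StringConverter
```

If a transform in the chain requires a schema, a schemaless converter setting like the first one will cause it to fail at runtime, so check the converter before debugging the transform.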
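How tombstones (null values, as used for log compaction) and missing fields are treated varies by transform: some pass such records through untouched, and some expose an option for it (for example, a setting along the lines of ignore.missing=true where the transform supports one), so check each transform's documentation. When inconsistencies or type-conversion errors occur, errors.tolerance lets the connector skip failed records and keep processing, and on sink connectors the dead letter queue (DLQ) settings route those records to a separate topic.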
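Example error-handling and DLQ configuration (errors.tolerance and errors.log.* apply to both source and sink connectors; the errors.deadletterqueue.* settings apply to sink connectors only)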
errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true
errors.deadletterqueue.topic.name=connect-errors.dlq
errors.deadletterqueue.context.headers.enable=true
errors.deadletterqueue.topic.replication.factor=3
SMTs are lightweight, but CPU usage and latency grow with the length of the chain. Flatten and regex-based transforms add the most cost, so keep the chain to the minimum that gets the job done. Key-changing transforms (ValueToKey, ExtractField$Key) can also affect source-side partition assignment, so observe the load there as well.
Monitor Connect task and worker metrics. Continuously watch error counts, DLQ record counts, polling lag, ingress/egress throughput, and GC/CPU, and roll out changes incrementally so you can validate as you go.
CCDAK
Question 1
A team wants to drop the password field from records ingested from a source database, prefix the topic name with an environment marker (dev.), and set the value's id as the message key. Which approach best satisfies these requirements using Kafka Connect features alone?
Correct answer: A
Each requirement is a canonical SMT use case (field removal with ReplaceField, topic renaming with RegexRouter, key setting with ValueToKey), and an ordered chain inside the connector configuration covers all of them. No heavy processing or state management is needed, so Streams or a custom transform would be overkill.
When do SMTs run relative to serialization and deserialization?
On the source side, the connector emits Connect's internal representation, SMTs are applied to it, and then the converter serializes the result before writing to Kafka. On the sink side, records are deserialized from Kafka, SMTs are applied, and then the connector writes to the external system.
Can SMTs control partitioning?
Not directly. However, SMTs that change the key (such as ValueToKey or ExtractField$Key) can indirectly affect partition assignment because the producer's partitioner uses the new key. RegexRouter can change the topic name, but SMTs do not orchestrate rebalancing themselves.
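One detail worth knowing when the key drives partitioning: ValueToKey produces a key that is a structure containing the selected fields, not the bare field value. A common follow-up, sketched here on the assumption that a primitive key is wanted, is to unwrap it with ExtractField$Key. The alias extractKey is hypothetical.

```properties
transforms=v2k,extractKey
# Step 1: copy the value's id field into the key (the key becomes a structure holding id)
transforms.v2k.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.v2k.fields=id
# Step 2: unwrap that single field so the key is the bare id value
transforms.extractKey.type=org.apache.kafka.connect.transforms.ExtractField$Key
transforms.extractKey.field=id
```

The two forms serialize to different bytes, so switching between them on an existing topic changes which partition a given id lands on.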
How many SMTs can I safely chain together?
The practical limit is driven by performance and readability rather than the implementation. A handful is a good target — once things get complex, push the logic to Streams, ksqlDB, or a separate process. Measure latency and throughput in staging and continue monitoring in production whenever you change the chain.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.