Kafka

Mastering Kafka Connect Single Message Transforms: Lightweight Transformation for Flexible Pipelines

2026-04-19
NicheeLab Editorial Team

Single Message Transforms (SMTs) are lightweight, per-record transformations that run inside Kafka Connect. Heavy processing belongs in a different layer; SMTs handle minimal shaping, masking, and routing close to the connector definition.

This article focuses on stable features from the official documentation, the points most often tested on the CCDAK exam, and the operational pitfalls that catch teams off guard.

SMT Fundamentals: Position and Constraints

SMTs are stateless, per-record transformations applied to SourceRecord and SinkRecord inside Kafka Connect. They can be chained and are evaluated in the configured order. On the source side, the connector emits Connect's internal data/schema, SMTs modify it, and the converter serializes the result to Kafka. On the sink side, records are deserialized from Kafka, SMTs run next, and the connector writes to the external system.

SMTs are strictly for lightweight transformations. Aggregations, joins, stateful logic, and heavyweight enrichment belong in a separate layer such as Kafka Streams or ksqlDB.

  • Stateless, per-record, and applied in order
  • Explicitly target the key or the value (for example, ExtractField$Key vs. $Value)
  • Topic-name rewriting (RegexRouter) is supported, but SMTs do not directly control partition count or rebalancing
  • Both schema-aware and schemaless data are supported, but some transforms (such as Flatten) require a schema

How SMTs flow through Kafka Connect (source and sink sides)

Source SystemSource ConnectorSMTssource-sideKafkaserialization / deserializationSMTssink-sideSink ConnectorSink System

Common SMTs and Their Use Cases

The official SMTs cover field extraction, renaming, masking, flattening, timestamp conversion, and topic rewriting. Because they map directly to operational requirements, make sure you understand each transform's behavior and prerequisites — schema requirements, key/value targeting, and how missing fields are handled.

Typical patterns include setting the message key on the source side (ValueToKey), dropping unwanted fields (ReplaceField), adding audit metadata to headers or fields (InsertField), simple PII masking (MaskField), and environment- or version-based topic renaming (RegexRouter).

  • ExtractField: pulls a specific field out of a struct (works on both key and value)
  • ReplaceField: allow/deny lists and field renames
  • MaskField: simple value masking (fixed length, full replacement, and similar)
  • Flatten: flattens nested structures (requires a schema)
  • InsertField: adds static values or metadata (timestamp, topic, partition, and so on)
  • ValueToKey: promotes part of the value to the key (which can affect downstream partition assignment)

Comparing SMTs with Other Options: Inline Transform or Separate Layer?

Pick the right tool based on transform weight, statefulness, and deployment cost. CCDAK regularly asks where the boundary between SMTs and Kafka Streams or ksqlDB lies.

The general rule: lightweight shaping goes to SMTs, aggregation/joins/windowing go to Streams or ksqlDB, and heavy enrichment that depends on external APIs belongs in a separate process (or in connector-side features).

  • The strength of SMTs is keeping simple transformations inside the connector configuration
  • Move stateful processing or anything that needs high reusability into a code-based system (Streams or ksqlDB)
  • Use SMTs for topic renaming; handle branching or multi-destination delivery with multiple connectors or routing design
OptionPrimary useStateful?Ops / performance characteristics
SMT (Kafka Connect)Lightweight shaping, masking, topic renamingNo (per-record)Low overhead, configuration-only, order-sensitive
Kafka StreamsJoins, aggregations, windowing, state machinesYes (uses state stores)Deployed as an application; requires its own scaling and monitoring
ksqlDBSQL-like stream processingYes (managed internally)Closer to a managed experience; interactive operations
Custom connector / pluginIn-house, connector-specific extensions or heavy transformsDepends on the caseRequires its own distribution and compatibility management

Configuration in Practice: Chains, Key/Value, Predicates, and Order

List SMTs by alias under transforms, then assign each alias a type and its properties. Target the key or value with the $Key or $Value suffix on the class name. Multiple SMTs are applied in the order of the comma-separated list.

Predicates enable a given SMT only for records that match a condition. You can match on topic name, the presence of a header, and similar attributes. Availability and class names can depend on your Kafka Connect version, so check the documentation for your environment.

  • Order is strict — each step's output is the next step's input (for example, do not assume RegexRouter routing depends on the key produced by ValueToKey upstream)
  • Mixing up key and value targeting is one of the most common bug sources
  • Applying schema-requiring SMTs (such as Flatten) to schemaless input will fail
  • Validate against unit data with standalone Connect before rolling out to production

Example: chaining multiple SMTs (ReplaceField, InsertField, RegexRouter, ValueToKey)

name=jdbc-source-users
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
tasks.max=1
connection.url=jdbc:postgresql://db/app
mode=incrementing
incrementing.column.name=id
topic.prefix=src_

# SMT chain
transforms=rf,meta,route,v2k

# 不要フィールドの除去とリネーム
transforms.rf.type=org.apache.kafka.connect.transforms.ReplaceField$Value
transforms.rf.blacklist=password,sensitive_token
transforms.rf.renames=full_name:fullName

# メタ情報を値に付与
transforms.meta.type=org.apache.kafka.connect.transforms.InsertField$Value
transforms.meta.static.field=source
transforms.meta.static.value=postgres

# トピック名の環境リライティング
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=^src_(.*)$
transforms.route.replacement=dev.$1

# 値のidをキーへ
transforms.v2k.type=org.apache.kafka.connect.transforms.ValueToKey
transforms.v2k.fields=id

# 条件適用(例: 特定トピックにだけrfを適用)
# (使用可否やクラス名は環境のバージョンに依存)
# transforms.rf.predicate=topicMatch
# predicates=topicMatch
# predicates.topicMatch.type=org.apache.kafka.connect.transforms.predicates.TopicNameMatches
# predicates.topicMatch.pattern=^dev\.users$

Schemas, Types, and Error Handling: Schemaless, Tombstones, and DLQ

SMTs run against Connect's internal data. Whether you have schemaless (Map-style) or schema-aware (Struct) data changes which SMTs are applicable and how fields resolve. Transforms such as Flatten and SetSchemaMetadata require a schema, while others — for example, ReplaceField — also work with schemaless data.

Each SMT exposes options for how to treat compaction tombstones (null values) and missing fields (for example, ignore.missing=true). When inconsistencies or type-conversion errors occur, errors.tolerance and the DLQ settings let you keep processing and route the failed records aside.

  • Mis-specified types in TimestampConverter (string to Timestamp, long to string, and so on) are a classic failure mode
  • Watch out for numeric vs. string type mismatches when using MaskField
  • Check whether the SMT has an option to skip tombstone records or not apply to them
  • A DLQ makes a great starting point for inspection and reprocessing, and is easy to operate asynchronously

Example error-handling and DLQ configuration (works on both source and sink connectors)

errors.tolerance=all
errors.log.enable=true
errors.log.include.messages=true
errors.deadletterqueue.topic.name=connect-errors.dlq
errors.deadletterqueue.context.headers.enable=true
errors.deadletterqueue.topic.replication.factor=3

Operational Tips: Performance, Monitoring, and Testing

SMTs are lightweight, but CPU usage and latency grow with the length of the chain. Flatten and regex-based transforms add the most cost, so keep the chain to the minimum that gets the job done. Key-changing transforms (ValueToKey, ExtractField$Key) can also affect source-side partition assignment, so observe the load there as well.

Monitor Connect task and worker metrics. Continuously watch error counts, DLQ record counts, polling lag, ingress/egress throughput, and GC/CPU, and roll out changes incrementally so you can validate as you go.

  • Changes generally require reconfiguring (restarting) the connector, so codify the same configuration as IaC
  • Replay representative data — including schema evolution scenarios — in staging
  • Watch the relevant metrics (task-error-rate, sink-record-read-rate, source-record-write-rate, and so on)
  • Keep the SMT chain minimal; push heavy transforms to a different layer

Check Your Understanding

CCDAK

問題 1

A team wants to drop the password field from records ingested from a source database, prefix the topic name with an environment marker (dev.), and set the value's id as the message key. Which approach best satisfies these requirements using Kafka Connect features alone?

  1. Chain the SMTs ReplaceField, RegexRouter, and ValueToKey in that order
  2. Transform with Kafka Streams and write the result to a separate topic
  3. Fork the connector and implement a custom transformation in Java
  4. Modify the records via a producer Interceptor before they reach Connect

正解: A

Each requirement is a canonical SMT use case (field removal with ReplaceField, topic renaming with RegexRouter, key setting with ValueToKey), and an ordered chain inside the connector configuration covers all of them. No heavy processing or state management is needed, so Streams or a custom transform would be overkill.

Frequently Asked Questions

When do SMTs run relative to serialization and deserialization?

On the source side, the connector emits Connect's internal representation, SMTs are applied to it, and then the converter serializes the result before writing to Kafka. On the sink side, records are deserialized from Kafka, SMTs are applied, and then the connector writes to the external system.

Can SMTs control partitioning?

Not directly. However, SMTs that change the key (such as ValueToKey or ExtractField$Key) can indirectly affect partition assignment because the producer's partitioner uses the new key. RegexRouter can change the topic name, but SMTs do not orchestrate rebalancing themselves.

How many SMTs can I safely chain together?

The practical limit is driven by performance and readability rather than the implementation. A handful is a good target — once things get complex, push the logic to Streams, ksqlDB, or a separate process. Measure latency and throughput in staging and continue monitoring in production whenever you change the chain.

Check what you learned with practice questions

Practice with certification-focused question sets

無料で問題を解いてみる
Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


Related articles
Kafka

Kafka Topics & Partitions: Distribution Fundamentals (2026)

How Kafka topics and partitions enable scale — ordering guar...

Kafka

CCDAK Exam Guide: Confluent Certified Developer (2026)

Complete prep for the CCDAK exam — Producer/Consumer API, St...

Kafka

CCAAK Exam Guide: Confluent Certified Administrator (2026)

Pass the CCAAK exam — cluster management, partitions, securi...

Kafka

Kafka Replicas & ISR: Fault Tolerance Explained (2026)

Replica placement, in-sync replicas (ISR), leader election. ...

Kafka

Kafka Offsets: Commit Modes & Consumer Position (2026)

Offset semantics — auto vs. manual commit, __consumer_offset...

Browse all Kafka articles (101)
© 2026 NicheeLab All rights reserved.