Kafka Connect Sink Connectors consume records from Kafka topics and write them to external systems such as databases, search engines, and object storage.
This article summarizes delivery guarantees, duplicate handling, error handling, scaling, and operational best practices grounded in the official Confluent/Apache Kafka documentation, alongside the points most often tested on the CCDAK exam.
Kafka Connect is a framework that runs connectors and tasks on workers (standalone or distributed). Sink Connectors read from Kafka and write to external systems. Offsets are stored in internal topics and committed after successful processing.
Data transformation is handled by converters (Avro/JSON Schema/Protobuf, etc.) and SMTs (Single Message Transform). On error, you can configure retries, skipping, or forwarding to a DLQ (Dead Letter Queue).
| Connector Type | Typical Use Case | Dedup / Consistency | Schema Evolution |
|---|---|---|---|
| JDBC Sink | Writes to RDBMS (aggregates / ODS) | Idempotent via UPSERT (e.g. pk.mode=record_key). Deletes use delete.enabled + tombstones. | auto.create/auto.evolve can follow column additions (requires validation). |
| Elasticsearch Sink | Full-text search / log analytics | Idempotent by using the Kafka key as document_id. External versioning is also an option. | Schemas use dynamic mapping; watch out for type mismatches. |
| S3 Sink | Data lake raw data storage | Assume duplicates may occur. Include topic/partition/offset in keys for identification. | Optimize operations with format choice (Avro/Parquet/JSON) and rotation settings. |
| Snowflake Sink | DWH ingestion | Typically uses MERGE or similar on the target side to ensure dedup and consistency. | Be careful about schema application on staging vs. target. |
Logical architecture of a Kafka Sink Connector
Sink Connectors are fundamentally at-least-once. Retries on failure and rebalances can cause the same record to be written more than once. Designing without assuming end-to-end Exactly-once is the safer approach.
Ordering is preserved within a Kafka partition. Tasks process records in order within their assigned partitions, but ordering across partitions is not guaranteed. Meet ordering requirements through topic and key design.
Connector-specific settings directly affect delivery guarantees, schema evolution, and throughput. Build on the defaults and constraints in the official documentation, and choose the safer design.
Here we list key points using the commonly deployed JDBC / Elasticsearch / S3 / Snowflake connectors as examples.
Kafka Connect lets you control error handling via configuration. Regardless of whether failures occur during record conversion, serialization, or destination writes, designing for progress via retries, skipping, or DLQ forwarding is the realistic approach.
Always provision a DLQ in production, and build monitoring and reprocessing workflows for DLQ records (such as re-ingest after fixes or one-off corrections) into operations.
Throughput is largely determined by the product of topic partition count and tasks.max. A task can own multiple partitions, but a single partition is always processed sequentially within one task.
Connector-specific batch settings (e.g. JDBC's batch.size) and fine-tuning consumption via Connect's consumer.override.* are also effective. Under overload, the connector pauses the consumer to apply backpressure.
Distributed mode is standard for production. Properly configure replication factor and cleanup policy (such as compact) on internal topics. Monitor using both the Connect REST API and JMX metrics.
On CCDAK, Connect's components, internal topics, delivery guarantees, DLQ, the roles of SMTs/converters, and representative connector settings are frequently tested. Build on the defaults and constraints from the official documentation and pay close attention to the wording of answer choices.
Example: creating a JDBC Sink Connector via REST (UPSERT + deletes + DLQ)
POST /connectors HTTP/1.1
Host: connect:8083
Content-Type: application/json
{
"name": "inventory-jdbc-sink",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
"tasks.max": "3",
"topics": "inventory.customers,inventory.orders",
"connection.url": "jdbc:postgresql://db:5432/warehouse",
"connection.user": "app",
"connection.password": "******",
"auto.create": "true",
"auto.evolve": "true",
"insert.mode": "upsert",
"pk.mode": "record_key",
"pk.fields": "id",
"delete.enabled": "true",
"behavior.on.null.values": "delete",
"batch.size": "3000",
"max.retries": "10",
"retry.backoff.ms": "5000",
"errors.tolerance": "all",
"errors.deadletterqueue.topic.name": "dlq.inventory",
"errors.deadletterqueue.context.headers.enable": "true",
"key.converter": "io.confluent.connect.avro.AvroConverter",
"value.converter": "io.confluent.connect.avro.AvroConverter",
"key.converter.schema.registry.url": "http://schema-registry:8081",
"value.converter.schema.registry.url": "http://schema-registry:8081",
"consumer.override.isolation.level": "read_committed"
}
}
CCDAK
問題 1
You want a Kafka Connect JDBC Sink to reflect Kafka tombstones (null value) as RDBMS deletes while maintaining UPSERT operations without duplicates. Which combination of settings is appropriate?
正解: A
Enabling deletes in JDBC Sink requires delete.enabled=true. To treat tombstones as deletes, specify behavior.on.null.values=delete. For idempotent UPSERT, use insert.mode=upsert, and since the primary key is typically the Kafka key, set pk.mode=record_key (and pk.fields).
Can Sink connectors achieve Exactly-once delivery?
Kafka Connect Sink connectors are fundamentally at-least-once. Because the target is an external system, end-to-end Exactly-once cannot be generalized. In practice you combine UPSERT/primary keys, document IDs, and downstream deduplication or MERGE operations to maintain consistency.
Can I write out schemaless JSON as-is?
Yes. Use JsonConverter for value.converter and set schemas.enable=false to handle records in schemaless mode. However, since schema evolution and type validation are not enforced, combining Avro/JSON Schema/Protobuf with a Schema Registry is recommended for long-term operations.
Can I merge multiple topics into a single table?
Yes, depending on connector and SMT configuration. JDBC Sink lets you control the destination via table.name.format, but schema compatibility and primary key consistency are mandatory. Conflicts and type mismatches are common, so consider an SMT that normalizes schemas or an intermediate topic for normalization before consolidating.
Practice with certification-focused question sets
無料で問題を解いてみるNicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
Kafka Topics & Partitions: Distribution Fundamentals (2026)
How Kafka topics and partitions enable scale — ordering guar...
CCDAK Exam Guide: Confluent Certified Developer (2026)
Complete prep for the CCDAK exam — Producer/Consumer API, St...
CCAAK Exam Guide: Confluent Certified Administrator (2026)
Pass the CCAAK exam — cluster management, partitions, securi...
Kafka Replicas & ISR: Fault Tolerance Explained (2026)
Replica placement, in-sync replicas (ISR), leader election. ...
Kafka Offsets: Commit Modes & Consumer Position (2026)
Offset semantics — auto vs. manual commit, __consumer_offset...