Kafka Connect can run connectors in two modes: standalone and distributed. Both use the same connector/task model, but they differ in where configuration is stored, how failover works, and how they scale.
This article lays out selection guidelines and operational best practices in plain language, and reviews the points that come up frequently on the exams (CCDAK/CCAAK). The explanation focuses on stable concepts from the official documentation.
Standalone runs connectors and tasks within a single process and keeps configuration and offsets in local files. There is no failover or rebalancing. It suits one-off migrations, development verification, and single-node batch processing.
Distributed mode forms a single cluster from multiple Connect workers and keeps configuration, offsets, and status in Kafka internal topics. When a worker fails, tasks are reassigned to other workers, enabling scale-out and rolling upgrades. This is the default choice for production operation.
| Aspect | Standalone | Distributed Mode |
|---|---|---|
| Configuration storage | Local file (properties) | Kafka internal topic (config.storage.topic) |
| Offset storage | Local file (offset.storage.file.filename) | Kafka internal topic (offset.storage.topic) |
| Availability | Single process. Stops on failure | Tasks automatically reassigned on worker failure |
| Scaling | Only by increasing processes. Manual partitioning | Automatic rebalancing by adding workers |
| Applying operational changes | Process restart is the norm | REST-based updates shared across all workers |
| Use cases | PoC, one-off migrations, development | Production always-on, HA, continuous operation |
Kafka Connect: Overview of Distributed Mode and Standalone
Minimal configuration comparison (standalone vs distributed worker)
# standalone.properties (excerpt)
bootstrap.servers=broker1:9092
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
offset.storage.file.filename=/var/lib/kafka-connect/connect.offsets
offset.flush.interval.ms=10000
# worker.properties (distributed mode, excerpt)
bootstrap.servers=broker1:9092,broker2:9092
group.id=connect-cluster-1
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
In Connect, a worker is a process, a connector is a job, and a task is a parallel execution unit. The upper limit on the number of tasks is determined by the connector configuration (tasks.max), and for sinks by the partition count.
In distributed mode, configuration (config.storage.topic), offsets (offset.storage.topic), and status (status.storage.topic) are stored in Kafka internal topics. As a result, settings submitted via REST are shared across the entire cluster, making failure recovery and rolling upgrades easier. Internal topics are created with compaction enabled, and a replication factor of at least 3 is recommended in production. On clusters where auto-creation is prohibited, create them in advance with appropriate settings.
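For clusters where auto-creation is disabled, the pre-creation step might be scripted along the following lines. This is only a sketch: it prints the commands rather than running them, the topic names and broker address are taken from the examples in this article, the partition counts (1, 25, 5) are the commonly used starting values, and the script name kafka-topics.sh is an assumption (some distributions ship it as kafka-topics).

```python
import shlex

# Internal topic name -> partition count.
# The config topic must have exactly one partition; the other two
# values are common starting points, not requirements.
INTERNAL_TOPICS = {
    "connect-configs": 1,
    "connect-offsets": 25,
    "connect-status": 5,
}


def create_topic_command(topic, partitions,
                         bootstrap="broker1:9092",  # assumed broker address
                         replication_factor=3):
    """Build a kafka-topics.sh command that creates one compacted topic."""
    args = [
        "kafka-topics.sh",  # assumed script name; adjust for your distribution
        "--bootstrap-server", bootstrap,
        "--create",
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
        "--config", "cleanup.policy=compact",
    ]
    return shlex.join(args)


# Print one command per internal topic for review before running them.
for topic, partitions in INTERNAL_TOPICS.items():
    print(create_topic_command(topic, partitions))
```

Printing the commands first leaves room to review replication and partition settings before anything is created on the cluster.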
Representative properties related to internal topics (distributed worker)
bootstrap.servers=broker1:9092,broker2:9092
group.id=connect-cluster-1
config.storage.topic=connect-configs
config.storage.replication.factor=3
offset.storage.topic=connect-offsets
offset.storage.replication.factor=3
status.storage.topic=connect-status
status.storage.replication.factor=3
In distributed mode, when a worker fails, the tasks held by that worker are reassigned to healthy workers in the cluster. Since the connector configuration is in internal topics, execution can continue even after the process changes. Standalone is a single process, so a failure stops tasks and requires manual recovery.
Configuration changes are submitted via the REST API and shared by all workers in the distributed cluster. For recovery, use connector or task restart, and pause/resume as needed.
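As an illustration of the recovery step, the sketch below reads a connector's status and restarts only the tasks reported as FAILED. It relies on the GET /connectors/{name}/status and POST /connectors/{name}/tasks/{id}/restart endpoints of the Connect REST API. The helper names and the base URL in the usage note are hypothetical, and error handling is deliberately left out.

```python
import json
import urllib.request


def failed_task_ids(status):
    """Return the ids of tasks whose state is FAILED in a status response."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") == "FAILED"]


def fetch_status(base_url, connector):
    """GET /connectors/{name}/status and decode the JSON body."""
    url = f"{base_url}/connectors/{connector}/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def restart_task(base_url, connector, task_id):
    """POST /connectors/{name}/tasks/{id}/restart; returns the HTTP status code."""
    url = f"{base_url}/connectors/{connector}/tasks/{task_id}/restart"
    req = urllib.request.Request(url, method="POST")  # no request body needed
    with urllib.request.urlopen(req) as resp:
        return resp.status


def restart_failed_tasks(base_url, connector):
    """Restart every FAILED task of a connector and return the ids that were restarted."""
    ids = failed_task_ids(fetch_status(base_url, connector))
    for task_id in ids:
        restart_task(base_url, connector, task_id)
    return ids
```

A call such as restart_failed_tasks("http://connect1:8083", "jdbc-sink-01") could be run from a scheduled job. Because the cluster shares state through the internal topics, any worker's REST endpoint can serve the request.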
Operations via the Connect REST API (examples)
# List connectors
curl -s http://connect1:8083/connectors
# Create a connector
curl -s -X POST http://connect1:8083/connectors -H 'Content-Type: application/json' -d '{
"name": "jdbc-sink-01",
"config": {"connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector", "topics": "orders", "tasks.max": "4"}
}'
# Pause / resume
curl -X PUT http://connect1:8083/connectors/jdbc-sink-01/pause
curl -X PUT http://connect1:8083/connectors/jdbc-sink-01/resume
# Restart the connector and its tasks, limited to failed instances (Kafka 3.0+; quote the URL so the shell does not interpret &)
curl -X POST 'http://connect1:8083/connectors/jdbc-sink-01/restart?includeTasks=true&onlyFailed=true'
Sink parallelism is in principle limited by the partition count of the target topic, and tasks.max should be set at or below that. Sources depend on the partitionability of the target system. In distributed mode, total processing capacity can be increased simply by adding workers.
To scale with standalone, you must launch multiple separate processes and clearly partition target topics and tables (with separated offset files). In contrast, with distributed mode you submit a single configuration via REST and the cluster automatically allocates tasks.
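The sink-side limit can be made concrete with a small calculation. The sketch below assumes that the connector creates tasks.max tasks and that the consumer group spreads partitions evenly across them. Both are simplifications, and the function name is hypothetical.

```python
import math


def sink_task_usage(tasks_max, partition_count):
    """Estimate how many sink tasks do work for a given partition count."""
    if tasks_max < 1 or partition_count < 1:
        raise ValueError("tasks_max and partition_count must be positive")
    # A partition is consumed by at most one task, so extra tasks sit idle.
    active = min(tasks_max, partition_count)
    return {
        "active": active,
        "idle": tasks_max - active,
        # Upper bound per task under an even spread (assumption).
        "max_partitions_per_task": math.ceil(partition_count / active),
    }


print(sink_task_usage(6, 4))   # {'active': 4, 'idle': 2, 'max_partitions_per_task': 1}
print(sink_task_usage(4, 10))  # {'active': 4, 'idle': 0, 'max_partitions_per_task': 3}
```

In the first case two of the six tasks would have nothing to consume, which is why tasks.max above the partition count adds no throughput.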
Sink connector configuration example (throughput tuning)
{
"name": "s3-sink-raw",
"config": {
"connector.class": "io.confluent.connect.s3.S3SinkConnector",
"topics": "raw.events",
"tasks.max": "6",
"flush.size": "10000",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false"
}
}
For monitoring, combining JMX metrics (per worker/connector/task) with the status topic is practical. For error handling, configure the Dead Letter Queue (DLQ, available for sink connectors) and log output appropriately.
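When errors.deadletterqueue.context.headers.enable=true is set, each DLQ record carries headers prefixed with __connect.errors. that describe where and why the record failed. A minimal sketch of turning those headers into a readable summary follows. It assumes the headers arrive as (key, bytes) pairs, as most Python Kafka clients return them. The function name and the sample values are hypothetical.

```python
PREFIX = "__connect.errors."


def summarize_dlq_headers(headers):
    """Collect the Connect error-context headers of one DLQ record into a dict."""
    summary = {}
    for key, value in headers:
        # Skip unrelated headers and headers without a value.
        if key.startswith(PREFIX) and value is not None:
            summary[key[len(PREFIX):]] = value.decode("utf-8", errors="replace")
    return summary


# Hypothetical sample, shaped like the headers of a consumed DLQ record.
sample_headers = [
    ("__connect.errors.topic", b"orders"),
    ("__connect.errors.partition", b"3"),
    ("__connect.errors.offset", b"1042"),
    ("__connect.errors.connector.name", b"jdbc-sink-01"),
    ("__connect.errors.exception.message", b"Failed to deserialize value"),
    ("trace-id", b"abc123"),  # unrelated application header, ignored
]
print(summarize_dlq_headers(sample_headers))
```

Grouping such summaries by connector name or exception message is one way to see whether DLQ traffic comes from a single bad producer or from a schema change.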
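For security, set the broker connection (SASL/SSL) in the worker properties: top-level settings cover the worker's own clients for the internal topics, and the producer.* / consumer.* prefixes cover the clients used by connector tasks. To override per connector, use producer.override.* / consumer.override.* in the connector configuration, subject to the worker's connector.client.config.override.policy. Upgrades in distributed mode are normally rolling: stop, update, and rejoin workers one at a time.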
Representative operational configuration snippets
# Enable JMX (startup script, etc.)
export KAFKA_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"
# DLQ and error control (connector-side settings)
errors.tolerance=all
errors.log.enable=true
errors.deadletterqueue.topic.name=connect-dlq
errors.deadletterqueue.context.headers.enable=true
# Broker connection security example (worker common)
security.protocol=SASL_SSL
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="user" password="pass";
Mode differences and the roles of internal topics, offset storage location, and scaling constraints (the relationship between tasks.max and partition count) are frequently asked. Operations via REST, DLQ and error control, and the feasibility of rolling upgrades are also commonly tested.
Being able to correctly explain the terminology distinctions (worker / connector / task) and the mechanism by which configuration is shared in distributed mode (config.storage.topic) is a benchmark for a passing score.
Quick reference of key properties to remember
# Distributed mode required
group.id=connect-cluster-1
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
# Standalone-specific
offset.storage.file.filename=/var/lib/kafka-connect/connect.offsets
# Parallelism and flush
tasks.max=4
offset.flush.interval.ms=10000
CCDAK / CCAAK
Question 1
You want to maintain high availability in production while progressively increasing processing capacity in the future. Which Kafka Connect execution mode should you choose, and where is the configuration kept?
Correct answer: C
If high availability and scale-out are requirements, distributed mode is recommended. In distributed mode, connector configuration is stored in a Kafka internal topic (config.storage.topic) and shared between workers. Standalone is managed in local files and is not HA.
Can standalone mode not be used in production?
It depends on the requirements, but since it is a single process with no failover and configuration and offsets are managed locally, it is unsuitable for general always-on production use from an availability and maintainability standpoint. It is effective for short-term batches, PoCs, and development verification.
What should I watch for when manually creating internal topics?
Always set cleanup.policy=compact, and a replication factor of 3 or more is recommended in production. The config topic must have exactly one partition; the status and offsets topics can be sized to the workload, with 5 and 25 partitions as common starting values. If automatic creation is disabled on the cluster, it is safer to create them with proper settings before connecting.
What is the difference between tasks and connectors?
A connector is the job definition (target system, topics, and various settings), while a task is the unit inside a worker that executes that job in parallel. tasks.max determines the upper limit on the number of tasks that can be created, and in distributed mode these tasks are distributed across the workers in the cluster.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.