Schema Registry is the central component for managing event schemas across Kafka and evolving them safely. Picking the right compatibility mode alone dramatically reduces incidents over the long run.
With an eye on CCDAK, this article walks through the differences between Avro, Protobuf, and JSON Schema, subject design, choosing compatibility modes, and the operational pitfalls you actually hit in production.
Schema Registry is a service that handles schema registration, versioning, lookup, and compatibility checks. Schemas are stored in an internal Kafka topic (by default, _schemas), and writes are serialized through a leader. Clients typically use the Confluent serializers to resolve a schema ID and embed it into each message.
The key point is that clients do not call REST on every message. Once a schema ID is resolved, the cached ID is reused for serialization and deserialization. Even if the Registry is briefly down, producers and consumers keep working as long as they only use already-known schemas.
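The caching behavior described above can be illustrated with a minimal sketch. `CachedSchemaLookup` and its `registryFetch` function are hypothetical names invented for this example; they stand in for the REST lookup and are not the Confluent client's actual classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical sketch: remembers schema text by ID so the registry is
// contacted only the first time an ID is seen.
final class CachedSchemaLookup {
    private final Map<Integer, String> cache = new ConcurrentHashMap<>();
    private final IntFunction<String> registryFetch; // stands in for a REST call
    private final AtomicInteger registryCalls = new AtomicInteger();

    CachedSchemaLookup(IntFunction<String> registryFetch) {
        this.registryFetch = registryFetch;
    }

    // Returns the cached schema, or fetches it once and caches the result.
    // If the fetch throws, nothing is cached and the exception propagates.
    String schemaFor(int id) {
        return cache.computeIfAbsent(id, key -> {
            registryCalls.incrementAndGet();
            return registryFetch.apply(key);
        });
    }

    // Number of times the (simulated) registry was actually contacted.
    int registryCalls() {
        return registryCalls.get();
    }
}
```

In this model, a fetch function that throws while the registry is "down" still lets `schemaFor` succeed for IDs that were resolved earlier, and only lookups for unseen IDs fail.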
How producers, consumers, and Schema Registry interact
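Schema Registry versions schemas per subject. With the default TopicNameStrategy, the subjects are <topic>-value and <topic>-key (for example, orders-value). RecordNameStrategy names the subject after the fully-qualified record name, so the same schema lineage can be shared across multiple topics. TopicRecordNameStrategy uses <topic>-<fully-qualified record name>, which scopes each record type to a topic and lets a single topic carry several event types.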
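To make the naming conventions concrete, here is a small sketch that derives the subject name under each strategy. `SubjectNames` is a hypothetical helper written for this article; it mirrors the documented naming patterns but is not the Confluent strategy classes themselves.

```java
// Hypothetical helper that reproduces the subject naming patterns.
final class SubjectNames {
    private SubjectNames() {}

    // TopicNameStrategy: <topic>-key or <topic>-value
    static String topicName(String topic, boolean isKey) {
        return topic + (isKey ? "-key" : "-value");
    }

    // RecordNameStrategy: the fully-qualified record name, independent of the topic
    static String recordName(String fullyQualifiedRecordName) {
        return fullyQualifiedRecordName;
    }

    // TopicRecordNameStrategy: <topic>-<fully-qualified record name>
    static String topicRecordName(String topic, String fullyQualifiedRecordName) {
        return topic + "-" + fullyQualifiedRecordName;
    }
}
```

For a record com.example.Order written as the value of the orders topic, the three strategies yield orders-value, com.example.Order, and orders-com.example.Order respectively.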
In production, the common pattern is to disable auto.register.schemas on producers and register schemas ahead of time, often through an approved CI/CD pipeline. Compatibility levels can be set both globally and per subject, but subject-level settings win when both are present.
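The precedence rule can be modelled in a few lines. This is only a sketch of the lookup order, using a hypothetical `CompatibilityConfig` class; the real service stores these settings itself and exposes them through its REST API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of how the effective compatibility level is resolved.
final class CompatibilityConfig {
    private final String globalLevel;
    private final Map<String, String> subjectLevels = new HashMap<>();

    CompatibilityConfig(String globalLevel) {
        this.globalLevel = globalLevel;
    }

    // Records a subject-level override.
    void setSubjectLevel(String subject, String level) {
        subjectLevels.put(subject, level);
    }

    // The subject-level setting wins; the global level is only a fallback.
    String effectiveLevel(String subject) {
        return subjectLevels.getOrDefault(subject, globalLevel);
    }
}
```

With a global level of BACKWARD and an override of FORWARD on orders-value, `effectiveLevel("orders-value")` returns FORWARD while every other subject still resolves to BACKWARD.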
Registering an Avro schema and example Java Producer configuration
# Register an Avro schema (orders-value)
# Note: the schema must be escaped as a JSON string inside the payload
curl -s -X POST \
-H 'Content-Type: application/vnd.schemaregistry.v1+json' \
--data '{"schema":"{\"type\":\"record\",\"name\":\"Order\",\"namespace\":\"com.example\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\",\"default\":0.0}]}"}' \
http://localhost:8081/subjects/orders-value/versions
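If the registration payload is built in code rather than by hand, the escaping noted above can be done programmatically. The following is a minimal sketch with a hypothetical `RegistrationPayload` helper; in a real project a JSON library would normally handle this.

```java
// Hypothetical helper: wraps a schema document in the {"schema": "..."} envelope,
// escaping it so that it is a valid JSON string value.
final class RegistrationPayload {
    private RegistrationPayload() {}

    static String wrap(String schemaJson) {
        StringBuilder sb = new StringBuilder("{\"schema\":\"");
        for (int i = 0; i < schemaJson.length(); i++) {
            char c = schemaJson.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append("\"}").toString();
    }
}
```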
import java.util.Properties;

// Key Java Producer properties using KafkaAvroSerializer
Properties p = new Properties();
p.put("bootstrap.servers", "broker:9092");
p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
p.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
p.put("schema.registry.url", "http://sr:8081");
// Avoid auto-registration in production
p.put("auto.register.schemas", "false");
// Resolve and use the latest registered version (verify your requirements)
p.put("use.latest.version", "true");
// Subject naming strategy example (TopicNameStrategy is the default)
p.put("value.subject.name.strategy", "io.confluent.kafka.serializers.subject.TopicNameStrategy");

Compatibility checks evaluate the relationship between a new schema and existing schemas. The strictness depends on whether the comparison targets only the latest version or the entire history (transitive). Avro, Protobuf, and JSON Schema differ in the fine print of what is allowed, but the meaning of each mode is the same.
Typically, choose Backward when you can update consumers first, Forward when producers are updated first and existing consumers must keep working, and Full when you need both guarantees.
| Mode | Comparison target | Examples of allowed changes | Primary use case |
|---|---|---|---|
| BACKWARD | Latest only | Add a field with a default value, type promotion (e.g. int→long) | Environments where consumers can be updated first |
| BACKWARD_TRANSITIVE | Full history | Same as above but enforced against the entire history | Long-running systems that need stronger guarantees |
| FORWARD | Latest only | Add a new field (old readers ignore unknown fields), delete a field if the old schema has a default | Producers updated first while existing consumers stay untouched |
| FORWARD_TRANSITIVE | Full history | Same as above but enforced against the entire history | Robust for multi-stage rollouts |
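| FULL | Both directions (latest) | Only allows evolution that satisfies both Backward and Forward | When strict bidirectional compatibility is required |
| FULL_TRANSITIVE | Both directions (full history) | Same as above but enforced against the entire history | Strict bidirectional compatibility across all versions |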
| NONE | None | No constraints; breaking changes pass through | Not recommended outside of testing or experimentation |
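The direction of each mode is easier to see in a deliberately simplified model. The sketch below treats a schema as a map from field name to "has a default value" and ignores types, promotions, aliases, and unions, so it only approximates Avro-style rules. `FieldCompat` and its methods are hypothetical names for illustration, not the Registry's checker.

```java
import java.util.List;
import java.util.Map;

// Simplified, Avro-like model: schema = field name -> has a default value.
final class FieldCompat {
    private FieldCompat() {}

    // A reader can decode a writer's data if every reader field that the
    // writer does not provide has a default to fall back on.
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            if (!writer.containsKey(field.getKey()) && !field.getValue()) {
                return false;
            }
        }
        return true;
    }

    // BACKWARD: the new schema (as reader) can read data written with the previous one.
    static boolean backward(Map<String, Boolean> next, Map<String, Boolean> prev) {
        return canRead(next, prev);
    }

    // FORWARD: the previous schema (as reader) can read data written with the new one.
    static boolean forward(Map<String, Boolean> next, Map<String, Boolean> prev) {
        return canRead(prev, next);
    }

    // FULL: both directions hold against the previous schema.
    static boolean full(Map<String, Boolean> next, Map<String, Boolean> prev) {
        return backward(next, prev) && forward(next, prev);
    }

    // BACKWARD_TRANSITIVE: backward must hold against every earlier version.
    static boolean backwardTransitive(Map<String, Boolean> next,
                                      List<Map<String, Boolean>> history) {
        return history.stream().allMatch(prev -> backward(next, prev));
    }
}
```

Take v1 = {id}, v2 = {id, region with a default}, and v3 = {id, region without a default}. In this model v3 is backward compatible with v2, because v2 data always carries region, but it fails the transitive check because v1 data has no region and v3 offers no default for it.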
Confluent Schema Registry supports Avro, Protobuf, and JSON Schema. All three are covered by the REST API and compatibility checks, but they differ in expressiveness and wire-format characteristics. Pick based on size efficiency, tooling, and whether you need schema references.
The Confluent serializers prepend a magic byte 0 followed by a 4-byte schema ID. Consumers look up the schema by that ID at deserialization time.
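The framing can be sketched with `java.nio.ByteBuffer`, whose default byte order is big-endian, matching the network byte order used for the schema ID. `WireFormat` is a hypothetical helper for illustration; it treats the payload as opaque bytes and does not handle the extra message-index data that the Protobuf serializer adds after the ID.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for the 5-byte header: magic byte 0 + 4-byte schema ID.
final class WireFormat {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_SIZE = 5;

    private WireFormat() {}

    // Prepends the header to an already-serialized payload.
    static byte[] frame(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(HEADER_SIZE + payload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId)
                .put(payload)
                .array();
    }

    // Reads the schema ID, rejecting messages without a valid header.
    static int schemaId(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message);
        if (buf.remaining() < HEADER_SIZE || buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("not a Schema Registry framed message");
        }
        return buf.getInt();
    }

    // Returns the bytes that follow the header.
    static byte[] payload(byte[] message) {
        schemaId(message); // validates the header first
        return Arrays.copyOfRange(message, HEADER_SIZE, message.length);
    }
}
```

A consumer-side deserializer would call something like `schemaId(message)` first, resolve the schema for that ID (from its cache when possible), and then decode `payload(message)` with it.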
For high availability, run Schema Registry as multiple nodes and let Kafka-based leader election serialize writes. Because schemas are replicated through Kafka, your backup story rides on the durability of the Kafka cluster itself.
To keep latency low, lean on the client-side schema cache and control how often new schemas appear. On the security side, combine SASL/SSL for broker connections with mTLS, ACLs, and authorization on the Registry itself.
The single most important thing is to read compatibility modes correctly. If existing consumers cannot be updated and producers start sending a new schema first, you want Forward. If consumers are upgraded first, you want Backward. To guarantee both directions, use Full.
Subject naming strategies, the precedence between global and subject-level settings, and the wire format (magic byte plus schema ID) all show up regularly. As an edge case, remember that type promotion rules and whether defaults are required depend on the format you choose.
CCDAK
Question 1
Existing consumers cannot be updated for the foreseeable future. The producer is going to start writing events with a new optional field, and you need the existing consumers to keep reading those events. Which subject-level compatibility setting is best?
Correct answer: A (Forward)
The requirement is that the old readers (existing consumers) can read new data from new writers, which maps to Forward compatibility. Backward guarantees that a new reader can read old data, Full is stricter than required by enforcing both directions, and None offers no guarantees.
Which takes precedence, the global compatibility setting or the subject-level setting?
Subject-level compatibility settings override the global setting. Schema Registry checks the subject configuration first and only falls back to the global value when no subject-level setting exists.
What happens to producers and consumers if Schema Registry goes down?
As long as clients reuse already-resolved schema IDs, serialization and deserialization keep working. Registering a new schema or resolving an uncached ID will fail until the Registry recovers.
Why use JSON Schema instead of sending raw JSON over Kafka?
You gain versioning, compatibility checks, and validation. That makes the blast radius of API changes explicit and prevents breaking changes from reaching consumers.
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.