Confluent Schema Registry: Subjects, Versions, Compatibility (2026)

Schema Registry is the central component for managing event schemas across Kafka and evolving them safely. Choosing the right compatibility mode for each subject stops a large class of breaking schema changes before they reach production.

Written with the CCDAK exam in mind, this article walks through the differences between Avro, Protobuf, and JSON Schema, subject design, choosing compatibility modes, and the operational pitfalls you actually hit in production.

What Schema Registry Does and Its Basic Architecture

Schema Registry is a service that handles schema registration, versioning, lookup, and compatibility checks. Schemas are stored in an internal Kafka topic (by default, _schemas), and writes are serialized through a single elected leader (primary) node. Clients typically use the Confluent serializers to resolve a schema ID and embed it into each message.

The key point is that clients do not call REST on every message. Once a schema ID is resolved, the cached ID is reused for serialization and deserialization. Even if the Registry is briefly down, producers and consumers keep working as long as they only use already-known schemas.

Storage: persisted in an internal Kafka topic (typically _schemas)
API: REST endpoints for registration, lookup, and compatibility (subjects, versions, config)
Clients: the Confluent serializers resolve a schema ID and embed it in the wire format
Availability: leader election ensures consistency while multiple nodes provide redundancy
Caching: both clients and the server cache schemas and IDs
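The caching behavior described above can be sketched with a small map-backed cache. This is a minimal illustration, not the Confluent client's implementation: the class name SchemaCacheSketch and the loader function are hypothetical, and the loader stands in for a REST lookup such as GET /schemas/ids/{id}.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntFunction;

// Hypothetical client-side cache: schema ID -> schema text.
// The loader is an assumption standing in for a REST call to the Registry.
class SchemaCacheSketch {
    private final Map<Integer, String> byId = new ConcurrentHashMap<>();
    private final IntFunction<String> loader;
    private final AtomicInteger registryCalls = new AtomicInteger();

    SchemaCacheSketch(IntFunction<String> loader) {
        this.loader = loader;
    }

    // Returns the cached schema and calls the loader only on a cache miss.
    String schemaFor(int id) {
        return byId.computeIfAbsent(id, key -> {
            registryCalls.incrementAndGet();
            return loader.apply(key);
        });
    }

    int registryCalls() {
        return registryCalls.get();
    }

    public static void main(String[] args) {
        SchemaCacheSketch cache = new SchemaCacheSketch(id -> "{\"type\":\"string\"}");
        for (int i = 0; i < 1000; i++) {
            cache.schemaFor(7); // the same schema ID arrives on every message
        }
        System.out.println("registry calls: " + cache.registryCalls()); // prints "registry calls: 1"
    }
}
```

Only the first lookup for ID 7 reaches the loader, which is why a brief Registry outage does not affect messages whose schema IDs are already cached.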

Figure: How producers, consumers, and Schema Registry interact

Schema Registration and Subject Design Essentials

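Schema Registry versions schemas per subject. With the default TopicNameStrategy, the subjects are <topic>-value and <topic>-key. RecordNameStrategy uses the fully-qualified record name as the subject, which lets you share a single schema across multiple topics. TopicRecordNameStrategy uses <topic>-<record name>, which scopes each record type to one topic. Both record-based strategies let a single topic carry more than one event type.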

In production, the common pattern is to disable auto.register.schemas on producers and register schemas ahead of time, often through an approved CI/CD pipeline. Compatibility levels can be set both globally and per subject, but subject-level settings win when both are present.

Subject naming: TopicNameStrategy (default), RecordNameStrategy, TopicRecordNameStrategy
Registration flow: pre-register → pass compatibility check → ID assigned → embedded during serialization
Change review: validate compatibility against the registry as a dry run in CI before merging
Practical recommendation: set auto.register.schemas=false and consider use.latest.version
Separate subjects for key and value: design compatibility and evolution independently for each
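The three naming strategies listed above can be illustrated with a small function that derives the subject name. This is a sketch of the naming rules only; the class SubjectNameSketch and its enum are hypothetical and are not the Confluent strategy classes.

```java
// Sketch of how each subject name strategy derives a subject.
// SubjectNameSketch and Strategy are illustrative names, not Confluent classes.
class SubjectNameSketch {
    enum Strategy { TOPIC_NAME, RECORD_NAME, TOPIC_RECORD_NAME }

    static String subjectFor(Strategy strategy, String topic, String recordName, boolean isKey) {
        switch (strategy) {
            case TOPIC_NAME:
                // One subject per topic and per key/value position.
                return topic + (isKey ? "-key" : "-value");
            case RECORD_NAME:
                // Subject follows the record type, independent of the topic.
                return recordName;
            case TOPIC_RECORD_NAME:
                // Subject is scoped to both the topic and the record type.
                return topic + "-" + recordName;
            default:
                throw new IllegalArgumentException("unknown strategy: " + strategy);
        }
    }

    public static void main(String[] args) {
        String record = "com.example.Order";
        System.out.println(subjectFor(Strategy.TOPIC_NAME, "orders", record, false));        // orders-value
        System.out.println(subjectFor(Strategy.RECORD_NAME, "orders", record, false));       // com.example.Order
        System.out.println(subjectFor(Strategy.TOPIC_RECORD_NAME, "orders", record, false)); // orders-com.example.Order
    }
}
```

Because compatibility is enforced per subject, the strategy you pick also decides which schemas are checked against each other.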

Registering an Avro schema and example Java Producer configuration

# Register an Avro schema (orders-value)
# Note: the schema must be escaped as a JSON string inside the payload
curl -s -X POST \
  -H 'Content-Type: application/vnd.schemaregistry.v1+json' \
  --data '{"schema":"{\"type\":\"record\",\"name\":\"Order\",\"namespace\":\"com.example\",\"fields\":[{\"name\":\"id\",\"type\":\"string\"},{\"name\":\"amount\",\"type\":\"double\",\"default\":0.0}]}"}' \
  http://localhost:8081/subjects/orders-value/versions

// Key Java producer properties using KafkaAvroSerializer
import java.util.Properties;

Properties p = new Properties();
p.put("bootstrap.servers", "broker:9092");
p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
p.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
p.put("schema.registry.url", "http://sr:8081");
// Avoid auto-registration in production; schemas are registered ahead of time
p.put("auto.register.schemas", "false");
// Serialize with the latest version registered under the subject
// (verify that this fits your requirements)
p.put("use.latest.version", "true");
// Subject naming strategy example (TopicNameStrategy is the default)
p.put("value.subject.name.strategy", "io.confluent.kafka.serializers.subject.TopicNameStrategy");
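The CI dry run mentioned earlier can be sketched as a request to the Registry's compatibility endpoint (POST /compatibility/subjects/{subject}/versions/latest). The class name, the helper methods, and the localhost URL are assumptions for illustration. The example only builds the request, so it runs without a live Registry.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build (but do not send) a compatibility dry-run request.
// CompatibilityDryRunSketch and its helpers are hypothetical names.
class CompatibilityDryRunSketch {

    // Embeds the schema as an escaped JSON string inside the payload.
    // Assumes compact schema JSON without control characters such as newlines.
    static String payloadFor(String schemaJson) {
        String escaped = schemaJson.replace("\\", "\\\\").replace("\"", "\\\"");
        return "{\"schema\":\"" + escaped + "\"}";
    }

    // Targets the latest registered version of the given subject.
    static HttpRequest dryRun(String registryUrl, String subject, String schemaJson) {
        URI uri = URI.create(registryUrl + "/compatibility/subjects/" + subject + "/versions/latest");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(payloadFor(schemaJson)))
                .build();
    }

    public static void main(String[] args) {
        String schema = "{\"type\":\"record\",\"name\":\"Order\",\"namespace\":\"com.example\","
                + "\"fields\":[{\"name\":\"id\",\"type\":\"string\"}]}";
        HttpRequest request = dryRun("http://localhost:8081", "orders-value", schema);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(payloadFor(schema));
    }
}
```

A CI job could send this request with java.net.http.HttpClient and fail the build unless the response body reports "is_compatible": true.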

Compatibility Levels and How They Are Evaluated

Compatibility checks evaluate the relationship between a new schema and existing schemas. The strictness depends on whether the comparison targets only the latest version or the entire history (transitive). Avro, Protobuf, and JSON Schema differ in the fine print of what is allowed, but the meaning of each mode is the same.

Typically, choose Backward when you can update consumers first, Forward when producers are updated first and existing consumers must keep working, and Full when you need both guarantees.

Backward: the new schema can read data written with the previous schema (reader=new, writer=previous)
Forward: the previous schema can read data written with the new schema (reader=previous, writer=new)
Full: satisfies both Backward and Forward
Transitive: the comparison applies to the full history instead of only the latest version
None: no check at all (usually a distractor on exam questions)

Mode	Comparison target	Examples of allowed changes	Primary use case
BACKWARD	Latest only	Add a field with a default value, type promotion (e.g. int→long)	Environments where consumers can be updated first
BACKWARD_TRANSITIVE	Full history	Same as above but enforced against the entire history	Long-running systems that need stronger guarantees
FORWARD	Latest only	Add a new field (old readers ignore unknown fields), delete a field if the old schema has a default	Producers updated first while existing consumers stay untouched
FORWARD_TRANSITIVE	Full history	Same as above but enforced against the entire history	Robust for multi-stage rollouts
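FULL	Latest only, both directions	Only allows evolution that satisfies both Backward and Forward, such as adding or removing fields that have defaults	When strict bidirectional compatibility is required
FULL_TRANSITIVE	Full history, both directions	Same as above but enforced against the entire history	Long-lived subjects that need bidirectional guarantees across all versions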
NONE	None	No constraints; breaking changes pass through	Not recommended outside of testing or experimentation
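The reader/writer directions can be made concrete with a toy checker. This is a deliberately simplified model, not the Registry's algorithm: a schema is reduced to a map of field name to "has a default", types are ignored, and all class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Toy model: a schema is a map of field name -> "has a default value".
// Types, aliases, and unions are ignored; real checks are format-specific.
class CompatibilitySketch {

    // A reader can decode a writer's data if every reader field that the
    // writer does not supply has a default to fall back on.
    static boolean canRead(Map<String, Boolean> reader, Map<String, Boolean> writer) {
        for (Map.Entry<String, Boolean> field : reader.entrySet()) {
            boolean missingInWriter = !writer.containsKey(field.getKey());
            if (missingInWriter && !field.getValue()) {
                return false;
            }
        }
        return true;
    }

    // Backward: the candidate (new) schema reads data written with the previous one.
    static boolean backward(Map<String, Boolean> candidate, Map<String, Boolean> previous) {
        return canRead(candidate, previous);
    }

    // Forward: the previous schema reads data written with the candidate.
    static boolean forward(Map<String, Boolean> candidate, Map<String, Boolean> previous) {
        return canRead(previous, candidate);
    }

    // Transitive: the same backward check, applied to every earlier version.
    static boolean backwardTransitive(Map<String, Boolean> candidate,
                                      List<Map<String, Boolean>> history) {
        for (Map<String, Boolean> older : history) {
            if (!backward(candidate, older)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Boolean> v1 = Map.of("id", false);
        Map<String, Boolean> v2 = Map.of("id", false, "email", true);  // email added with a default
        Map<String, Boolean> v3 = Map.of("id", false, "email", false); // default removed from email

        System.out.println(backward(v3, v2));                        // true: v2 data always carries email
        System.out.println(backwardTransitive(v3, List.of(v1, v2))); // false: v1 data has no email
        System.out.println(forward(v3, v2));                         // true: v2 reader finds all its fields
    }
}
```

The v3 example shows why the transitive modes are stricter: a change can pass against the latest version and still be unable to read data written with an older one.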

Data Formats and SerDe Options

Confluent Schema Registry supports Avro, Protobuf, and JSON Schema. All three are covered by the REST API and compatibility checks, but they differ in expressiveness and wire-format characteristics. Pick based on size efficiency, tooling, and whether you need schema references.

The Confluent serializers prepend a magic byte (0) followed by a 4-byte big-endian schema ID; the Protobuf serializer additionally writes message indexes before the payload. Consumers look up the schema by that ID at deserialization time.

Avro: efficient binary format with a strong evolution track record; frequently tested on CCDAK
Protobuf: popular for strong typing and inter-service contracts; evolution is managed via field numbers
JSON Schema: blends naturally with the JSON ecosystem but tends to be larger on the wire
Common: the magic byte plus schema ID makes schema resolution fast
Schema references: large schemas can be split into pieces (representation varies by format)
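The framing described above (magic byte plus 4-byte schema ID) can be sketched with java.nio.ByteBuffer, which writes integers in big-endian order by default. The class and method names are hypothetical, the payload is treated as opaque bytes, and the Protobuf message indexes are left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the framing only: [magic byte 0][4-byte schema ID][payload].
// WireFormatSketch is an illustrative name, not a Confluent class.
class WireFormatSketch {
    static final byte MAGIC_BYTE = 0x0;

    // Prepends the magic byte and the schema ID to an already-serialized payload.
    static byte[] frame(int schemaId, byte[] payload) {
        return ByteBuffer.allocate(1 + 4 + payload.length)
                .put(MAGIC_BYTE)
                .putInt(schemaId) // big-endian by default
                .put(payload)
                .array();
    }

    // Reads the schema ID back, rejecting messages that lack the expected header.
    static int schemaIdOf(byte[] message) {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        if (buffer.remaining() < 5 || buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("not framed with magic byte and schema ID");
        }
        return buffer.getInt();
    }

    public static void main(String[] args) {
        byte[] payload = "serialized-record".getBytes(StandardCharsets.UTF_8);
        byte[] message = frame(42, payload);
        System.out.println(message.length - payload.length); // 5
        System.out.println(schemaIdOf(message));             // 42
    }
}
```

The header costs five bytes per message, and the consumer needs only those bytes to decide which schema to fetch or pull from its cache.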

Operational Best Practices: Availability, Performance, and Security

For high availability, run Schema Registry as multiple nodes and let Kafka-based leader election serialize writes. Because schemas are replicated through Kafka, your backup story rides on the durability of the Kafka cluster itself.

To keep latency low, lean on the client-side schema cache and control how often new schemas appear. On the security side, combine SASL/SSL for broker connections with mTLS, ACLs, and authorization on the Registry itself.

Block ad-hoc registration: gate registration and compatibility checks through CI/CD
Cache benefits: reuse schema IDs and only hit the Registry when a new schema appears
Limit blast radius by setting compatibility per subject
Monitoring: surface compatibility violations, registration errors, and _schemas replication health
Security: protect the REST endpoints with HTTPS or mTLS plus an authorization layer, and secure the Registry's own Kafka connection with SASL/SSL and ACLs

CCDAK Exam Tips and Common Pitfalls

The single most important thing is to read compatibility modes correctly. If existing consumers cannot be updated and producers start sending a new schema first, you want Forward. If consumers are upgraded first, you want Backward. To guarantee both directions, use Full.

Subject naming strategies, the precedence between global and subject-level settings, and the wire format (magic byte plus schema ID) all show up regularly. As an edge case, remember that type promotion rules and whether defaults are required depend on the format you choose.

Precedence: subject-level settings override the global setting
Memorize the reader/writer direction for Forward and Backward by drawing it out
Understand how the presence (or absence) of default values affects compatibility
Know when to use TopicNameStrategy versus RecordNameStrategy variants
None is almost always a distractor, not the right answer

Check Your Understanding

CCDAK

Question 1

Existing consumers cannot be updated for the foreseeable future. The producer is going to start writing events with a new optional field, and you need the existing consumers to keep reading those events. Which subject-level compatibility setting is best?

A. FORWARD
B. BACKWARD
C. FULL_TRANSITIVE
D. NONE

Correct answer: A

The requirement is that the old readers (existing consumers) can read new data from new writers, which maps to Forward compatibility. Backward only guarantees that a new reader can read old data, FULL_TRANSITIVE is stricter than required because it enforces both directions against the entire history, and None offers no guarantees.

Frequently Asked Questions

Which takes precedence, the global compatibility setting or the subject-level setting?

Subject-level compatibility settings override the global setting. Schema Registry checks the subject configuration first and only falls back to the global value when no subject-level setting exists.
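The fallback rule can be sketched in a few lines. This models the lookup order only and is not Schema Registry code: the class name is hypothetical, and in a real deployment the two levels are set through the config REST resource (PUT /config for the global value, PUT /config/{subject} for a subject).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the precedence rule: subject-level setting first, global as fallback.
// CompatibilityConfigSketch is an illustrative name, not a Registry class.
class CompatibilityConfigSketch {
    private final String globalLevel;
    private final Map<String, String> subjectLevels = new HashMap<>();

    CompatibilityConfigSketch(String globalLevel) {
        this.globalLevel = globalLevel;
    }

    void setSubjectLevel(String subject, String level) {
        subjectLevels.put(subject, level);
    }

    // Uses the subject override when present, otherwise the global value.
    String effectiveLevel(String subject) {
        return subjectLevels.getOrDefault(subject, globalLevel);
    }

    public static void main(String[] args) {
        CompatibilityConfigSketch config = new CompatibilityConfigSketch("BACKWARD");
        config.setSubjectLevel("orders-value", "FORWARD");
        System.out.println(config.effectiveLevel("orders-value"));   // FORWARD
        System.out.println(config.effectiveLevel("payments-value")); // BACKWARD
    }
}
```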

What happens to producers and consumers if Schema Registry goes down?

As long as clients reuse already-resolved schema IDs, serialization and deserialization keep working. Registering a new schema or resolving an uncached ID will fail until the Registry recovers.

Why use JSON Schema instead of sending raw JSON over Kafka?

You gain versioning, compatibility checks, and validation. That makes the blast radius of schema changes explicit and helps keep breaking changes from reaching consumers.


Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
