Schema management on Kafka is a core capability for balancing data quality and schema evolution. Confluent Schema Registry supports Avro, JSON Schema, and Protobuf. JSON Schema shines for readability and polyglot interop, but can lag Avro on type strictness and payload size.
This article covers when to adopt JSON Schema, compatibility modes, subject naming strategies, evolution rules, and observability and performance considerations, and compares JSON Schema with Avro. It also flags the points that CCDAK (Confluent Certified Developer for Apache Kafka) candidates should master.
JSON Schema lets you constrain data while keeping the human-readable JSON form. Observability (eyeballing payloads), affinity with HTTP services, and ease of use from lightweight clients are real wins. The downside is no Avro-style binary optimization, so payload size and serialization CPU cost tend to be higher.
On Kafka, JSON Schema is typically used with Confluent Schema Registry and Confluent's JSON Schema Serializer/Deserializer (KafkaJsonSchemaSerializer / KafkaJsonSchemaDeserializer). Compatibility (BACKWARD/FORWARD/FULL and their transitive forms) and subject naming strategies (TopicNameStrategy, etc.) are operated the same way as for Avro.
Kafka x JSON Schema data flow (conceptual)
Schema Registry offers the same BACKWARD / FORWARD / FULL modes, plus their transitive variants, for JSON Schema. Its JSON Schema compatibility checker focuses on structural changes: required versus optional properties, enum values, type changes, and whether an object's content model is open or closed (additionalProperties). Note that format and some extension keywords may not affect compatibility checks.
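To make these rules concrete, here is a toy checker for closed content models (additionalProperties: false). It is a sketch, not Confluent's implementation. The ObjectSchema record and the canRead/backward/forward/full helpers are hypothetical. It treats any type change as breaking, although the real checker may accept some widenings. It also ignores enum, format, and open content models, for which the Registry applies different rules.

```java
import java.util.Map;
import java.util.Set;

// Toy model of BACKWARD / FORWARD / FULL for closed content models
// (additionalProperties: false). Hypothetical; not Confluent's checker.
final class CompatSketch {

    // A flat object schema: property name -> JSON type, plus required names.
    record ObjectSchema(Map<String, String> properties, Set<String> required) {}

    // Can a consumer validating with `reader` accept every record valid under `writer`?
    static boolean canRead(ObjectSchema reader, ObjectSchema writer) {
        for (Map.Entry<String, String> p : writer.properties().entrySet()) {
            String readerType = reader.properties().get(p.getKey());
            if (readerType == null) return false;               // closed reader rejects the field
            if (!readerType.equals(p.getValue())) return false; // any type change counts as breaking
        }
        // Old records may omit anything the writer did not require.
        return writer.required().containsAll(reader.required());
    }

    // BACKWARD: the new schema reads data written with the old one.
    static boolean backward(ObjectSchema newer, ObjectSchema older) { return canRead(newer, older); }

    // FORWARD: the old schema reads data written with the new one.
    static boolean forward(ObjectSchema newer, ObjectSchema older) { return canRead(older, newer); }

    static boolean full(ObjectSchema newer, ObjectSchema older) {
        return backward(newer, older) && forward(newer, older);
    }

    public static void main(String[] args) {
        ObjectSchema v1 = new ObjectSchema(Map.of("id", "string", "age", "integer"), Set.of("id"));
        Map<String, String> withAddress = Map.of("id", "string", "age", "integer", "address", "string");
        ObjectSchema v2 = new ObjectSchema(withAddress, Set.of("id"));            // optional address
        ObjectSchema v3 = new ObjectSchema(withAddress, Set.of("id", "address")); // required address
        System.out.println("v2 BACKWARD: " + backward(v2, v1)); // true
        System.out.println("v2 FORWARD:  " + forward(v2, v1));  // false
        System.out.println("v3 BACKWARD: " + backward(v3, v1)); // false
    }
}
```

Under a closed content model, adding an optional property passes BACKWARD but fails FORWARD, because a consumer still validating with v1 rejects the unknown field. That asymmetry is why the mode you pick decides whether consumers (BACKWARD) or producers (FORWARD) must upgrade first.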
In production, BACKWARD_TRANSITIVE is a sound starting point: it guarantees that consumers on the newest schema can still read data written with every earlier version, not just the previous one (plain BACKWARD, the Registry's default). Because JSON Schema's additionalProperties defaults to true, set it explicitly to false to reject unexpected fields, and declare every permitted addition in properties.
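The practical effect of additionalProperties can be shown with a deliberately small validator. validate is a hypothetical helper that checks only required names and undeclared keys on a flat object; it does not check types or formats, and a real deployment would rely on a full JSON Schema validator instead.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating open vs. closed content models.
// Not a JSON Schema validator: types and formats are not checked.
final class ContentModelSketch {

    static boolean validate(Map<String, Object> record, Set<String> declared,
                            Set<String> required, boolean additionalProperties) {
        if (!record.keySet().containsAll(required)) return false; // missing required field
        if (!additionalProperties) {
            for (String key : record.keySet()) {
                if (!declared.contains(key)) return false;       // undeclared field rejected
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of("id", "age", "email");
        Set<String> required = Set.of("id");
        Map<String, Object> typo = Map.of("id", "u-1", "emial", "a@example.com");
        // Open (the JSON Schema default): the misspelled "emial" slips through.
        System.out.println(validate(typo, declared, required, true));  // true
        // Closed: the unexpected field is rejected.
        System.out.println(validate(typo, declared, required, false)); // false
    }
}
```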
Setting the compatibility mode (Schema Registry REST API)
```bash
curl -s -X PUT \
  -H 'Content-Type: application/json' \
  --data '{"compatibility": "BACKWARD_TRANSITIVE"}' \
  http://localhost:8081/config/my-topic-value
# Read back the configured level (response: {"compatibilityLevel": "BACKWARD_TRANSITIVE"})
curl -s http://localhost:8081/config/my-topic-value | jq
```

Subject names are the unit of compatibility checking and of independent rollout. For one event type per topic, choose TopicNameStrategy (the default); for multiple event types in one topic, use RecordNameStrategy or TopicRecordNameStrategy. The same choices apply to JSON Schema as to Avro.
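The three strategies differ only in how they derive the subject name. The sketch below mirrors that logic with a hypothetical subjectName helper; the real implementations are the strategy classes in io.confluent.kafka.serializers.subject. For JSON Schema we assume the record name is taken from the schema's title (UserCreated in the producer example), so verify that against your Registry version's documentation.

```java
// Hypothetical sketch of subject-name derivation; the real logic lives in
// io.confluent.kafka.serializers.subject.*Strategy classes.
final class SubjectNameSketch {

    enum Strategy { TOPIC_NAME, RECORD_NAME, TOPIC_RECORD_NAME }

    static String subjectName(Strategy strategy, String topic, boolean isKey, String recordName) {
        return switch (strategy) {
            case TOPIC_NAME -> topic + (isKey ? "-key" : "-value"); // one schema lineage per topic
            case RECORD_NAME -> recordName;                           // shared across topics
            case TOPIC_RECORD_NAME -> topic + "-" + recordName;       // per topic and event type
        };
    }

    public static void main(String[] args) {
        System.out.println(subjectName(Strategy.TOPIC_NAME, "user-events", false, "UserCreated"));        // user-events-value
        System.out.println(subjectName(Strategy.RECORD_NAME, "user-events", false, "UserCreated"));       // UserCreated
        System.out.println(subjectName(Strategy.TOPIC_RECORD_NAME, "user-events", false, "UserCreated")); // user-events-UserCreated
    }
}
```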
The versioning principle is to preserve backward compatibility; when a breaking change is unavoidable, branch off into a new subject or new topic. JSON Schema's handling of additionalProperties and required is a frequent source of inter-version diffs, so codify your team's conventions in writing to reduce incidents.
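The "stay compatible with every prior version, otherwise branch" rule can be sketched as a small planning function. targetSubject and the -v2 suffix are hypothetical conventions, not Registry features. The compatibility predicate is supplied by the caller, and the example reduces each schema to its required set so that a newly required field counts as breaking.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical rollout helper; the "-v2" suffix is an assumed team convention.
final class RolloutSketch {

    // Keep the existing subject only if the candidate can read data written with
    // every prior version (a transitive check); otherwise branch to a new subject.
    static <S> String targetSubject(String subject, List<S> history, S candidate,
                                    BiPredicate<S, S> canRead) {
        boolean compatibleWithAll = history.stream().allMatch(old -> canRead.test(candidate, old));
        return compatibleWithAll ? subject : subject + "-v2";
    }

    public static void main(String[] args) {
        // Schemas reduced to their "required" sets: a reader breaks if it requires
        // a field that some older writer did not require.
        BiPredicate<Set<String>, Set<String>> canRead = (reader, writer) -> writer.containsAll(reader);
        List<Set<String>> history = List.of(Set.of("id"), Set.of("id"));
        System.out.println(targetSubject("user-events-value", history, Set.of("id"), canRead));            // user-events-value
        System.out.println(targetSubject("user-events-value", history, Set.of("id", "address"), canRead)); // user-events-value-v2
    }
}
```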
Java Producer (using JSON Schema)
```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;
import io.confluent.kafka.schemaregistry.json.JsonSchema;
import io.confluent.kafka.schemaregistry.json.JsonSchemaUtils;
import io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;
import java.util.UUID;

public class JsonSchemaProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put("schema.registry.url", "http://localhost:8081");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaJsonSchemaSerializer.class.getName());
        // Subject naming strategy (TopicNameStrategy is also the default)
        props.put("value.subject.name.strategy", "io.confluent.kafka.serializers.subject.TopicNameStrategy");

        String schemaStr = "{\n" +
            "  \"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n" +
            "  \"title\": \"UserCreated\",\n" +
            "  \"type\": \"object\",\n" +
            "  \"additionalProperties\": false,\n" +
            "  \"properties\": {\n" +
            "    \"id\": {\"type\": \"string\"},\n" +
            "    \"age\": {\"type\": \"integer\"},\n" +
            "    \"email\": {\"type\": \"string\", \"format\": \"email\"}\n" +
            "  },\n" +
            "  \"required\": [\"id\"]\n" +
            "}";
        JsonSchema jsonSchema = new JsonSchema(schemaStr);

        // Build the payload as a Jackson tree and wrap it in an envelope so the
        // serializer uses jsonSchema instead of deriving a schema from the Java type.
        ObjectMapper mapper = new ObjectMapper();
        ObjectNode value = mapper.createObjectNode();
        value.put("id", UUID.randomUUID().toString());
        value.put("age", 34);
        value.put("email", "user@example.com"); // example address
        ObjectNode envelope = JsonSchemaUtils.envelope(jsonSchema, value);

        // The serializer auto-registers the schema under "user-events-value" on the
        // first send (auto.register.schemas=true by default)
        try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, Object> record = new ProducerRecord<>("user-events", envelope);
            producer.send(record).get();
        }
    }
}
```

Because JSON is text and repeats every field name in every record, messages are larger than the equivalent Avro binary encoding. Measure the impact on network and storage costs, broker throughput, and client GC pressure up front. Compression (lz4, zstd) offsets this to some extent.
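A quick way to gauge the size overhead, and how much compression recovers, is to compress a batch of similar JSON records. The sketch uses gzip from the JDK as a stand-in because lz4 and zstd need third-party libraries. Kafka producers select the codec with compression.type (gzip, snappy, lz4, or zstd) and compress whole batches, so field names repeated across records compress well. Absolute ratios will differ by codec and data; jsonBatch and gzip are hypothetical helpers.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// gzip is a JDK stand-in for Kafka's lz4/zstd codecs; ratios will differ.
final class CompressionSketch {

    // Builds a batch of similar JSON records, roughly what a producer batch holds.
    static byte[] jsonBatch(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("{\"id\":\"user-").append(i)
              .append("\",\"age\":").append(20 + i % 50)
              .append(",\"email\":\"user").append(i).append("@example.com\"}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = jsonBatch(500);
        byte[] compressed = gzip(raw);
        System.out.printf("raw=%d bytes, gzip=%d bytes (%.1f%%)%n",
                raw.length, compressed.length, 100.0 * compressed.length / raw.length);
    }
}
```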
On the upside, troubleshooting is easier. You can observe messages directly in logs or via CLI and quickly look up the schema by its Schema Registry id. Validation cost depends on whether you validate on the producer, the consumer, or both. Decide where strictness lives: reject hard on the producer, or accept leniently on the consumer.
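Reading a message directly has one caveat: with Confluent's serializers, the record value starts with a 5-byte header (a zero magic byte and a 4-byte big-endian schema id) followed by the JSON body. The hypothetical decode helper below assumes exactly that layout and splits a raw value into the schema id and the readable JSON.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper assuming the Confluent wire format:
// [magic byte 0x0][4-byte big-endian schema id][UTF-8 JSON payload]
final class WireFormatSketch {

    record Decoded(int schemaId, String json) {}

    static Decoded decode(byte[] message) {
        ByteBuffer buf = ByteBuffer.wrap(message); // ByteBuffer is big-endian by default
        if (message.length < 5 || buf.get() != 0) {
            throw new IllegalArgumentException("not in Schema Registry wire format");
        }
        int schemaId = buf.getInt();
        String json = new String(message, 5, message.length - 5, StandardCharsets.UTF_8);
        return new Decoded(schemaId, json);
    }

    public static void main(String[] args) {
        byte[] payload = "{\"id\":\"u-1\"}".getBytes(StandardCharsets.UTF_8);
        byte[] message = ByteBuffer.allocate(5 + payload.length)
                .put((byte) 0).putInt(42).put(payload).array();
        Decoded d = decode(message);
        System.out.println(d.schemaId() + " " + d.json()); // 42 {"id":"u-1"}
    }
}
```

The extracted id can then be resolved with GET /schemas/ids/{id} on the Registry.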
Both share Schema Registry and the compatibility-mode concepts, but they differ in type expression, encoding, and validation characteristics. Prefer JSON Schema for readability and polyglot integration; prefer Avro for strict types and efficiency. You can mix them, but operations are easier when standardized per topic.
Avro is more mature for logical types (decimal, timestamp, etc.) and default-value handling, and for long-term data-platform maintenance Avro often wins. Conversely, JSON Schema is great for API edges, PoCs, and early-stage event collection where schemas are still being explored.
| Aspect | JSON Schema | Avro |
|---|---|---|
| Encoding | Text JSON (readable) | Compact binary |
| Type strictness | Loose (format may not affect compatibility checks) | Strict (rich logical types) |
| Size / performance | Relatively large (mitigated by compression) | Tends to be small and fast |
| Compatibility modes | Use the same modes in the Registry as Avro | First-class support in the Registry |
| Tooling / visibility | Easy to read directly from CLI / logs | Hard to read without decoding |
| Schema references | Supported via the Registry's reference feature (implementation-dependent) | Supported (highly mature) |
On CCDAK, Schema Registry compatibility modes, naming strategies, and whether a given schema evolution is allowed come up frequently. Being able to articulate how JSON Schema's additionalProperties and required semantics differ from Avro's gives you an edge.
Common real-world pitfalls are underestimating compatibility modes and letting subjects proliferate. Casually adding a new event type to an existing subject with an incorrect required list can make registration fail with unexpected compatibility violations. Build schema validation into CI to catch these failures early.
CCDAK
Question 1
You want to add a new optional field 'address' to a Kafka topic that uses JSON Schema. Which combination evolves the schema safely without breaking existing consumers?
Correct answer: A
BACKWARD (TRANSITIVE recommended) is appropriate for protecting existing consumers. Keep new fields optional, leave additionalProperties as false, and enumerate them explicitly in properties. Putting them in required risks breaking backward compatibility, and the presence or absence of format may not affect compatibility checks.
Is Schema Registry required to use JSON Schema?
Schema Registry is a prerequisite for schema management, compatibility checks, and schema-id-based deserialization on Kafka. Embedding schemas yourself is technically possible but not recommended for operations, evolution, or interoperability.
Can I use schema references ($ref) with JSON Schema?
Schema Registry supports schema references for Avro, JSON Schema, and Protobuf. JSON Schema $ref is supported, but the scope of compatibility checks and resolution behavior depend on the implementation, so check the official docs for constraints.
Do default values and format affect compatibility checks?
Compatibility checks are primarily based on structural changes (required, type, enum, etc.). format and some annotations may not influence compatibility. Default-value handling also differs from Avro, so validate changes against the Registry beforehand (/compatibility/subjects/... API).
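Schema Registry's compatibility endpoint, POST /compatibility/subjects/{subject}/versions/latest, checks a candidate schema against the latest registered version before you register it, which also makes it a natural CI gate. The sketch builds that request with the JDK's java.net.http client but does not send it. compatibilityRequest, requestBody, and the minimal jsonString escaper are hypothetical helpers, and the subject is assumed not to need URL encoding.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds (but does not send) a Schema Registry compatibility test request.
final class CompatibilityRequestSketch {

    // Minimal JSON string escaping for embedding a schema document as a string value.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> sb.append("\\\"");
                case '\\' -> sb.append("\\\\");
                default -> {
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c)); // control chars
                    else sb.append(c);
                }
            }
        }
        return sb.append('"').toString();
    }

    static String requestBody(String schema) {
        return "{\"schemaType\":\"JSON\",\"schema\":" + jsonString(schema) + "}";
    }

    static HttpRequest compatibilityRequest(String registryUrl, String subject, String schema) {
        return HttpRequest.newBuilder()
                .uri(URI.create(registryUrl + "/compatibility/subjects/" + subject + "/versions/latest"))
                .header("Content-Type", "application/vnd.schemaregistry.v1+json")
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(schema)))
                .build();
    }

    public static void main(String[] args) {
        String v2 = "{\"type\":\"object\",\"additionalProperties\":false,"
                + "\"properties\":{\"id\":{\"type\":\"string\"},\"address\":{\"type\":\"string\"}},"
                + "\"required\":[\"id\"]}";
        HttpRequest req = compatibilityRequest("http://localhost:8081", "my-topic-value", v2);
        System.out.println(req.method() + " " + req.uri());
        // POST http://localhost:8081/compatibility/subjects/my-topic-value/versions/latest
    }
}
```

Sending it with java.net.http.HttpClient returns a body such as {"is_compatible": true}, which a CI job can turn into a pass or fail.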
NicheeLab Editorial Team
NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.