
Operating JSON Schema on Kafka: Benefits and Practical Comparison with Avro

2026-04-19
NicheeLab Editorial Team

Schema management on Kafka is a core capability for balancing data quality and schema evolution. Confluent Schema Registry supports Avro, JSON Schema, and Protobuf. JSON Schema shines for readability and polyglot interop, but can lag Avro on type strictness and payload size.

This article covers when to adopt JSON Schema, how compatibility modes, naming strategies, and evolution rules work, and what to watch for in observability and performance, alongside a comparison with Avro. We also flag the points CCDAK (Confluent Certified Developer for Apache Kafka) candidates should master.

Why and When to Use JSON Schema on Kafka

JSON Schema lets you constrain data while keeping the human-readable JSON form. Observability (eyeballing payloads), affinity with HTTP services, and ease of use from lightweight clients are real wins. The downside is no Avro-style binary optimization, so payload size and serialization CPU cost tend to be higher.

On Kafka, JSON Schema is typically used with Confluent Schema Registry and Confluent's JSON Schema Serializer/Deserializer (KafkaJsonSchemaSerializer / KafkaJsonSchemaDeserializer). Compatibility (BACKWARD/FORWARD/FULL and their transitive forms) and subject naming strategies (TopicNameStrategy, etc.) are operated the same way as for Avro.

  • Pros: human-readable, easy to validate without special tooling, plays well with diverse clients including web and mobile
  • Cons: larger size, higher CPU load, and fewer strict types (decimal, fixed, etc.) and logical types than Avro
  • Prerequisite: Schema Registry is required, and compatibility is decided by the Confluent implementation's checker (see docs.confluent.io)

Kafka x JSON Schema data flow (conceptual)

[Figure: Producer → JSON Schema Serializer → Kafka (messages prefixed with magic byte + schema id) → JSON Schema Deserializer → Consumer; the serializer and deserializer consult Schema Registry.]

Schemas are registered in the Registry and messages carry a schema id. Compatibility mode can be set per subject.
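
To make the "magic byte + schema id" prefix concrete, here is a minimal sketch of the framing Confluent serializers place in front of the JSON payload: one magic byte (0), then the schema id as a 4-byte big-endian integer. The class and method names (WireFormatSketch, frame, schemaIdOf, payloadOf) are hypothetical helpers for illustration only; in practice KafkaJsonSchemaSerializer and KafkaJsonSchemaDeserializer handle this for you.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative framing of a serialized record value (hypothetical helper, not a library class). */
final class WireFormatSketch {
  static final byte MAGIC_BYTE = 0x0;

  /** Prepends the magic byte and a 4-byte big-endian schema id to the JSON payload. */
  static byte[] frame(int schemaId, byte[] jsonPayload) {
    return ByteBuffer.allocate(1 + 4 + jsonPayload.length)
        .put(MAGIC_BYTE)
        .putInt(schemaId) // ByteBuffer writes big-endian by default
        .put(jsonPayload)
        .array();
  }

  /** Reads the schema id back, rejecting frames that do not start with the magic byte. */
  static int schemaIdOf(byte[] framed) {
    ByteBuffer buf = ByteBuffer.wrap(framed);
    if (buf.get() != MAGIC_BYTE) {
      throw new IllegalArgumentException("Unknown magic byte");
    }
    return buf.getInt();
  }

  /** Returns the JSON text that follows the 5-byte header. */
  static String payloadOf(byte[] framed) {
    return new String(framed, 5, framed.length - 5, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] framed = frame(42, "{\"id\":\"u-1\"}".getBytes(StandardCharsets.UTF_8));
    System.out.println(schemaIdOf(framed) + " " + payloadOf(framed));
  }
}
```

Because the payload after the header is plain JSON text, skipping the first five bytes is usually enough to read a message by eye when debugging.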

Compatibility Modes and Schema Evolution: Safe Changes for JSON Schema

Schema Registry also provides BACKWARD / FORWARD / FULL and their transitive compatibility modes for JSON Schema. The implementation relies on Confluent's compatibility checker, evaluating safety mainly around required/optional, enum, and type changes. Note that format and some extension keywords may not affect compatibility checks.

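In production, defaulting to BACKWARD_TRANSITIVE to keep existing consumers safe is a sound starting point. Because JSON Schema's additionalProperties defaults to true (an open content model), set it explicitly to false to reject unexpected fields, and declare every allowed field in properties. A closed model also helps evolution: under an open model the checker may treat a newly declared property as backward-incompatible, since older messages could already carry a field of that name with a different type.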

  • Safe addition: an optional new field (do not put it in required, or provide a default)
  • Risky change: narrowing types (e.g. number → integer), making a field required, removing enum values
  • Compatibility is set per subject. The transitive variants guarantee compatibility against all previous versions.

Setting the compatibility mode (Schema Registry REST API)

curl -s -X PUT \
  -H 'Content-Type: application/json' \
  --data '{"compatibility": "BACKWARD_TRANSITIVE"}' \
  http://localhost:8081/config/my-topic-value

# Read back the configured compatibility level (this does not test a schema)
curl -s http://localhost:8081/config/my-topic-value | jq

Naming Strategies, Subject Design, and Versioning

Subject names are the unit of compatibility and independent rollout. For one event type per topic, choose TopicNameStrategy; for multiple event types packed into one topic, use RecordNameStrategy or TopicRecordNameStrategy. The same choices apply to JSON Schema just as for Avro.

The versioning principle is to preserve backward compatibility; when a breaking change is unavoidable, branch off into a new subject or new topic. JSON Schema's handling of additionalProperties and required is a frequent source of inter-version diffs, so codify your team's conventions in writing to reduce incidents.

  • TopicNameStrategy: consolidate under <topic>-value / <topic>-key. Best for a single event type.
  • RecordNameStrategy: separate by <fullRecordName>. Strong for schema reuse.
  • TopicRecordNameStrategy: <topic>-<fullRecordName>. Avoids collisions on mixed-content topics.
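
The three naming patterns above can be sketched as plain string rules. This is only an illustration of the resulting subject names, not Confluent's implementation; SubjectNameSketch and its methods are hypothetical. For JSON Schema the record name is commonly derived from the schema's title, which is an assumption worth confirming against your serializer version.

```java
/** Illustrative subject-name derivation; the real logic lives in Confluent's strategy classes. */
final class SubjectNameSketch {

  /** TopicNameStrategy: <topic>-key or <topic>-value. */
  static String topicName(String topic, boolean isKey) {
    return topic + (isKey ? "-key" : "-value");
  }

  /** RecordNameStrategy: the record name alone, independent of the topic. */
  static String recordName(String fullRecordName) {
    return fullRecordName;
  }

  /** TopicRecordNameStrategy: <topic>-<fullRecordName>. */
  static String topicRecordName(String topic, String fullRecordName) {
    return topic + "-" + fullRecordName;
  }

  public static void main(String[] args) {
    String topic = "user-events";   // example topic name
    String record = "UserCreated";  // assumed to come from the schema's title
    System.out.println(topicName(topic, false));
    System.out.println(recordName(record));
    System.out.println(topicRecordName(topic, record));
  }
}
```

Note the consequence of RecordNameStrategy: the same record name maps to one subject across every topic, so its compatibility history is shared wherever that event is produced.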

Java Producer (using JSON Schema)

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import io.confluent.kafka.schemaregistry.json.JsonSchema;
import io.confluent.kafka.schemaregistry.json.JsonSchemaUtils;
import io.confluent.kafka.serializers.json.KafkaJsonSchemaSerializer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;

public class JsonSchemaProducerExample {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put("schema.registry.url", "http://localhost:8081");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, KafkaJsonSchemaSerializer.class.getName());
    // Subject naming strategy (TopicNameStrategy is the default; set here for clarity)
    props.put("value.subject.name.strategy", "io.confluent.kafka.serializers.subject.TopicNameStrategy");

    String schemaStr = "{\n" +
        "  \"$schema\": \"https://json-schema.org/draft/2020-12/schema\",\n" +
        "  \"title\": \"UserCreated\",\n" +
        "  \"type\": \"object\",\n" +
        "  \"additionalProperties\": false,\n" +
        "  \"properties\": {\n" +
        "    \"id\": {\"type\": \"string\"},\n" +
        "    \"age\": {\"type\": \"integer\"},\n" +
        "    \"email\": {\"type\": \"string\", \"format\": \"email\"}\n" +
        "  },\n" +
        "  \"required\": [\"id\"]\n" +
        "}";

    JsonSchema jsonSchema = new JsonSchema(schemaStr);

    try (KafkaProducer<String, Object> producer = new KafkaProducer<>(props)) {
      Map<String, Object> value = new HashMap<>();
      value.put("id", UUID.randomUUID().toString());
      value.put("age", 34);
      value.put("email", "user@example.com"); // placeholder address

      // Wrap the payload with the explicit schema. Without the envelope the
      // serializer derives a schema from the Java type and ignores schemaStr.
      JsonNode payload = new ObjectMapper().valueToTree(value);
      Object envelope = JsonSchemaUtils.envelope(jsonSchema, payload);

      // The serializer auto-registers the schema on send (auto.register.schemas=true by default)
      ProducerRecord<String, Object> record = new ProducerRecord<>("user-events", envelope);
      producer.send(record).get();
    }
  }
}

Performance and Observability: The Real Costs of Size, CPU, and Validation

Because JSON is text, message size is larger than sending the equivalent record as Avro binary. Measure the impact on network and storage costs, broker throughput, and client GC pressure up front. Compression (lz4, zstd) offsets this to some extent.

On the upside, troubleshooting is easier. You can observe messages directly in logs or via CLI and quickly look up the schema by its Schema Registry id. Validation cost depends on whether you validate on the producer, the consumer, or both. Decide where strictness lives: reject hard on the producer, or accept leniently on the consumer.

  • Size optimization: omit optional fields when not used, do not over-abbreviate field names (balance with readability)
  • CPU optimization: consolidate validation on either producer or consumer, and enable batched sends
  • Monitoring: visualize schema-id distribution, deserialization failure rate, and record size p95/p99
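
As a starting point for the size and CPU levers above, the following sketch collects the relevant producer settings in one place. The values are illustrative assumptions, not recommendations, and ProducerTuningSketch is a hypothetical helper. compression.type, linger.ms, and batch.size are standard Kafka producer settings; json.fail.invalid.schema is the Confluent JSON Schema serializer option that turns on payload validation (off by default, to the best of our knowledge; confirm for your version).

```java
import java.util.Properties;

/** Illustrative producer settings for a JSON Schema topic; all values are assumptions to tune. */
final class ProducerTuningSketch {

  static Properties tunedProducerProps() {
    Properties props = new Properties();
    // Compress batches to offset the larger text payloads (lz4 is the other common choice).
    props.put("compression.type", "zstd");
    // Wait briefly so more records share a batch, which also improves compression.
    props.put("linger.ms", "20");
    // Upper bound in bytes for a single per-partition batch.
    props.put("batch.size", "65536");
    // Validate payloads against the schema on the producer side (costs CPU per record).
    props.put("json.fail.invalid.schema", "true");
    return props;
  }

  public static void main(String[] args) {
    tunedProducerProps().forEach((k, v) -> System.out.println(k + "=" + v));
  }
}
```

If validation is enforced on the producer like this, consumers can usually skip it, which keeps the CPU cost on one side of the pipeline.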

Comparison with Avro and When to Use Each

Both share Schema Registry and the compatibility-mode concepts, but they differ in type expression, encoding, and validation characteristics. Prefer JSON Schema for readability and polyglot integration; prefer Avro for strict types and efficiency. You can mix them, but operations are easier when standardized per topic.

Avro is more mature for logical types (decimal, timestamp, etc.) and default-value handling, and for long-term data-platform maintenance Avro often wins. Conversely, JSON Schema is great for API edges, PoCs, and early-stage event collection where schemas are still being explored.

  • High throughput with a single event type: prefer Avro
  • Diverse clients / visibility-first / close to APIs: prefer JSON Schema
  • Long-term archive / DWH integration: Avro (advantages in logical types and size)

Aspect | JSON Schema | Avro
Encoding | Text JSON (readable) | Compact binary
Type strictness | Loose (format may not affect compatibility checks) | Strict (rich logical types)
Size / performance | Relatively large (mitigated by compression) | Tends to be small and fast
Compatibility modes | Use the same modes in the Registry as Avro | First-class support in the Registry
Tooling / visibility | Easy to read directly from CLI / logs | Hard to read without decoding
Schema references | Supported via the Registry's reference feature (implementation-dependent) | Supported (highly mature)

CCDAK Exam Prep and Operational Pitfalls

On CCDAK, Schema Registry compatibility modes, naming strategies, and whether a given schema evolution is allowed come up frequently. Being able to articulate how JSON Schema's additionalProperties and required semantics differ from Avro's gives you an edge.

The common real-world pitfalls are underestimating compatibility modes and letting subjects proliferate. Casually registering a new event schema under an existing subject with a mismatched required list can make registration fail with an unexpected compatibility violation. Build schema validation into CI to catch these failures early.

  • Remember: differences between BACKWARD/FORWARD/FULL and their transitive forms, and when each of the 3 naming strategies applies
  • Watch out: JSON Schema's additionalProperties defaults to true. Setting it explicitly to false is recommended.
  • Exam angle: how to handle breaking changes, and choosing a strategy when packing multiple record types into one topic

Check Your Understanding

CCDAK

Question 1

You want to add a new optional field 'address' to a Kafka topic that uses JSON Schema. Which combination evolves the schema safely without breaking existing consumers?

  1. Compatibility: BACKWARD_TRANSITIVE. Do not include address in required, keep additionalProperties as false, and define it in properties.
  2. Compatibility: FORWARD. Add address to required with no default.
  3. Compatibility: NONE. Add address to properties and change additionalProperties to true.
  4. Compatibility: FULL_TRANSITIVE. Add address to required and set format to postal-address.

Correct answer: 1

BACKWARD (TRANSITIVE recommended) is appropriate for protecting existing consumers. Keep new fields optional, leave additionalProperties as false, and enumerate them explicitly in properties. Putting them in required risks breaking backward compatibility, and the presence or absence of format may not affect compatibility checks.

Frequently Asked Questions

Is Schema Registry required to use JSON Schema?

Schema Registry is a prerequisite for schema management, compatibility checks, and schema-id-based deserialization on Kafka. Embedding schemas yourself is technically possible but not recommended for operations, evolution, or interoperability.

Can I use schema references ($ref) with JSON Schema?

Schema Registry supports schema references for Avro, JSON Schema, and Protobuf. JSON Schema $ref is supported, but the scope of compatibility checks and resolution behavior depend on the implementation, so check the official docs for constraints.

Do default values and format affect compatibility checks?

Compatibility checks are primarily based on structural changes (required, type, enum, etc.). format and some annotations may not influence compatibility. Default-value handling also differs from Avro, so validate changes against the Registry beforehand (/compatibility/subjects/... API).
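
A pre-registration check against the Registry could look like the sketch below, which posts a candidate schema to the compatibility endpoint for the latest registered version. Everything named here is an assumption for illustration: the CompatibilityCheckSketch class, the Registry URL http://localhost:8081, the subject user-events-value, and the candidate schema. The hand-rolled jsonEscape helper stands in for a JSON library to keep the example dependency-free, and the subject is assumed to need no URL encoding.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative compatibility check against Schema Registry (hypothetical helper). */
final class CompatibilityCheckSketch {

  /** Escapes a string so it can be embedded inside a JSON string literal. */
  static String jsonEscape(String s) {
    StringBuilder sb = new StringBuilder();
    for (char c : s.toCharArray()) {
      switch (c) {
        case '"': sb.append("\\\""); break;
        case '\\': sb.append("\\\\"); break;
        case '\n': sb.append("\\n"); break;
        case '\r': sb.append("\\r"); break;
        case '\t': sb.append("\\t"); break;
        default:
          if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
          else sb.append(c);
      }
    }
    return sb.toString();
  }

  /** Request body: the schema travels as an escaped string, tagged with schemaType JSON. */
  static String requestBody(String schemaJson) {
    return "{\"schemaType\":\"JSON\",\"schema\":\"" + jsonEscape(schemaJson) + "\"}";
  }

  /** Builds the POST request that tests the candidate against the subject's latest version. */
  static HttpRequest buildRequest(String registryUrl, String subject, String schemaJson) {
    return HttpRequest.newBuilder()
        .uri(URI.create(registryUrl + "/compatibility/subjects/" + subject + "/versions/latest"))
        .header("Content-Type", "application/vnd.schemaregistry.v1+json")
        .POST(HttpRequest.BodyPublishers.ofString(requestBody(schemaJson)))
        .build();
  }

  public static void main(String[] args) throws Exception {
    // Candidate schema adding an optional "address" field (illustrative).
    String candidate = "{\"type\":\"object\",\"additionalProperties\":false,"
        + "\"properties\":{\"id\":{\"type\":\"string\"},\"address\":{\"type\":\"string\"}},"
        + "\"required\":[\"id\"]}";
    HttpRequest request = buildRequest("http://localhost:8081", "user-events-value", candidate);
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    // The response body is expected to contain an is_compatible flag.
    System.out.println(response.statusCode() + " " + response.body());
  }
}
```

Running main requires a reachable Registry with the subject already registered. Wiring the same request into CI is one way to catch incompatible changes before they reach a producer.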

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.

