Avro Schema Design: Field Additions, Defaults, Compatibility, and CCDAK Prep

2026-04-19
NicheeLab Editorial Team

If you use Avro with Kafka, getting compatibility modes and default values wrong will bite you. This article maps field additions, removals, type changes, and aliases (renames) onto Schema Registry compatibility rules.

CCDAK frequently tests the difference between Backward/Forward/Full and their Transitive variants, when defaults are required, and the ordering of unions that include null. This article covers the design rules you need on the job together with the exam takeaways.

Compatibility Modes: What They Mean and How to Pick One

Avro compatibility is defined by the resolution rules used when a reader schema interprets data written with a writer schema. Whenever you register a new schema, Confluent Schema Registry validates it against existing versions using the selected compatibility mode.

In practice, Backward compatibility is the most common choice, and it is also Schema Registry's default: it matches the workflow of upgrading consumers before producers. Pick Forward when producers need to roll out first, or Full when you need bidirectional safety. Transitive variants tighten validation so that every past version, not just the latest one, is checked.

  • Backward: guarantees the new reader schema can read data written by old writers
  • Forward: guarantees old reader schemas can read data written by the new writer
  • Full: satisfies both Backward and Forward at the same time
  • Transitive variants: validate against every past version, not just the previous one
  • Compatibility is configured per subject (for example, <topic>-value)

Compatibility mode | Impact on existing consumers | Impact on existing producers | Typical allowed changes
BACKWARD | Consumers using the new schema can still read old data | Existing producers keep working unchanged | Field additions (default required), type promotions (e.g. int→long), and renames via aliases
FORWARD | Old consumers can read data written with the new schema | Tighter constraints on the new producer | Field removal (provided the old reader has a default) and a limited set of forward-safe edits
FULL | Readable in both directions, old ⇔ new | Strict constraints on both old and new sides | Additions need defaults, type changes must be promotions only, and renames require aliases
NONE | No validation; breaking changes pass through | | Not recommended outside of short experiments
TRANSITIVE variants | Guarantee compatibility against every past version | Ensures the release chain never breaks at any point | Available as BACKWARD_TRANSITIVE / FORWARD_TRANSITIVE / FULL_TRANSITIVE
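
The difference between the modes comes down to which (reader, writer) pairs must resolve when a new version is registered. The sketch below only enumerates those pairs; it does not perform the Avro resolution itself. The function name `checks_for` and the version labels are hypothetical and used purely for illustration.

```python
def checks_for(mode, registered, candidate):
    """Return the (reader, writer) pairs that must resolve for `candidate`.

    `registered` lists the existing versions of the subject, oldest first.
    """
    if mode == "NONE" or not registered:
        return []
    base = mode.replace("_TRANSITIVE", "")
    # Non-transitive modes compare only against the latest registered version.
    targets = registered if mode.endswith("_TRANSITIVE") else registered[-1:]
    pairs = []
    for old in targets:
        if base in ("BACKWARD", "FULL"):
            pairs.append((candidate, old))  # new reader, old writer
        if base in ("FORWARD", "FULL"):
            pairs.append((old, candidate))  # old reader, new writer
    return pairs


print(checks_for("BACKWARD", ["v1", "v2"], "v3"))
print(checks_for("FULL_TRANSITIVE", ["v1", "v2"], "v3"))
```

With two registered versions, BACKWARD yields a single pair (v3 reading v2), while FULL_TRANSITIVE yields four pairs covering both directions against v1 and v2.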

Reading and writing Avro through Schema Registry

[Figure: the Producer (writer) registers its schema with Schema Registry (IDs, compatibility) and gets an ID; the record plus schema ID passes through the Kafka broker to the Consumer (reader), which fetches the writer schema by ID and performs reader schema resolution.]

The writer schema ID is embedded in each message. The consumer fetches the writer schema by ID and resolves it against its own reader schema.
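
To make "embedded in each message" concrete, here is a minimal sketch of the framing, assuming the Confluent wire format: one magic byte with value 0, then the schema ID as a 4-byte big-endian integer, then the Avro-encoded payload. The helper names `frame` and `unframe` are hypothetical; in a real client the serializer and deserializer handle this for you.

```python
import struct

MAGIC_BYTE = 0  # assumed Confluent wire-format marker


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix an Avro-encoded payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes):
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


framed = frame(42, b"\x02a")
print(unframe(framed))
```

The consumer side uses the extracted ID to look up the writer schema, then decodes the remaining bytes against it.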

Adding Fields: Default Value Design

The most common change is adding a field. To preserve Backward compatibility, every added field must declare a default value. Since old data has no value for the new field, the reader falls back to that default.

To express "optional," include null in the union, put null first in the union array, and set the default to null. Per the Avro spec, the default of a union must match the first type listed in the array.

Defaults for complex types are heavily constrained: a record default must be a JSON object that supplies a value for every field lacking its own default, and an array or map default must be a valid JSON array or object (often just an empty one). When you can, make the field nullable and default it to null, which is much easier to maintain.

  • Always attach a default to added fields (Backward compatibility)
  • Optional fields use ["null", "type"] order with "default": null
  • Do not default numeric fields to zero out of convenience — missing and zero usually mean different things in the business
  • For larger structures, start nullable with a null default and consider making them required later
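
The rules above can be checked mechanically before a schema ever reaches the Registry. The following is a simplified sketch, not a full Avro validator: it only looks at whether an added field has a default and whether that default fits the first branch of its type. The function name `default_problem` and the check table are hypothetical.

```python
def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "int": _is_int,
    "long": _is_int,
    "float": _is_number,
    "double": _is_number,
    "array": lambda v: isinstance(v, list),
    "map": lambda v: isinstance(v, dict),
    "record": lambda v: isinstance(v, dict),
}


def default_problem(field):
    """Return a short description of the problem, or None if the field looks fine."""
    if "default" not in field:
        return "missing default"
    ftype = field["type"]
    first = ftype[0] if isinstance(ftype, list) else ftype
    name = first["type"] if isinstance(first, dict) else first
    check = _CHECKS.get(name)  # unknown or named types are skipped
    if check is not None and not check(field["default"]):
        return f"default does not match first branch {name!r}"
    return None


print(default_problem({"name": "note", "type": "string"}))
print(default_problem({"name": "note", "type": ["null", "string"], "default": None}))
print(default_problem({"name": "note", "type": ["null", "string"], "default": ""}))
```

Note that `["string", "null"]` with default `""` passes this check as well: it is structurally valid, but it encodes "empty string" rather than "absent", which is the business-meaning problem the bullets warn about.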

Pitfalls of Removing or Tightening Fields

Removing a field is dangerous from a Forward-compatibility standpoint. Once the new writer stops emitting the field, old readers fail to resolve unless their reader schema already had a default. In practice you cannot patch the old reader schema after the fact, so avoid straight deletions.

The safe play is gradual deprecation: first switch the field to nullable with default null, upgrade every consumer, and only then consider removing it in a future major version. The same applies to tightening — any design that cannot fall back to nullable is a breaking change.

  • Before deleting: switch to nullable + default null → update every consumer → monitor → delete later
  • Forward compatibility breaks the moment an old consumer expects a field that has no default
  • Making a field required is a last resort — consider whether input validation can do the job instead
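
A quick way to see why a removal breaks old consumers is to list the fields an old reader expects that the new writer no longer emits and that have no default to fall back on. This is a simplified sketch that compares field names only (aliases and nested types are ignored); `unresolvable_fields` and the sample field lists are hypothetical.

```python
def unresolvable_fields(reader_fields, writer_fields):
    """Names of reader fields the writer does not emit and that lack a default."""
    written = {f["name"] for f in writer_fields}
    return [
        f["name"]
        for f in reader_fields
        if f["name"] not in written and "default" not in f
    ]


old_reader = [
    {"name": "id", "type": "string"},
    {"name": "coupon", "type": "string"},                      # no default
    {"name": "note", "type": ["null", "string"], "default": None},
]
new_writer = [{"name": "id", "type": "string"}]                 # coupon and note removed

print(unresolvable_fields(old_reader, new_writer))
```

Here only `coupon` is reported: `note` was removed too, but the old reader can fall back to its null default, which is exactly what the gradual-deprecation sequence sets up.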

Type Changes, Renames, and Aliases

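Only type promotion is safe. The classic numeric chain is int→long→float→double, and the Avro spec additionally allows string and bytes to be read as each other. Anything outside that list (boolean→int, for example) does not resolve; if you need to evolve in such a direction, use a union that holds both types.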

For renames, use aliases. When you change a field or record name, listing the old name in aliases lets the resolver map the old field onto the new one — a Backward-compatible rename.

Adding a new enum symbol is Backward compatible (new readers can read old data) but unsafe for Forward compatibility (old readers cannot interpret the unknown symbol).

  • Safe type promotions: int→long, long→double, float→double
  • Dangerous: boolean→int, narrowing such as long→int, or reducing decimal precision
  • For renames, add aliases: ["old_name"] on the new schema
  • Enum additions are Backward compatible but not Forward compatible; removals flip the direction and are equally risky
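
The promotion and enum rules are directional, which is easy to lose track of. The sketch below encodes only the numeric promotions and a plain symbol-subset check for enums (it assumes the reader's enum declares no default symbol). `NUMERIC_PROMOTIONS`, `reader_can_read`, and `enum_readable` are hypothetical names for illustration.

```python
# writer type -> reader types that may read it (numeric promotions only)
NUMERIC_PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
}


def reader_can_read(writer_type, reader_type):
    """True if data written as `writer_type` resolves to `reader_type`."""
    return (
        writer_type == reader_type
        or reader_type in NUMERIC_PROMOTIONS.get(writer_type, set())
    )


def enum_readable(writer_symbols, reader_symbols):
    """True if every symbol the writer can emit is known to the reader."""
    return set(writer_symbols) <= set(reader_symbols)


print(reader_can_read("int", "long"))    # widening
print(reader_can_read("long", "int"))    # narrowing
print(enum_readable(["NEW", "PAID"], ["NEW", "PAID", "CANCELLED"]))  # new reader, old data
print(enum_readable(["NEW", "PAID", "CANCELLED"], ["NEW", "PAID"]))  # old reader, new data
```

Swapping the arguments is what turns a Backward check into a Forward check, which is why the same enum addition passes one and fails the other.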

Record Evolution by Example

A typical evolution combines an added field (with a default), a type promotion, and a rename via aliases. Set compatibility to Backward and roll out consumers before producers.

Below is a v1→v2 evolution example. v2 preserves Backward compatibility by making the added field nullable with default null, promoting the numeric type, and using aliases for the rename.

  • The Schema Registry subject is typically <topic>-value
  • Configure compatibility per subject and treat it as immutable once you go live
  • Declare aliases on the new schema to absorb the old name

Avro schema evolution and compatibility settings

# v1: initial schema
{
  "type": "record",
  "name": "Order",
  "namespace": "nl.shop",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "int"},
    {"name": "status", "type": {"type": "enum", "name": "OrderStatus", "symbols": ["NEW", "PAID"]}}
  ]
}

# v2: addition, promotion, and rename (Backward compatible)
{
  "type": "record",
  "name": "Order",
  "namespace": "nl.shop",
  "fields": [
    {"name": "order_id", "type": "string", "aliases": ["id"]},
    {"name": "amount", "type": "long"},
    {"name": "status", "type": {"type": "enum", "name": "OrderStatus", "symbols": ["NEW", "PAID", "CANCELLED"]}},
    {"name": "note", "type": ["null", "string"], "default": null}
  ]
}

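# Key points
# - amount: int->long is a type promotion
# - order_id: the field is renamed, but aliases absorbs the old name id
# - note: union ordered ["null", "string"] with default null
# - Adding an enum symbol (CANCELLED): fine for Backward. For Forward, note that old readers cannot interpret the new symbol

# Schema Registry: set the subject's compatibility to Backward (e.g. <topic>-value)
# Adjust the API path to your environment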
curl -X PUT \
  -H "Content-Type: application/json" \
  --data '{"compatibility": "BACKWARD"}' \
  http://schema-registry:8081/config/my-orders-value
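
To see what a v2 consumer ends up with when it reads a v1 message, the following sketch projects an already-decoded v1 record onto the v2 field list using two of the resolution rules: match by name or alias, otherwise fall back to the default. It is a simplification that works on Python dicts rather than Avro binary, so the int→long promotion is a no-op here. The `resolve` function and the shortened field lists are hypothetical.

```python
V1_FIELDS = [
    {"name": "id", "type": "string"},
    {"name": "amount", "type": "int"},
    {"name": "status", "type": "OrderStatus"},
]
V2_FIELDS = [
    {"name": "order_id", "type": "string", "aliases": ["id"]},
    {"name": "amount", "type": "long"},
    {"name": "status", "type": "OrderStatus"},
    {"name": "note", "type": ["null", "string"], "default": None},
]


def resolve(datum, writer_fields, reader_fields):
    """Project a decoded writer record onto the reader's fields (simplified)."""
    written = {f["name"] for f in writer_fields}
    out = {}
    for field in reader_fields:
        # The reader's own name is tried first, then each of its aliases.
        candidates = [field["name"], *field.get("aliases", [])]
        source = next((name for name in candidates if name in written), None)
        if source is not None:
            out[field["name"]] = datum[source]
        elif "default" in field:
            out[field["name"]] = field["default"]
        else:
            raise ValueError(f"no value and no default for {field['name']!r}")
    return out


old_record = {"id": "o-1", "amount": 1200, "status": "PAID"}
print(resolve(old_record, V1_FIELDS, V2_FIELDS))
```

The old `id` value surfaces under `order_id`, and `note` is filled with its null default. Remove the `aliases` entry or the `default` from the v2 field list and the same call raises, which mirrors the resolution failure a consumer would hit.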

Serializer and Registry Settings in Production

Confluent's Avro serializer/deserializer embeds a schema ID in each message and works with the Registry to fetch the writer schema. With SpecificRecord, the reader schema comes from the generated class; with GenericRecord, the deserializer interprets data using the writer schema as-is.

In production it is safer to turn off auto.register.schemas and handle registration and review through a separate pipeline. TopicNameStrategy is the typical subject naming strategy, but consider RecordNameStrategy when records are heavily reused across topics.
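
The choice of naming strategy determines which subject a schema is registered and checked under. As a rough sketch, the three strategies derive the subject as follows; `subject_name` is a hypothetical helper, and the topic and record names are borrowed from the example above.

```python
def subject_name(strategy, topic, record_fullname, is_key=False):
    """Derive the Schema Registry subject for a message key or value."""
    suffix = "key" if is_key else "value"
    if strategy == "TopicNameStrategy":
        return f"{topic}-{suffix}"          # one schema lineage per topic
    if strategy == "RecordNameStrategy":
        return record_fullname               # shared across topics
    if strategy == "TopicRecordNameStrategy":
        return f"{topic}-{record_fullname}"  # per topic and per record type
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("TopicNameStrategy", "RecordNameStrategy", "TopicRecordNameStrategy"):
    print(strategy, "->", subject_name(strategy, "my-orders", "nl.shop.Order"))
```

Because compatibility is configured per subject, switching strategy also changes which set of versions a new schema is validated against.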

  • schema.registry.url is mandatory
  • Default to auto.register.schemas=false and pre-register with compatibility checks in CI
  • value.subject.name.strategy: TopicNameStrategy / RecordNameStrategy / TopicRecordNameStrategy
  • Set specific.avro.reader=true to use SpecificRecord as the reader schema
  • Do not casually enable use.latest.version — always run compatibility and consistency checks first
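
A CI compatibility check can be a single call to the Registry's compatibility endpoint before anything is registered. The sketch below builds that request with the standard library only; it assumes the same host and subject as the curl example (`schema-registry:8081`, `my-orders-value`), and the function names are hypothetical. `is_compatible` needs a reachable Registry, so it is defined here but not called.

```python
import json
import urllib.request


def compatibility_request(base_url, subject, schema):
    """Build a POST that tests `schema` against the subject's latest version."""
    # The Registry expects the schema itself as a JSON-encoded string.
    body = json.dumps({"schema": json.dumps(schema)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/compatibility/subjects/{subject}/versions/latest",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


def is_compatible(request):
    """Send the request and return the Registry's verdict (network call)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["is_compatible"]


request = compatibility_request(
    "http://schema-registry:8081", "my-orders-value", {"type": "string"}
)
print(request.get_method(), request.full_url)
```

A pipeline step would fail the build when `is_compatible(request)` returns false, and only then proceed to register the schema with auto-registration left off in the application itself.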

CCDAK Checklist

On the exam, scoring well comes down to memorising the definitions in matched sets. Questions typically describe a concrete schema change and ask which compatibility modes allow it and what default it requires.

  • Backward: field additions require a default; the union default must match the first type
  • Forward: field removal tends to fail unless the old reader already has a default
  • Full: must satisfy both directions at once — the strictest mode
  • Only type promotion is safe (int→long→double); changes such as boolean→int are not allowed
  • For renames, add aliases on the new schema
  • Enum additions are Backward compatible but not Forward compatible

Check Your Understanding

Question 1

The value-side subject is configured with BACKWARD compatibility. You want to add a new optional field note to the existing Order record. Which change is the safest?

  1. Add note as type string with no default
  2. Add note as type ["null", "string"] with default null
  3. Add note as type ["string", "null"] with default ""
  4. Add note as type ["null", {"type":"record",...}] with default {}

Correct answer: B (option 2)

Under Backward compatibility, any added field must have a default. To make it optional, the union must start with null and the default must be null as well — so ["null", "string"] with default=null is the correct choice.

Frequently Asked Questions

Which branch of a union should the default value match?

The Avro spec requires the union default to match the first type in the array. For optional fields, order the union as ["null", "type"] and set the default to null.

How far can you safely change a field's type?

Only type promotion is safe. Typical examples are int→long, long→double, and float→double; the Avro spec also allows string and bytes to be read as each other. Changes like boolean→int are not allowed, and you should avoid logical-type regressions such as reducing decimal precision.

How do enum symbol additions and removals affect compatibility?

Adding a symbol is Backward compatible (new readers can read old data) but breaks Forward compatibility (old readers cannot interpret the new symbol). Removal is the opposite direction and even riskier, so it should generally be avoided.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.


© 2026 NicheeLab All rights reserved.