Kafka x Protobuf Schema Design: Field Numbers and Compatibility in Practice

2026-04-19
NicheeLab Editorial Team

The biggest reasons to adopt Protobuf on Kafka are its compact wire format, strong typing, and ease of schema evolution. The prerequisite, however, is handling field numbers correctly.

This article assumes a typical implementation using Confluent Schema Registry. Following the CCDAK exam scope, it walks through how to manage field numbers and compatibility safely, with concrete examples.

Why Use Protobuf on Kafka, and Wire-Format Fundamentals

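Confluent's Protobuf serializer prepends a magic byte and a 4-byte schema ID to the message payload and integrates with Schema Registry to make schema evolution safe. For Protobuf, a list of message indexes follows the schema ID and identifies which message type in the schema file was serialized. Consumers fetch the schema by ID and use it to interpret the byte stream.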
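The framing described above can be sketched in a few lines of Python. This is a minimal illustration, not the serializer's actual code: the helper names `frame` and `split_header` are hypothetical, and everything after the 5-byte header (any message-index bytes plus the Protobuf payload) is treated as opaque.

```python
import struct

MAGIC_BYTE = 0  # first byte of a Confluent-framed record value


def frame(schema_id: int, rest: bytes) -> bytes:
    """Prepend the magic byte and a 4-byte big-endian schema ID (hypothetical helper)."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + rest


def split_header(record_value: bytes) -> tuple[int, bytes]:
    """Return (schema_id, remaining bytes); raise if the framing is missing."""
    if len(record_value) < 5:
        raise ValueError("value too short to carry the 5-byte header")
    magic, schema_id = struct.unpack(">bI", record_value[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, record_value[5:]


# Schema ID 42 is an arbitrary example value; the trailing bytes are a placeholder.
schema_id, remainder = split_header(frame(42, b"\x00\x08\x01"))
print(schema_id, remainder)
```

A consumer would use the extracted ID to look up the schema before decoding the remainder.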

On the wire, Protobuf identifies fields by number. Because the numbers carry the meaning, reusing or reassigning them breaks backward compatibility. Field names never appear in the binary encoding, so what matters is the substance (number and type), not the appearance (field name).

  • Use Schema Registry compatibility checks to catch breaking changes before they hit production
  • Unknown fields are ignored by older clients in Protobuf, so additions can be made safely with proper design
  • The wire format is compact; assigning small numbers to frequently used fields improves encoding efficiency
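To make the points above concrete, here is a minimal hand-written decoder, assuming only varint (wire type 0) and length-delimited (wire type 2) fields. It is a sketch for illustration and not the Protobuf library's parser; the function names are hypothetical. The payload carries fields 1, 2, and 20, matching the order example used later in this article.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode(buf: bytes, known: dict[int, str]) -> dict[str, object]:
    """Decode fields whose numbers appear in `known`; skip everything else."""
    out: dict[str, object] = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        if number in known:
            out[known[number]] = value
        # Unknown numbers are consumed and dropped, as an older reader would do.
    return out


# order_id (1) = 7, customer_id (2) = "c1", status (20) = 2
payload = bytes([0x08, 0x07, 0x12, 0x02]) + b"c1" + bytes([0xA0, 0x01, 0x02])

old_reader = {1: "order_id", 2: "customer_id"}
new_reader = {1: "order_id", 2: "customer_id", 20: "status"}
print(decode(payload, old_reader))
print(decode(payload, new_reader))
```

The old reader still recovers fields 1 and 2 from the newer message. Passing a map with different names for the same numbers would yield the same values, which is why the number rather than the name is the field's identity.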

Relationship between Kafka, Protobuf, and Schema Registry

[Diagram: Producer App (Protobuf Ser.) -> Kafka Topic -> Consumer App (Protobuf Deser.). Messages on the topic carry MagicByte + SchemaID + Protobuf bytes. Both apps Lookup / Register against Schema Registry (Schemas).]

Design Principles for Field Numbers

Protobuf field numbers range from 1 to 536,870,911, with 19000-19999 reserved internally and unavailable. Numbers are the core of wire compatibility, and the iron rule is: once released, never change and never reuse them.

Numbers differ in encoding cost. Field numbers 1-15 encode their tag in a single byte, while 16-2047 take two bytes, so assigning the low numbers to high-frequency or near-required fields is efficient. Reserving number ranges by logical block or team in advance keeps operations stable as the schema grows.

  • Never reuse a number once released (mark it reserved when deleting)
  • Prioritize numbers 1-15 for high-frequency fields
  • Reserve unused numbers up front for future extensions (e.g., 100-199 for extensions)
  • Do not use 19000-19999 (reserved by Protobuf)
  • Express type changes or moves into a oneof as new fields with new numbers; deprecate the old number, then reserve it
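The numbering rules above follow from how a tag is encoded: the field number is shifted left by three bits, combined with the wire type, and written as a varint. The following sketch computes the tag size and checks the valid range. The function names are illustrative, not part of any Protobuf API.

```python
MAX_FIELD_NUMBER = 536_870_911          # 2**29 - 1
RESERVED_RANGE = range(19_000, 20_000)  # reserved by Protobuf itself


def is_usable(field_number: int) -> bool:
    """True if the number is within range and outside the reserved block."""
    return 1 <= field_number <= MAX_FIELD_NUMBER and field_number not in RESERVED_RANGE


def tag_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes the tag (field number + wire type) takes as a varint."""
    if not is_usable(field_number):
        raise ValueError(f"field number {field_number} is not usable")
    tag = (field_number << 3) | wire_type
    size = 1
    while tag >= 0x80:
        tag >>= 7
        size += 1
    return size


for n in (1, 15, 16, 2047, 2048):
    print(n, tag_size(n))
```

The jump from one byte to two happens between 15 and 16, which is why the lowest numbers are worth saving for fields that appear in nearly every message.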

.proto example: reservation and addition pattern

syntax = "proto3";
package com.example;

message OrderV1 {
  // High-frequency fields get small numbers
  int64 order_id = 1;
  string customer_id = 2;
  // Numeric status code (replaced by an enum in OrderV2)
  int32 status_code = 10;
}

// After evolution
message OrderV2 {
  int64 order_id = 1;
  string customer_id = 2;
  // We want to replace status_code with an enum -> add with a new number
  // Old number 10 is deprecated -> reserve it so it cannot be reused
  reserved 10; // old: status_code
  // You can also reserve the name
  reserved "status_code";

  // New enum-based status. Old clients ignore it as unknown
  OrderStatus status = 20;

  // When removing a field: [deprecated = true] -> later move to reserved
  string note = 30 [deprecated = true];
}

enum OrderStatus {
  ORDER_STATUS_UNSPECIFIED = 0; // 0 is the default
  ORDER_STATUS_PLACED = 1;
  ORDER_STATUS_SHIPPED = 2;
  ORDER_STATUS_CANCELLED = 3;
}

Compatibility Modes and Safe Evolution Patterns

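Schema Registry has three main compatibility modes: Backward, Forward, and Full. Each also has a transitive variant that checks against all registered versions rather than only the latest, and None disables checking. In Protobuf, adding a new field (with a new number) is backward-compatible in most cases. Older consumers ignore unknown fields, so existing reads continue to work.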

Breaking changes include renumbering or reusing numbers, type changes, moving an existing field into a oneof, and changing enum numbers. Renaming may be valid on the wire, but it tends to break code generation and validation, so in practice treat it as breaking.

  • Add with a new number; keep existing numbers as they are
  • Delete in stages: deprecate -> stop using -> mark reserved
  • When changing a type, introduce a new field with a new number and run them in parallel for a transition period
  • For enums, adding values is allowed; changing numbers is not

Mode | Allowed change examples | Representative failure cases
Backward | Add a new optional-equivalent field with a new number; update doc comments on existing fields | Renumbering or reusing existing field numbers; changing scalar types (e.g., int32 -> string)
Forward | Deprecate and effectively remove a field that is no longer used (new consumers can handle it) | Removing information that old consumers effectively require (data written with the new schema becomes unreadable for them)
Full | Comment tweaks and minor renames (wire-compatible, but see the caveat on generated code above); adding a completely independent new field | Renumbering or reusing numbers; moving an existing field into a oneof; changing enum numbers
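The breaking changes listed in this section can be approximated by a small pre-commit check. The sketch below is a hypothetical lint, not Schema Registry's compatibility algorithm: it assumes each schema version has already been reduced to a mapping from field number to (name, type), and it flags type changes, renames or reuse, and renumbering.

```python
Field = tuple[str, str]  # (name, type)


def find_breaking_changes(old: dict[int, Field], new: dict[int, Field]) -> list[str]:
    """Compare two versions of one message and list suspicious changes."""
    problems: list[str] = []
    for number, (old_name, old_type) in old.items():
        if number not in new:
            continue  # removals go through the deprecate -> reserve process
        new_name, new_type = new[number]
        if new_type != old_type:
            problems.append(f"field {number}: type changed {old_type} -> {new_type}")
        if new_name != old_name:
            problems.append(f"field {number}: renamed or reused ({old_name} -> {new_name})")
    old_numbers = {name: number for number, (name, _) in old.items()}
    for number, (name, _) in new.items():
        if name in old_numbers and old_numbers[name] != number:
            problems.append(f"{name}: renumbered {old_numbers[name]} -> {number}")
    return problems


v1 = {1: ("order_id", "int64"), 2: ("customer_id", "string"), 10: ("status_code", "int32")}
v2_safe = {1: ("order_id", "int64"), 2: ("customer_id", "string"), 20: ("status", "OrderStatus")}
v2_unsafe = {1: ("order_id", "int64"), 2: ("customer_id", "string"), 10: ("status", "OrderStatus")}

print(find_breaking_changes(v1, v2_safe))
print(find_breaking_changes(v1, v2_unsafe))
```

Adding `status` under a new number passes, while reusing number 10 for it is reported twice, once for the type and once for the name.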

Operational Procedure for Deletion, Reuse, and Rename

Carry out deletions and replacements in stages. Removing a field outright can break compilation in older apps or fail compatibility checks. The safe order is: mark deprecated -> stop usage -> mark reserved.

Reserve both the number and the name to prevent accidental reuse later. Manage schema history alongside Schema Registry and bake compatibility checks into PR reviews to cut down on incidents.

  • Step 1: mark the target field as deprecated and announce its retirement
  • Step 2: wait until producers have fully migrated to the new field (parallel-run period)
  • Step 3: remove the old field from .proto and register both number and name as reserved
  • Step 4: pin compatibility on Schema Registry to Full (or your org standard) to block breaking changes

Standard pattern when deleting in .proto

message CustomerV2 {
  int64 id = 1;
  string email = 2;
  // Old phone is retired -> deleted and reserved
  reserved 5;           // forbid number reuse
  reserved "phone";     // forbid name reuse
}
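A review-time check for the deletion procedure might look like the following. This is a sketch under stated assumptions: the function name and the dictionary representation (field number to name, plus sets of reserved numbers and names) are hypothetical, and extracting them from a .proto file is left out.

```python
def check_removals(
    old: dict[int, str],
    new: dict[int, str],
    reserved_numbers: set[int],
    reserved_names: set[str],
) -> list[str]:
    """Report removed fields that were not reserved, and reserved numbers in use."""
    problems: list[str] = []
    for number, name in old.items():
        if number in new:
            continue
        if number not in reserved_numbers:
            problems.append(f"number {number} removed but not reserved")
        if name not in reserved_names:
            problems.append(f"name {name!r} removed but not reserved")
    for number in new:
        if number in reserved_numbers:
            problems.append(f"number {number} is reserved but still in use")
    return problems


customer_v1 = {1: "id", 2: "email", 5: "phone"}
customer_v2 = {1: "id", 2: "email"}

print(check_removals(customer_v1, customer_v2, {5}, {"phone"}))
print(check_removals(customer_v1, customer_v2, set(), set()))
```

With both reservations in place the check returns an empty list. Without them it reports the missing number and name.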

Compatibility Pitfalls for oneof, map, and enum

When a new field is added to a oneof, older clients see that branch as an unknown field and report the oneof as unset, so the intended semantics can be lost. Even if Schema Registry validation passes, the change tends to be breaking at the application level.

On the wire, a map is equivalent to a repeated message whose entries hold the key in field 1 and the value in field 2, so changing the key or value type is breaking. For enums, adding values is safe, but changing or reassigning value numbers is breaking. Renames may be allowed on the wire, but avoid them as a rule for the sake of generated code and readability.

  • oneof: avoid moving an existing field into a oneof (breaking)
  • map: do not change key types, and do not change value types; replace with a new field if needed
  • enum: adding values is allowed, changing or reusing numbers is not; when deleting, treat the number as reserved-equivalent and forbid reuse
  • Use renames as a last resort; even when wire-compatible, they can break toolchain compatibility
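The map rule can be seen by encoding one entry by hand. The sketch below assumes a hypothetical field `map<string, int32> labels = 7` and non-negative values only. It writes each entry as a length-delimited submessage with the key in field 1 and the value in field 2, which is the documented wire equivalent of a map.

```python
def varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


def tag(number: int, wire_type: int) -> bytes:
    return varint((number << 3) | wire_type)


def encode_entry(key: str, value: int) -> bytes:
    """One map entry: key as field 1 (length-delimited), value as field 2 (varint)."""
    raw_key = key.encode("utf-8")
    return tag(1, 2) + varint(len(raw_key)) + raw_key + tag(2, 0) + varint(value)


def encode_map_field(number: int, entries: dict[str, int]) -> bytes:
    """Each entry becomes one occurrence of a repeated, length-delimited field."""
    out = b""
    for key, value in entries.items():
        entry = encode_entry(key, value)
        out += tag(number, 2) + varint(len(entry)) + entry
    return out


print(encode_map_field(7, {"a": 1}).hex())
```

If the value type changed from int32 to string, field 2 inside every entry would switch from wire type 0 to wire type 2, and existing readers would no longer find the value they expect.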

Operational Settings and CCDAK Checkpoints

The Subject Naming Strategy determines how schemas are shared or separated across topics. Pick among TopicNameStrategy (default, <topic>-key or <topic>-value), RecordNameStrategy (one subject per message type, shared across topics), and TopicRecordNameStrategy (topic x type, <topic>-<record name>) depending on requirements.
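As a quick reference, the subject names produced by the three strategies can be sketched as follows. The function is a hypothetical illustration of the naming rules, not the Confluent client code. The topic `my-orders` and record name `com.example.OrderV2` are taken from the examples in this article.

```python
def subject_name(strategy: str, topic: str, record_name: str, is_key: bool = False) -> str:
    """Derive the Schema Registry subject for a given naming strategy."""
    suffix = "key" if is_key else "value"
    if strategy == "TopicNameStrategy":
        return f"{topic}-{suffix}"
    if strategy == "RecordNameStrategy":
        return record_name
    if strategy == "TopicRecordNameStrategy":
        return f"{topic}-{record_name}"
    raise ValueError(f"unknown strategy: {strategy}")


for strategy in ("TopicNameStrategy", "RecordNameStrategy", "TopicRecordNameStrategy"):
    print(strategy, "->", subject_name(strategy, "my-orders", "com.example.OrderV2"))
```

Compatibility is enforced per subject, so the chosen strategy also decides which schemas are checked against each other.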

On the producer side use io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer, and on the consumer side use KafkaProtobufDeserializer from the same package. By default the deserializer returns a DynamicMessage; to have it return a generated class, set specific.protobuf.value.type.

Compatibility level can be set globally or per subject. A safe baseline is Backward in dev and Full in production, blocking breaking changes operationally.

  • CCDAK check: be able to answer instantly on the meaning of each compatibility mode and examples of allowed/disallowed changes
  • Be able to choose a Subject Naming Strategy by case (schema reuse vs. separation)
  • Be able to explain Producer/Consumer Protobuf SerDe configuration and Schema Registry integration

Example Schema Registry and client configuration

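# Producer properties (excerpt)
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer
schema.registry.url=http://localhost:8081
# When you want to change Subject Naming Strategy (example)
value.subject.name.strategy=io.confluent.kafka.serializers.subject.RecordNameStrategy

# Consumer properties (excerpt)
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=io.confluent.kafka.serializers.protobuf.KafkaProtobufDeserializer
schema.registry.url=http://localhost:8081
# When you want a specific generated type back instead of DynamicMessage (example)
specific.protobuf.value.type=com.example.OrderV2

# Compatibility mode (per subject), run from a shell
# Specify BACKWARD/FORWARD/FULL, etc.
# The subject my-orders-value assumes the default TopicNameStrategy for topic my-orders;
# with RecordNameStrategy the subject would be the record name instead.
curl -X PUT -H "Content-Type: application/json" \
  --data '{"compatibility": "FULL"}' \
  http://localhost:8081/config/my-orders-value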
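Before registering a new version, a candidate schema can be tested against the latest registered one through Schema Registry's compatibility endpoint. The sketch below only builds the request with Python's standard library and does not send it. The helper name is hypothetical, and the base URL and subject reuse the assumed local values from the example above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8081"  # assumed local registry, as in the example above


def compatibility_check_request(subject: str, proto_source: str) -> urllib.request.Request:
    """Build a POST that asks whether proto_source is compatible with the latest version."""
    body = json.dumps({"schemaType": "PROTOBUF", "schema": proto_source}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/compatibility/subjects/{subject}/versions/latest",
        data=body,
        headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
        method="POST",
    )


candidate = 'syntax = "proto3"; package com.example; message OrderV2 { int64 order_id = 1; }'
request = compatibility_check_request("my-orders-value", candidate)
print(request.get_method(), request.full_url)
```

Sending it with `urllib.request.urlopen(request)` requires a running registry; the JSON response carries an `is_compatible` flag that a CI job could assert on.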

Check with a Sample Question

CCDAK

Question 1

An order event's Protobuf schema currently uses a numeric status_code (number 10). Going forward, you want to introduce an enum-typed status to make meaning explicit, with existing consumers migrating in phases. Which response is safest?

  1. Add status with a new number, mark old status_code's number 10 and name as reserved, and keep compatibility mode at Full
  2. Keep number 10 as is and only change its type from int32 to enum
  3. Rename the old field to status and swap its type later
  4. Temporarily disable compatibility checks and then reuse number 10 for the new purpose

Correct answer: 1

The safest path is to add the enum field with a new number and protect the old field by deprecating it and then marking it reserved. Reusing numbers or changing types is breaking, and disabling compatibility checks is not a recommended operational practice.

Frequently Asked Questions

Is it OK to leave gaps between field numbers?

Yes, it is fine. Reserving headroom for future extensions or team-by-team splits is a sound practice. Because numbers 1-15 are scarce, allocate them to high-frequency fields first and leave the next block open for future use.

Can I change just the field type while keeping the same field number?

No. The field number is the field's identity, so changing its type is a breaking change. Add a new field with a new number and gradually retire the old one.

Can I use Protobuf without a Schema Registry?

Technically yes, but you lose automatic compatibility checks and centralized schema distribution, which drives operational overhead through the roof. For CCDAK and production use, design with Schema Registry as a prerequisite.

Author

NicheeLab Editorial Team

NicheeLab editorial team focused on data engineering and cloud certification learning. Content is structured around practical study needs and official exam domains.
