Schemas Should Tell You What Can Change
Most schema documentation answers the easiest question: what fields exist today?
That is useful, but incomplete. Production systems break compatibility even when every team knows the current shape, because the teams still disagree about what is allowed to change. The dangerous part of a schema is often not the field list. It is the unspoken evolution contract around the field list.
The thesis
A durable schema should describe what exists, what it means, and what may safely evolve.
If a schema only describes the current payload, every consumer is forced to guess the future. Some will ignore unknown fields. Some will reject them. Some will treat missing, null, false, and unknown as the same state. Some will assume enums are closed forever because nobody told them otherwise.
That guessing is where compatibility debt starts.
The production pattern
A producer adds a field, widens an enum, changes a default, starts omitting a value, or reuses an old attribute for a slightly broader meaning. The schema still looks valid. Tests may pass. The producer can serialize and the consumer can parse.
Then an old reader fails on the new enum. A downstream job treats a missing value as false. A dashboard silently changes meaning. A mobile or embedded client cannot update quickly. A backfill writes new records that an older service cannot interpret during rollback.
The root cause is not always poor engineering. Often the producer and consumer had different mental models of the same schema.
One side thought the schema was a flexible envelope. The other side treated it as a closed contract. One side thought a field was descriptive. The other side built authorization, billing, workflow state, or user messaging on top of it. The field existed, but its allowed evolution was undocumented.
The trap
The trap is believing that "schema compatible" means "system compatible."
Type compatibility is necessary, but it does not protect meaning. A string can still change semantics. An optional field can become practically required. An enum can grow in a way that breaks a consumer with exhaustive branching. A timestamp can shift from capture time to processing time without changing its type.
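A minimal Python sketch of the enum case. It assumes a hypothetical `status` field whose producer later adds `"REFUNDED"`. Both readers accept the same wire type, a string. Only the old one assumes the set of values is closed.

```python
# Hypothetical consumer written when "status" had exactly three values.
STATUS_LABELS = {
    "PENDING": "Awaiting payment",
    "PAID": "Paid",
    "CANCELLED": "Cancelled",
}


def old_label(status: str) -> str:
    # Exhaustive lookup: implicitly assumes the enum is closed,
    # so any value added later raises KeyError at runtime.
    return STATUS_LABELS[status]


def tolerant_label(status: str) -> str:
    # Same table, but a new value degrades to an explicit fallback
    # that still shows the raw value instead of crashing the reader.
    return STATUS_LABELS.get(status, f"Unrecognized status: {status}")


# The producer widens the enum. The payload is still a valid string.
new_payload = {"order_id": "o-1", "status": "REFUNDED"}

try:
    old_label(new_payload["status"])
except KeyError as err:
    print(f"old reader failed on new value: {err}")

print(tolerant_label(new_payload["status"]))
```

Whether that fallback is acceptable is a product decision. The point is that someone makes it deliberately instead of discovering it during an incident.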
Another trap is using optional fields as a substitute for compatibility thinking. Optional often means "may be absent," but teams rarely say absent for whom, under what condition, for how long, and with what default behavior.
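One way to answer "absent for whom, and with what default" is to keep absence as its own state instead of collapsing it into `None` or `False`. This is a sketch with a hypothetical `marketing_opt_in` field. The four-state enum and the decision rule are illustrative assumptions, not a standard.

```python
from enum import Enum
from typing import Any, Mapping


class OptIn(Enum):
    YES = "yes"
    NO = "no"
    EXPLICIT_NULL = "explicit_null"  # writer sent null for the field
    ABSENT = "absent"  # writer omitted the field, e.g. an older producer


def read_opt_in(record: Mapping[str, Any]) -> OptIn:
    # Distinguish all four states instead of `record.get(...) or False`.
    if "marketing_opt_in" not in record:
        return OptIn.ABSENT
    value = record["marketing_opt_in"]
    if value is None:
        return OptIn.EXPLICIT_NULL
    if isinstance(value, bool):
        return OptIn.YES if value else OptIn.NO
    raise ValueError(f"unexpected type for marketing_opt_in: {type(value).__name__}")


def may_send_marketing(record: Mapping[str, Any]) -> bool:
    # The default for ABSENT and EXPLICIT_NULL is a documented decision,
    # not an accident of truthiness: only an explicit yes permits sending.
    return read_opt_in(record) is OptIn.YES
```

The collapsed version gives the same answer for every non-yes state, which hides whether the writer said no or never said anything.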
The result is a schema that documents shape while leaving evolution to folklore. Folklore works inside a small group for a while. It fails when ownership spreads, release cadence differs, or data lives longer than the code that wrote it.
The model
I like classifying fields by evolution behavior.
Stable fields are part of the durable contract. Consumers may build critical behavior on them. Changing their type, meaning, default, cardinality, or lifecycle requires an explicit migration plan.
Additive fields can be introduced without requiring old consumers to move. Readers must tolerate their absence, and writers must not assume every consumer understands them. These fields are how systems grow without synchronized releases.
Deprecated fields are still present but no longer the preferred source of meaning. A deprecated field needs a replacement, an owner, adoption evidence, and removal criteria. Without those, deprecation is only a label.
Semantic fields carry business meaning beyond their type. Examples include eligibility, status, risk, tier, role, completion, priority, and visibility. These fields need definitions, not just validators. If the meaning changes, compatibility must be reviewed even when the wire shape is identical.
Experimental fields are explicitly unstable. Consumers may observe them, but should not build durable behavior on them without accepting churn. The important part is not the word experimental. It is whether the instability is visible in the contract.
This classification makes schemas more honest. It tells consumers not only what they can read, but how much weight they can put on what they read.
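Of the five classes, deprecation is the easiest to make enforceable: removal can be gated on a named replacement, an owner, removal criteria, and evidence that consumers have moved. A minimal Python sketch, assuming per-consumer read counts are already collected by some telemetry pipeline. `Deprecation`, `removal_blockers`, and the 30-day window are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Deprecation:
    field_name: str
    replacement: str  # field consumers should migrate to
    owner: str  # team accountable for finishing the removal
    removal_criteria: str  # e.g. "zero reads for 30 days"
    # Adoption evidence: reads of the deprecated field per consumer.
    # Hypothetical: assumes a telemetry pipeline produces these counts.
    reads_last_30d: Dict[str, int] = field(default_factory=dict)


def removal_blockers(dep: Deprecation) -> List[str]:
    """Return reasons the field cannot be removed yet; empty means go."""
    blockers = []
    if not dep.replacement:
        blockers.append("no replacement named")
    if not dep.owner:
        blockers.append("no owner")
    if not dep.removal_criteria:
        blockers.append("no removal criteria")
    if not dep.reads_last_30d:
        # Missing data is not the same as missing consumers.
        blockers.append("no adoption evidence collected")
    still_reading = sorted(c for c, n in dep.reads_last_30d.items() if n > 0)
    if still_reading:
        blockers.append("still read by: " + ", ".join(still_reading))
    return blockers
```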
The compatibility checklist I use is simple:
- Unknown fields: can readers ignore fields they do not understand?
- Unknown values: can readers tolerate new enum or state values?
- Absence: does missing differ from null, false, zero, empty, or not applicable?
- Defaults: who applies defaults, and are old defaults still valid?
- Meaning: which fields carry business semantics that require review?
- Coexistence: can old and new writers operate at the same time?
- Rollback: can old code read data written by new code?
- Removal: what evidence proves a field is no longer consumed?
If a schema cannot answer these questions, the system is depending on luck during change.
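Only some of these questions can be answered mechanically. The sketch below assumes a hypothetical in-house schema representation (`FieldSpec` and `compatibility_findings` are illustrative names, not a real library) and flags removal, coexistence, unknown values, and type or default changes on stable fields. Meaning, and defaults applied by readers, still need human review.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class FieldSpec:
    type_name: str
    required: bool = False
    default: Optional[object] = None
    enum_values: Optional[FrozenSet[str]] = None
    stable: bool = False  # part of the durable contract


Schema = Dict[str, FieldSpec]


def compatibility_findings(old: Schema, new: Schema) -> List[str]:
    findings = []
    # Removal: someone must prove the field is no longer consumed.
    for name in sorted(old.keys() - new.keys()):
        findings.append(f"{name}: removed; need evidence it is no longer consumed")
    # Coexistence: a newly required field breaks records from old writers.
    for name in sorted(new.keys() - old.keys()):
        if new[name].required:
            findings.append(f"{name}: added as required; records from old writers will lack it")
    for name in sorted(old.keys() & new.keys()):
        before, after = old[name], new[name]
        # Stable fields: type or default changes need an explicit migration.
        if before.stable and (before.type_name != after.type_name or before.default != after.default):
            findings.append(f"{name}: stable field changed type or default; needs migration plan")
        # Unknown values: enum widening is wire-compatible but not reader-safe.
        if before.enum_values is not None and after.enum_values is not None:
            added = after.enum_values - before.enum_values
            if added:
                findings.append(f"{name}: new enum values {sorted(added)}; old readers must tolerate them")
    return findings
```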
Where this model breaks
This model can become too heavy for small, local, reversible data. A private table behind one deployable unit does not need the same compatibility contract as a public event, shared record, or long-lived file format.
It can also create false confidence. A field labeled stable is still unsafe if nobody tests old readers, watches consumer errors, or owns migration work. Labels help only when they change behavior.
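Testing old readers can be as small as keeping the previous release's reader in the test suite and feeding it records from the current writer. A minimal sketch of that rollback check. `previous_release_reader`, `current_writer`, and the order payload are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, List


def previous_release_reader(raw: str) -> Dict[str, Any]:
    # Frozen copy of the reader shipped in the last release (hypothetical).
    record = json.loads(raw)
    return {"order_id": record["order_id"], "amount_cents": int(record["amount_cents"])}


def current_writer(order_id: str, amount_cents: int, currency: str) -> str:
    # The new release adds an additive "currency" field.
    return json.dumps({"order_id": order_id, "amount_cents": amount_cents, "currency": currency})


def rollback_failures(reader: Callable[[str], Any], samples: Iterable[str]) -> List[str]:
    """Run an old reader over new-writer output and collect every failure."""
    failures = []
    for raw in samples:
        try:
            reader(raw)
        except (KeyError, ValueError, TypeError) as err:
            failures.append(f"{type(err).__name__}: {err}")
    return failures


samples = [current_writer("o-1", 500, "EUR"), current_writer("o-2", 0, "USD")]
assert rollback_failures(previous_release_reader, samples) == []
```

In CI, a non-empty result would fail the build, which is what turns a "stable" label into a behavior.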
There is another counterpoint: some systems need strict schemas precisely because tolerance hides bugs. Financial records, security decisions, audit trails, and legal records may need rejection rather than permissive parsing. In those cases the schema should still name allowed evolution, but the allowed set may be intentionally narrow.
The goal is not maximum flexibility. The goal is explicit flexibility.
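For records where tolerance would hide bugs, such as audit trails, the reader can enforce a deliberately narrow contract. A sketch for a hypothetical audit event. The allowed fields and action values are assumptions, and widening them is meant to be an explicit, reviewed change.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical audit-event contract: closed on purpose.
ALLOWED_FIELDS: FrozenSet[str] = frozenset({"event_id", "actor", "action", "occurred_at"})
REQUIRED_FIELDS: FrozenSet[str] = ALLOWED_FIELDS
ALLOWED_ACTIONS: FrozenSet[str] = frozenset({"GRANT", "REVOKE"})


class ContractViolation(ValueError):
    pass


def read_audit_event(record: Dict[str, Any]) -> Dict[str, Any]:
    unknown = set(record) - ALLOWED_FIELDS
    if unknown:
        # Reject rather than ignore: an unexpected field may carry meaning we would drop.
        raise ContractViolation(f"unknown fields: {sorted(unknown)}")
    missing = REQUIRED_FIELDS.difference(record)
    if missing:
        raise ContractViolation(f"missing fields: {sorted(missing)}")
    if record["action"] not in ALLOWED_ACTIONS:
        raise ContractViolation(f"unknown action: {record['action']!r}")
    return dict(record)
```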
What I do now
When reviewing a schema that crosses ownership boundaries, I ask for evolution notes next to the important fields. The notes can be short, but they should be concrete.
For a stable field, I want the owner and migration rule. For an additive field, I want the reader behavior when absent. For a semantic field, I want the definition and the decision owner. For a deprecated field, I want the replacement and removal evidence. For an experimental field, I want the audience and the path to either promote or delete it.
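Those per-class expectations are regular enough to check in review tooling. A minimal sketch, assuming evolution notes are stored as simple key/value pairs next to each field. `Evolution`, `FieldNotes`, `missing_notes`, and the note keys are hypothetical names.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Evolution(Enum):
    STABLE = "stable"
    ADDITIVE = "additive"
    SEMANTIC = "semantic"
    DEPRECATED = "deprecated"
    EXPERIMENTAL = "experimental"


# One entry per evolution class; the note names are illustrative.
REQUIRED_NOTES: Dict[Evolution, List[str]] = {
    Evolution.STABLE: ["owner", "migration_rule"],
    Evolution.ADDITIVE: ["reader_behavior_when_absent"],
    Evolution.SEMANTIC: ["definition", "decision_owner"],
    Evolution.DEPRECATED: ["replacement", "removal_evidence"],
    Evolution.EXPERIMENTAL: ["audience", "promote_or_delete_path"],
}


@dataclass
class FieldNotes:
    name: str
    evolution: Evolution
    notes: Dict[str, str] = field(default_factory=dict)


def missing_notes(fields: List[FieldNotes]) -> List[str]:
    problems = []
    for spec in fields:
        for key in REQUIRED_NOTES[spec.evolution]:
            # Blank or whitespace-only notes count as missing.
            if not spec.notes.get(key, "").strip():
                problems.append(f"{spec.name} ({spec.evolution.value}): missing {key}")
    return problems
```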
I also prefer tolerant readers and conservative writers where the contract is broad. Readers should avoid crashing on harmless additions. Writers should avoid emitting ambiguous states. Producers should track consumer adoption before removal. Consumers should make unknown handling visible, not quietly map everything to "other" if that changes product behavior.
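A tolerant reader can still make its tolerance visible. This sketch ignores unknown fields and refuses to coerce unknown statuses into a catch-all, but counts and logs both so the drift shows up in monitoring. `TolerantOrderReader`, `Order`, and the status set are hypothetical.

```python
import logging
from collections import Counter
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

logger = logging.getLogger("order_reader")

KNOWN_STATUSES = {"PENDING", "PAID", "CANCELLED"}


@dataclass
class Order:
    order_id: str
    status: Optional[str]  # None means "a status this reader does not understand"
    raw_status: str  # kept so callers can still see what the producer sent


@dataclass
class TolerantOrderReader:
    # Counters make tolerance observable; dashboards or alerts can watch them.
    unknown_fields: Counter = field(default_factory=Counter)
    unknown_statuses: Counter = field(default_factory=Counter)

    def read(self, record: Dict[str, Any]) -> Order:
        for key in record.keys() - {"order_id", "status"}:
            # Ignore harmless additions, but count them.
            self.unknown_fields[key] += 1
        raw = record["status"]
        if raw in KNOWN_STATUSES:
            status = raw
        else:
            # Do not silently map to "OTHER": callers decide what an
            # unrecognized status means for product behavior.
            self.unknown_statuses[raw] += 1
            logger.warning("unrecognized order status %r", raw)
            status = None
        return Order(order_id=record["order_id"], status=status, raw_status=raw)
```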
The principal-engineer move is to convert hidden coupling into visible contract. The schema is not just a validation artifact. It is a release coordination tool.
Closing takeaway
For any schema that outlives one deploy, document what may change, who may change it, and how old readers are expected to survive.