Derived Data Needs a Product Owner

A materialized view without an owner becomes a future argument. At first it is helpful: faster reads, simpler queries, cleaner screens, better reports, cheaper recommendations. Then it starts answering questions that matter.

The danger is not that the data was derived. Most useful systems derive data constantly. The danger is that nobody owns the meaning after generation.

The thesis

Derived data becomes product state the moment people make decisions from it. At that point it needs ownership of meaning, freshness, repair, and auditability, not just a job that refreshes it.

If nobody can say what the derived value means, how stale it may be, how to rebuild it, and who approves a semantic change, the system is carrying hidden product risk.

The production pattern

A team creates a projection because the source model is inconvenient. Maybe the product needs a fast profile summary, a denormalized search document, a ranking table, an eligibility flag, an aggregate balance, a lifecycle view, or a reporting snapshot.

The first version is usually clear. There is a source. There is a transformation. There is a consumer. The derived value exists to make one experience possible.

Then the view becomes useful. Another screen reads it. A batch job depends on it. Support uses it during investigation. A model uses it as a feature. A dashboard treats it as truth. A compliance workflow exports it. Slowly, the derived value becomes more central than the source that created it.

The original owner may still own the code path, but not the decisions now attached to the value. When the source changes, every consumer asks a different question: is this a bug, a delay, a semantic change, or expected behavior?

The trap

The trap is saying "it is just derived."

That phrase is often used to avoid ownership. If the value is wrong, rebuild it. If it is stale, wait for the next run. If the definition changed, update the transformation. If consumers disagree, point them to the source.

That works for disposable caches. It fails when the derived value is visible, decision-bearing, or hard to reconstruct. A stale eligibility flag can block a user. A delayed aggregate can mislead an operator. A search projection with old permissions can expose the wrong result. A ranking feature with changed semantics can make a model look broken.

The second trap is owning the pipeline but not the meaning. A team may reliably run the job and still be unable to answer whether the output is correct for the product decision it supports.

The model

I use a five-part review for derived data.

Source of truth: what record has authority when there is disagreement? This should name the domain source, not merely the table or topic. If two sources can disagree, the review must say which one wins and why.

Derivation rule: what transformation creates the value? The rule should describe joins, filters, aggregation grain, null handling, deduplication, late data behavior, and semantic definitions. "Computed from events" is not a rule.
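
As an illustration of a rule stated at that level of detail, here is a minimal sketch. The `PaymentEvent` type, its field names, and the specific choices (first occurrence wins on duplicates, null amounts are excluded, late events wait for the next run) are assumptions made for the example, not a recommended policy.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class PaymentEvent:
    """Hypothetical source event used only for this sketch."""
    event_id: str
    account_id: str
    amount_cents: Optional[int]
    occurred_at: datetime


def balance_by_account(events: Iterable[PaymentEvent], window_end: datetime) -> Dict[str, int]:
    """Derivation rule, spelled out:
    - grain: one total per account_id
    - deduplication: by event_id, first occurrence wins
    - late data: events at or after window_end are left for the next run
    - null handling: a missing amount is excluded, not counted as zero
    """
    seen: Set[str] = set()
    totals: Dict[str, int] = {}
    for event in events:
        if event.event_id in seen:
            continue
        seen.add(event.event_id)
        if event.occurred_at >= window_end:
            continue
        if event.amount_cents is None:
            continue
        totals[event.account_id] = totals.get(event.account_id, 0) + event.amount_cents
    return totals


window_end = datetime(2024, 1, 2, tzinfo=timezone.utc)
inside = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
events = [
    PaymentEvent("e1", "A", 100, inside),
    PaymentEvent("e1", "A", 100, inside),     # duplicate delivery
    PaymentEvent("e2", "A", None, inside),    # amount missing
    PaymentEvent("e3", "B", 50, window_end),  # outside the window
    PaymentEvent("e4", "A", 25, inside),
]
balances = balance_by_account(events, window_end)
print(balances)  # {'A': 125}
```

Note that account B is absent from the result rather than zero. Under this rule, absence means "no counted events in the window," which is exactly the kind of definition consumers need to be told.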

Freshness promise: how old may the value be before it becomes wrong for its use? A nightly report, an interactive permission check, and a user-facing count do not need the same promise. Freshness belongs to the product decision, not the pipeline schedule.
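
One way to make that concrete is to attach the tolerance to the use rather than to the job. The sketch below is illustrative only: `FreshnessPromise`, `is_fresh`, and the two durations are hypothetical names and numbers, not values any real system should adopt as-is.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class FreshnessPromise:
    """Maximum tolerated age of a derived value for one named use."""
    use: str
    max_age: timedelta


def is_fresh(computed_at: datetime, promise: FreshnessPromise, now: datetime) -> bool:
    """True while the value is young enough for the decision it supports."""
    return now - computed_at <= promise.max_age


# Assumed tolerances, chosen only to show that the same value can be
# acceptable for one use and too old for another.
PERMISSION_CHECK = FreshnessPromise("interactive permission check", timedelta(seconds=30))
NIGHTLY_REPORT = FreshnessPromise("nightly report", timedelta(hours=26))

now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
computed_at = now - timedelta(minutes=10)

print(is_fresh(computed_at, PERMISSION_CHECK, now))  # False
print(is_fresh(computed_at, NIGHTLY_REPORT, now))    # True
```

The same ten-minute-old value passes one promise and fails the other, which is the point: the pipeline schedule did not change, only the decision did.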

Repair path: how does the system correct bad derived state? The answer may be replay, backfill, targeted recompute, invalidation, dual computation, or manual review. The repair path should be safe to run under production constraints, not only from a developer laptop.

Auditability: can we explain how a value came to exist? Critical derived data needs lineage: input version, transformation version, computation time, source window, and enough traceability to debug disputes.
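
A minimal sketch of carrying that lineage alongside the value follows. The `Lineage` and `DerivedValue` types and the version strings are hypothetical; a real system would likely store these fields wherever the derived row lives.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Lineage:
    """The lineage fields named above, as one record per derived value."""
    input_version: str
    transformation_version: str
    computed_at: datetime
    source_window_start: datetime
    source_window_end: datetime


@dataclass(frozen=True)
class DerivedValue:
    key: str
    value: object
    lineage: Lineage


def explain(dv: DerivedValue) -> str:
    """Render a one-line answer to 'how did this value come to exist?'"""
    lin = dv.lineage
    return (
        f"{dv.key}={dv.value!r} computed at {lin.computed_at.isoformat()} "
        f"by transformation {lin.transformation_version} "
        f"from input {lin.input_version}, source window "
        f"[{lin.source_window_start.isoformat()}, {lin.source_window_end.isoformat()})"
    )


flag = DerivedValue(
    key="account:42:eligible",
    value=False,
    lineage=Lineage(
        input_version="snapshot-2024-01-01",  # assumed naming scheme
        transformation_version="v7",          # assumed naming scheme
        computed_at=datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc),
        source_window_start=datetime(2024, 1, 1, tzinfo=timezone.utc),
        source_window_end=datetime(2024, 1, 2, tzinfo=timezone.utc),
    ),
)
report = explain(flag)
print(report)
```

With this much attached, a dispute about the flag can start from "which inputs and which rule version" instead of from guesswork.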

This model forces a naming decision: who owns the derived value as a product fact? The owner may be the source domain, the consuming product area, or a platform team with a clear contract. What does not work is ownership disappearing between them.
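
The review itself can be kept as a small record so that gaps are visible. This is a sketch under assumptions: `DerivedDataReview`, `unanswered`, and the free-text field format are hypothetical, and a filled-in field says nothing about whether the answer is any good.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class DerivedDataReview:
    """The five review parts plus the owner, as free-text answers."""
    name: str
    owner: Optional[str] = None
    source_of_truth: Optional[str] = None
    derivation_rule: Optional[str] = None
    freshness_promise: Optional[str] = None
    repair_path: Optional[str] = None
    auditability: Optional[str] = None


def unanswered(review: DerivedDataReview) -> List[str]:
    """List the review questions that are empty or whitespace-only."""
    return [
        f.name
        for f in fields(review)
        if f.name != "name" and not (getattr(review, f.name) or "").strip()
    ]


review = DerivedDataReview(
    name="eligibility_flag",
    source_of_truth="enrollment domain record",
    derivation_rule="latest enrollment per account, deduplicated by event id",
)
gaps = unanswered(review)
print(gaps)  # ['owner', 'freshness_promise', 'repair_path', 'auditability']
```

An empty `owner` is the gap this section is about: the pipeline may be fully described and still have nobody responsible for the value as a product fact.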

Where this model breaks

Not every projection needs product ownership. Some derived views are disposable caches. If the value can be dropped, rebuilt, and temporarily missed without changing user-visible truth, a lighter model is fine.

The distinction I use is decision weight. If nobody makes a consequential product, operational, financial, permission, or support decision from the value, it can stay closer to infrastructure. If people do make decisions from it, the ownership model must rise to match.

There is also a risk of over-documenting derived data that changes frequently during exploration. Early analytics, experiments, and internal prototypes need room to move. The right control there is not heavier documentation but a promotion review, so that exploratory projections cannot quietly become production truth.

What I do now

When derived data appears in a design, I ask, "Who gets paged for wrong meaning?" Not only wrong computation. Wrong meaning.

I also ask teams to mark the state class. Cache means performance optimization and safe invalidation. Projection means a read model with expected lag and rebuild rules. Materialized product state means a value users or systems rely on for decisions. Those classes deserve different rigor.
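
A sketch of what marking the state class could look like in code. The three classes come from the paragraph above; the control names and the mapping from class to required controls are assumptions for illustration, and a team would choose its own.

```python
from enum import Enum
from typing import Dict, FrozenSet, List, Set


class StateClass(Enum):
    CACHE = "cache"
    PROJECTION = "projection"
    MATERIALIZED_PRODUCT_STATE = "materialized_product_state"


# Assumed mapping: each class requires more controls than the one before it.
REQUIRED_CONTROLS: Dict[StateClass, FrozenSet[str]] = {
    StateClass.CACHE: frozenset({"safe_invalidation"}),
    StateClass.PROJECTION: frozenset({"expected_lag", "rebuild_rules"}),
    StateClass.MATERIALIZED_PRODUCT_STATE: frozenset(
        {
            "expected_lag",
            "rebuild_rules",
            "named_owner",
            "freshness_promise",
            "repair_path",
            "lineage",
        }
    ),
}


def missing_controls(state_class: StateClass, present: Set[str]) -> List[str]:
    """Return, in sorted order, the controls the class requires but lacks."""
    return sorted(REQUIRED_CONTROLS[state_class] - present)


present = {"expected_lag", "rebuild_rules"}
print(missing_controls(StateClass.PROJECTION, present))  # []
print(missing_controls(StateClass.MATERIALIZED_PRODUCT_STATE, present))
# ['freshness_promise', 'lineage', 'named_owner', 'repair_path']
```

The same dataset passes as a projection and fails as materialized product state, so reclassifying it makes the missing rigor explicit.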

For important derived values, I want the contract written near the interface, not hidden inside the job. The contract should name source authority, freshness, rebuild method, consumer expectations, and migration policy. If a downstream team cannot tell whether missing means false, unknown, not applicable, delayed, or filtered out, the contract is incomplete.
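
One way to remove that ambiguity is to make absence a typed value at the interface. The sketch below assumes a hypothetical eligibility flag; `Absence`, `EligibilityReading`, and `may_proceed` are illustrative names, and treating every absence as "do not proceed" is one possible policy, not the only reasonable one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Absence(Enum):
    """Reasons a derived value may be missing, kept distinct from False."""
    UNKNOWN = "unknown"
    NOT_APPLICABLE = "not_applicable"
    DELAYED = "delayed"
    FILTERED_OUT = "filtered_out"


@dataclass(frozen=True)
class EligibilityReading:
    """Either a definite answer or a stated reason for having none."""
    eligible: Optional[bool]
    absence: Optional[Absence] = None

    def __post_init__(self) -> None:
        # Exactly one of the two fields must be set.
        if (self.eligible is None) == (self.absence is None):
            raise ValueError("exactly one of eligible or absence must be set")


def may_proceed(reading: EligibilityReading) -> bool:
    """Assumed policy: only a definite True allows the action."""
    return reading.eligible is True


print(may_proceed(EligibilityReading(True)))                   # True
print(may_proceed(EligibilityReading(False)))                  # False
print(may_proceed(EligibilityReading(None, Absence.DELAYED)))  # False
```

A consumer can still choose to block on a delayed value, but it now does so knowingly, and support can tell "not eligible" apart from "not computed yet."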

I prefer repair paths that are boring to operate. A recompute should be scoped, checkpointed, observable, and idempotent where possible. A full rebuild may be necessary, but it should not be the only tool for every dispute.
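
A minimal in-memory sketch of such a recompute follows. Everything here is an assumption made for illustration: the `recompute` helper, the dict-backed store, and the set used as a checkpoint stand in for whatever storage and job framework a real system uses.

```python
from typing import Callable, Dict, Iterable, List, Set


def recompute(
    keys: Iterable[str],
    derive: Callable[[str], int],
    store: Dict[str, int],
    checkpoint: Set[str],
    log: List[str],
) -> None:
    """Recompute only the given keys (scoped), skipping keys already
    recorded in the checkpoint (resumable), and overwriting rather than
    incrementing so that a rerun yields the same result (idempotent)."""
    for key in keys:
        if key in checkpoint:
            log.append(f"skip {key}")
            continue
        store[key] = derive(key)
        checkpoint.add(key)
        log.append(f"recomputed {key}")


source = {"a": [1, 2], "b": [3]}
store = {"a": 99, "b": 3, "c": 7}  # "a" holds a bad derived value
checkpoint: Set[str] = set()
log: List[str] = []


def derive(key: str) -> int:
    return sum(source[key])


recompute(["a"], derive, store, checkpoint, log)
recompute(["a"], derive, store, checkpoint, log)  # resumed or repeated run

print(store)  # {'a': 3, 'b': 3, 'c': 7}
print(log)    # ['recomputed a', 'skip a']
```

Only the disputed key is touched, the log shows what happened, and repeating the run changes nothing. A full rebuild would be the same call with every key, which keeps it available without making it the default.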

The principal-engineer lens is ownership. Derived data crosses boundaries because it is useful. The more useful it becomes, the more likely it is to outgrow the team that first created it. Senior engineering work is catching that transition before a dashboard, model, or product flow turns an unowned projection into institutional truth.

Closing takeaway

Treat derived data according to the decisions it supports: cache it lightly, contract projections clearly, and assign product ownership to materialized truth.