Feature Flags Are Distributed Systems in Disguise

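Feature flags look harmless because the interface is usually a boolean. That simplicity hides the risk. A flag is not just an if statement. It is a distributed control plane for behavior, and it inherits many of the same failure modes as any other distributed system: stale configuration, instances that disagree about the current value, partial outages of the service that serves the values, and unclear ownership of changes.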

I trust flags. I also distrust casual flag culture.

The thesis

Feature flags reduce deployment risk only when their lifecycle is engineered. Without ownership, consistency expectations, cleanup rules, and failure semantics, flags trade one kind of release risk for a quieter operational risk.

The dangerous part is that the new risk often appears later, after the release celebration, when nobody is paying attention.

The production pattern

The pattern is easy to recognize. A team adds a flag to ship safely. Then another flag gates a related behavior. A third flag handles a partial rollout. A fourth flag is introduced as an emergency off switch. A few months later, nobody can answer which combinations are valid. Test coverage follows the happy path. Observability labels include flag state only some of the time. The default value differs between environments. One flag is owned by a team that no longer touches the feature.

Nothing about this requires a dramatic outage. The system simply becomes harder to reason about. A new engineer cannot tell whether a branch is dead code, future code, or a production escape hatch. A product decision becomes tangled with a deployment mechanism. A debugging session starts with "which flags were on?" and loses half an hour.

The model

I classify flags into five types.

Release flags decouple deploy from launch. They should be short-lived and removed after rollout.

Experiment flags compare behavior. They need clear analysis windows, assignment semantics, and cleanup dates.

Permission flags expose capability to cohorts. They often become policy and deserve stronger ownership than release flags.

Operational flags protect reliability. They should be boring, documented, observable, and tested in both positions.

Migration flags move traffic, data, or behavior between implementations. They need rollback plans, reconciliation, and compatibility checks.

The type matters because each type has different lifecycle rules. Treating all flags as the same boolean is how systems accumulate accidental state machines.
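One way to make the type carry its lifecycle rules is to encode it next to the flag and check age against a per-type review window. This is a minimal sketch: the `Flag` record, the `is_overdue` helper, and the specific day counts are hypothetical, chosen only to show the shape. Permission and operational flags get no expiry here because they are expected to live on, which is exactly why they need stronger ownership instead.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class FlagType(Enum):
    RELEASE = "release"
    EXPERIMENT = "experiment"
    PERMISSION = "permission"
    OPERATIONAL = "operational"
    MIGRATION = "migration"


# Hypothetical review windows; the real numbers are a team decision.
# None means the flag may live indefinitely (it still needs an owner).
MAX_AGE = {
    FlagType.RELEASE: timedelta(days=30),
    FlagType.EXPERIMENT: timedelta(days=60),
    FlagType.MIGRATION: timedelta(days=90),
    FlagType.PERMISSION: None,
    FlagType.OPERATIONAL: None,
}


@dataclass
class Flag:
    name: str
    kind: FlagType
    created: date


def is_overdue(flag: Flag, today: date) -> bool:
    """True when a temporary flag has outlived the window for its type."""
    limit = MAX_AGE[flag.kind]
    return limit is not None and today - flag.created > limit
```

A release flag created on 2024-01-01 is overdue by 2024-03-01, while an operational kill switch of any age is not; that one is reviewed for ownership and testing rather than removal.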

For every non-trivial flag, I want six answers:

What type of flag is this?
Who owns the decision to change it?
What is the safe default if the flag service is unavailable?
Which flag combinations are invalid?
How will we observe behavior by flag state?
When and how will the flag be removed?
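The six questions can live as fields on a flag record, so an unanswered question is visible rather than implicit. A minimal sketch, assuming a hypothetical `FlagSpec` dataclass where `None` means "not answered yet"; note that a safe default of `False` or an empty list of invalid combinations is still a real answer.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class FlagSpec:
    name: str
    flag_type: Optional[str] = None              # What type of flag is this?
    owner: Optional[str] = None                  # Who owns the decision to change it?
    safe_default: Optional[bool] = None          # Value used if the flag service is unavailable
    invalid_combinations: Optional[list] = None  # [] means "none", which is still an answer
    observability: Optional[str] = None          # How is behavior observed by flag state?
    removal_plan: Optional[str] = None           # When and how will the flag be removed?

    def missing_answers(self) -> list:
        """Return the names of questions that are still None (unanswered)."""
        return [
            f.name
            for f in fields(self)
            if f.name != "name" and getattr(self, f.name) is None
        ]
```

For example, `FlagSpec("ledger-v2-writes", flag_type="migration", owner="payments", safe_default=False).missing_answers()` returns `["invalid_combinations", "observability", "removal_plan"]`, which is a reasonable thing to block a merge on for a high-risk flag.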

If those answers feel heavy, the flag may be too casual for the risk it carries.

Where this goes wrong

The counterpoint is real: a strict flag process can make delivery slower than direct deployment. For low-risk cosmetic work, internal-only tools, or tiny services with fast rollback, a lightweight flag may be enough. The cure should not be worse than the disease.

The mistake is applying the same process to every flag. I do not want a review board for a small UI copy toggle. I do want rigor around flags that alter money movement, authorization, data writes, billing, irreversible user-visible effects, or cross-service contracts.

The useful distinction is reversibility. A flag that changes presentation can usually be flipped back and forgotten. A flag that changes durable state creates history. Once a flag writes different data, sends different events, or exposes a new permission shape, rollback is no longer just returning to the old branch. It is also reconciling what happened while the new branch was active.
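The reversibility test can be written down as a rule for how much process a flag gets. This is a sketch under assumptions: the `Effect` categories and the two-tier `required_rigor` helper are hypothetical and simply mirror the list above. Anything that writes data, emits events, reshapes permissions, moves money, or touches a cross-service contract needs more than a flip back.

```python
from enum import Enum, auto


class Effect(Enum):
    PRESENTATION = auto()
    DATA_WRITE = auto()
    EVENT_EMISSION = auto()
    PERMISSION_SHAPE = auto()
    MONEY_MOVEMENT = auto()
    CROSS_SERVICE_CONTRACT = auto()


# Effects whose consequences outlive a flag flip: rolling back may also
# require reconciling what happened while the new branch was active.
NEEDS_RIGOR = {
    Effect.DATA_WRITE,
    Effect.EVENT_EMISSION,
    Effect.PERMISSION_SHAPE,
    Effect.MONEY_MOVEMENT,
    Effect.CROSS_SERVICE_CONTRACT,
}


def required_rigor(effects: set) -> str:
    """'lightweight' if flipping back is enough, 'rigorous' otherwise."""
    return "rigorous" if effects & NEEDS_RIGOR else "lightweight"
```

A copy toggle (`{Effect.PRESENTATION}`) comes out lightweight; the same toggle that also writes a new column (`{Effect.PRESENTATION, Effect.DATA_WRITE}`) does not.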

Flags also go wrong when they become a substitute for design. Sometimes a flag is the right tool. Sometimes it is a way to postpone deciding how two behaviors should coexist. If both branches need to remain alive indefinitely, that is no longer a release flag. It is product variation, policy, or architecture.

What I do now

I ask teams to plan the flag lifecycle in the same design document as the rollout. The design does not need to be long. A small table is often enough: name, type, owner, default, valid states, observability, removal trigger.
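The "valid states" column is easier to enforce if invalid combinations are written as executable rules rather than prose. A minimal sketch, assuming flag state can be captured as a name-to-boolean snapshot; the flag names and both rules below are hypothetical examples, not recommendations.

```python
from typing import Callable, Mapping

Snapshot = Mapping[str, bool]

# Hypothetical rules: each pairs a description with a predicate that
# returns True when the snapshot is INVALID.
INVALID_RULES: list = [
    (
        "new ledger reads require new ledger writes",
        lambda s: s.get("ledger_v2_reads", False) and not s.get("ledger_v2_writes", False),
    ),
    (
        "experiment cannot run while the kill switch is engaged",
        lambda s: s.get("checkout_experiment", False) and s.get("checkout_kill_switch", False),
    ),
]


def violations(snapshot: Snapshot) -> list:
    """Describe every invalid combination present in the snapshot."""
    return [desc for desc, is_invalid in INVALID_RULES if is_invalid(snapshot)]
```

The same check can run in CI against each environment's configuration and again before a flip is applied, which also catches defaults that quietly differ between environments.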

For high-risk flags, I want tests for both states and for the transition between states. I want the flag state attached to events or traces where it will matter during debugging. I want runbooks to say whether flipping the flag is safe during partial failure. I want stale flags reviewed on a schedule, because stale flags are production behavior without active ownership.
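Attaching flag state to events works best when the values are recorded at the moment they are evaluated, not looked up again while debugging. A minimal sketch, assuming a hypothetical `FlagRecorder` that wraps whatever lookup function the flag client provides, and a JSON string standing in for a real logger or trace span. It also pins each flag's first value for the duration of a request, a design choice that keeps one request from running both branches if the flag flips mid-request.

```python
import json
from typing import Callable, Dict


class FlagRecorder:
    """Per-request wrapper that remembers every flag evaluated."""

    def __init__(self, lookup: Callable[[str], bool]):
        self._lookup = lookup
        self.evaluated: Dict[str, bool] = {}

    def is_on(self, name: str) -> bool:
        # Pin the first value seen in this request so later checks agree.
        if name not in self.evaluated:
            self.evaluated[name] = self._lookup(name)
        return self.evaluated[name]


def emit_event(event: str, recorder: FlagRecorder, **attrs) -> str:
    """Serialize an event together with the flag states that shaped it."""
    return json.dumps(
        {"event": event, "flags": recorder.evaluated, **attrs}, sort_keys=True
    )
```

With this in place, "which flags were on?" is answered by the event itself. Testing both states of a high-risk flag then means running the same test with a lookup that returns `True` and one that returns `False`.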

I also prefer flags that fail closed or fail boringly. If a flag controls enrichment, the safe default may be off. If it controls a migration where old and new systems must stay compatible, the safe default may depend on the last known committed phase. The default is a product and reliability decision, not a library detail.
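Making the default an explicit argument, rather than something buried in a client library, keeps it visible as a decision. A minimal sketch, assuming a hypothetical `FlagUnavailable` exception raised by the flag client, and assuming the last committed migration phase is persisted somewhere durable (for example, alongside the data being migrated) so it can be read when the flag service cannot.

```python
from enum import IntEnum
from typing import Callable


class FlagUnavailable(Exception):
    """Hypothetical error raised when the flag service cannot be reached."""


def evaluate(fetch: Callable[[str], bool], name: str, safe_default: bool) -> bool:
    """Return the live value, or the declared safe default on failure."""
    try:
        return fetch(name)
    except FlagUnavailable:
        return safe_default


class MigrationPhase(IntEnum):
    OLD_ONLY = 0
    DUAL_WRITE = 1
    NEW_READS = 2
    NEW_ONLY = 3


def migration_phase(
    fetch_phase: Callable[[], MigrationPhase], last_committed: MigrationPhase
) -> MigrationPhase:
    """For migrations, fall back to the last committed phase, not a fixed value."""
    try:
        return fetch_phase()
    except FlagUnavailable:
        return last_committed
```

An enrichment flag can simply pass `safe_default=False`. A migration that has already committed to dual writes should not fall back to `OLD_ONLY`, because that would stop writes the new system now depends on; falling back to the recorded phase avoids that.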

Closing takeaway

Use feature flags as controlled behavior distribution, not as scattered conditionals. Every serious flag needs a type, an owner, a failure mode, observability, and an exit plan.