Idempotency Is a Product Requirement

Why safe retries depend on product semantics, user-visible effects, and organizational ownership.

Engineers often talk about idempotency as an implementation detail: keys, retries, deduplication tables, and safe handlers. Those details matter. But the deeper question is not technical.

The deeper question is what the product means by "the same action."

The thesis

Idempotency is a product requirement before it is an engineering technique.

You cannot make retries safe until the organization agrees which effects are unique, which are repeatable, which are visible to users, and who owns the ambiguity.

The production pattern

A user or system submits an action. The network times out, a worker crashes, a callback is delayed, or a caller retries because it did not receive confirmation. The same intent may arrive more than once.

The engineering instinct is to add an idempotency key. That is often necessary, but it is not sufficient. The system still has to answer product questions:

  • Is this the same purchase, request, message, report, or state change?
  • What if the second attempt has slightly different metadata?
  • What if the first attempt partially succeeded?
  • What should the user see while the outcome is unknown?
  • How long should the system remember the original intent?

Without product semantics, idempotency becomes a cache with opinions.

The model

I divide idempotency into five layers.

First, intent identity. What identifies the user's intended action? It may not be the same as a request ID. A request ID can identify delivery. An idempotency key should identify intent.
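A minimal sketch of that distinction, assuming the intent is fully described by a user, an action name, and a target. Those three fields and the function names are illustrative assumptions, not a recommended schema; many systems instead have the client generate one key per intent and reuse it across retries.

```python
import hashlib
import json
import uuid


def intent_key(user_id: str, action: str, target_id: str) -> str:
    """Derive a stable key from the fields assumed to define the intent."""
    canonical = json.dumps(
        {"user": user_id, "action": action, "target": target_id},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def new_request_id() -> str:
    """A request ID identifies one delivery attempt, so it changes on every retry."""
    return uuid.uuid4().hex


# Two deliveries of the same intent: different request IDs, same idempotency key.
first = {"request_id": new_request_id(), "key": intent_key("u-1", "submit_form", "t-9")}
retry = {"request_id": new_request_id(), "key": intent_key("u-1", "submit_form", "t-9")}
```

Deriving the key from fields bakes in a product decision: two submissions by the same user for the same target are treated as one action. If the product allows legitimate repeats, the key needs another component, and choosing it is exactly the product question.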

Second, effect boundary. Which side effects belong to the action? Database writes, notifications, external calls, analytics events, and user-visible state may need different treatment.

Third, response semantics. If the same intent is repeated, should the user receive the original response, the current state, or a specific "already accepted" message?

Fourth, retention window. How long can duplicates arrive? The answer depends on callers, queues, integrations, and human behavior.
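To make the retention question concrete, here is a small in-memory sketch. The class name, the one-hour window, and the injected clock are all assumptions for illustration; a real store would be durable.

```python
from datetime import datetime, timedelta, timezone


class DedupeStore:
    """Remembers keys for a fixed retention window (in memory, for illustration)."""

    def __init__(self, retention: timedelta):
        self.retention = retention
        self._first_seen = {}

    def is_duplicate(self, key: str, now: datetime) -> bool:
        seen_at = self._first_seen.get(key)
        if seen_at is not None and now - seen_at < self.retention:
            return True
        # Unknown or expired key: treat as a new action and remember it.
        self._first_seen[key] = now
        return False


store = DedupeStore(retention=timedelta(hours=1))
t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

original = store.is_duplicate("k-1", t0)
quick_retry = store.is_duplicate("k-1", t0 + timedelta(minutes=30))
late_retry = store.is_duplicate("k-1", t0 + timedelta(hours=2))
```

Here `quick_retry` is recognized as a duplicate, but `late_retry` arrives after the window and is processed as a new action. Whether a two-hour-old retry is realistic depends on the callers, queues, and people involved, which is why the window is not a purely technical setting.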

Fifth, ownership. Who resolves conflicts when a repeated action is similar but not identical? Engineering cannot answer that alone.

My review checklist:

  • The idempotency key maps to product intent
  • Duplicate attempts return a defined user-visible result
  • Partial success is represented in durable state
  • Side effects are either deduplicated or explicitly repeatable
  • Retention matches realistic retry windows
  • Conflict cases have an owner and policy

I also ask teams to define the conflict matrix. Same key and same payload is the easy case. Same key and different payload is where product semantics show up. Different key and same apparent intent can happen when a user refreshes, a mobile app retries after reinstall, or an integration generates a new delivery identifier.

The matrix should state whether the system returns the original result, rejects the attempt, creates a new action, or routes the case to review. Without that policy, engineers bury product judgment inside whichever conditional happens to be easiest to implement.
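One way to keep that policy out of scattered conditionals is to write the matrix down as data. This is a sketch only: the outcome assigned to each cell below is a placeholder assumption, and the real values are the product decision.

```python
from enum import Enum


class Outcome(Enum):
    RETURN_ORIGINAL = "return_original"
    REJECT = "reject"
    CREATE_NEW = "create_new"
    ROUTE_TO_REVIEW = "route_to_review"


# (same_key, same_payload) -> outcome. The values are placeholders; the point is
# that every cell is written down and owned, not which outcome it holds.
CONFLICT_MATRIX = {
    (True, True): Outcome.RETURN_ORIGINAL,   # plain retry
    (True, False): Outcome.REJECT,           # key reused with changed details
    (False, True): Outcome.ROUTE_TO_REVIEW,  # new key, same apparent intent
    (False, False): Outcome.CREATE_NEW,      # unrelated action
}


def decide(same_key: bool, same_payload: bool) -> Outcome:
    """Look up the agreed policy instead of encoding it in ad hoc conditionals."""
    return CONFLICT_MATRIX[(same_key, same_payload)]
```

A table like this is easy to review with product stakeholders, and changing a cell becomes a visible policy change instead of a quiet refactor.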

The storage model matters too. A dedupe table that expires too quickly creates false confidence. A table that stores too much sensitive data creates its own risk. A unique constraint on the wrong field protects the database while still allowing duplicate user-visible effects. Idempotency is only real when the protected boundary matches what the user experiences as one action.
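The unique-constraint point can be shown with a small SQLite sketch. The table and column names are hypothetical, and it assumes the product defines one action as one request per user and target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Constraint on the delivery identifier: protects the table, not the intent.
conn.execute(
    "CREATE TABLE by_request (request_id TEXT PRIMARY KEY, user_id TEXT, target_id TEXT)"
)
# Constraint on the fields assumed to define one user-visible action.
conn.execute(
    "CREATE TABLE by_intent (request_id TEXT, user_id TEXT, target_id TEXT, "
    "UNIQUE (user_id, target_id))"
)

INSERT_SQL = {
    "by_request": "INSERT INTO by_request VALUES (?, ?, ?)",
    "by_intent": "INSERT INTO by_intent VALUES (?, ?, ?)",
}


def insert(table: str, request_id: str, user_id: str, target_id: str) -> bool:
    """Return True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(INSERT_SQL[table], (request_id, user_id, target_id))
        return True
    except sqlite3.IntegrityError:
        return False


# The same intent delivered twice, with a fresh request ID on the retry.
results = {
    table: [insert(table, "req-1", "u-1", "t-9"), insert(table, "req-2", "u-1", "t-9")]
    for table in ("by_request", "by_intent")
}
```

Both inserts succeed in `by_request`, so the user ends up with two requests even though the constraint never fired. Only `by_intent` rejects the retry, because its constraint sits on the boundary the user experiences.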

The hardest reviews are the ones where two stakeholders both have reasonable meanings for "same." In that case, I prefer to name the ambiguity directly and choose the user-facing promise first. The implementation can be optimized later, but the promise determines which duplicate is safe and which duplicate is harmful.

Where this goes wrong

The counterpoint is that not every action needs strong idempotency. Some workflows are naturally append-only, low consequence, or easy for users to correct. Over-engineering idempotency can add storage, contention, and confusing edge cases.

There is also a danger in pretending that idempotency guarantees exactly-once execution. It usually does not. Most production systems can provide at-least-once processing with deduplicated effects, or the appearance of exactly-once execution within a carefully bounded domain. That distinction should be explicit.

The worst failure mode is partial idempotency: the main write is protected, but notifications, downstream calls, or derived records still duplicate. Users experience the duplicate side effect, so the system is not idempotent in the way that matters.
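A sketch of one way to avoid this, assuming every side effect is tracked per intent instead of only the main write. The class, the effect names, and the simulated crash are illustrative, and the in-memory set stands in for durable state.

```python
class EffectLedger:
    """Tracks which effects have completed for each intent (in memory here)."""

    def __init__(self):
        self.completed = set()  # stands in for a durable table
        self.observed = []      # side effects the user would actually see

    def run_once(self, intent_key: str, effect_name: str, effect) -> bool:
        marker = (intent_key, effect_name)
        if marker in self.completed:
            return False  # already done for this intent, skip
        effect()
        self.completed.add(marker)
        return True


def handle(ledger: EffectLedger, intent_key: str, crash_after_write: bool = False):
    ledger.run_once(intent_key, "write", lambda: ledger.observed.append("write"))
    if crash_after_write:
        raise RuntimeError("simulated crash before notification")
    ledger.run_once(intent_key, "notify", lambda: ledger.observed.append("notify"))


ledger = EffectLedger()
try:
    handle(ledger, "k-1", crash_after_write=True)  # first attempt partially succeeds
except RuntimeError:
    pass
handle(ledger, "k-1")  # retry completes only the missing effect
handle(ledger, "k-1")  # a further duplicate changes nothing
```

After the three attempts, `ledger.observed` contains one write and one notification. The sketch is not atomic: a crash between `effect()` and recording the marker would still repeat that effect, so a real design needs the marker and the effect in one transaction, or an effect that is itself safe to repeat.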

What I do now

When reviewing retry behavior, I ask product and engineering to define the action in plain language. "Submitting this form creates one active request for this user and this target." "Retrying returns the current status of that request." "Changed details after acceptance require a new amendment." Those sentences are the foundation.

Then the implementation can follow: keys, unique constraints, state machines, outbox patterns, dedupe tables, and reconciliation. The technical design becomes much easier once the semantic design is honest.
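As an illustration, the three plain-language sentences above might translate into something like the following. The class, field names, and result labels are assumptions made for the sketch, and the dictionary stands in for durable storage.

```python
class RequestBook:
    """One active request per (user, target), held in memory for illustration."""

    def __init__(self):
        self.active = {}

    def submit(self, user_id: str, target_id: str, details: dict) -> dict:
        key = (user_id, target_id)
        existing = self.active.get(key)
        if existing is None:
            # "Submitting this form creates one active request."
            self.active[key] = {"status": "accepted", "details": dict(details)}
            return {"result": "created", "status": "accepted"}
        if existing["details"] == details:
            # "Retrying returns the current status of that request."
            return {"result": "existing", "status": existing["status"]}
        # "Changed details after acceptance require a new amendment."
        return {"result": "amendment_required", "status": existing["status"]}


book = RequestBook()
created = book.submit("u-1", "t-9", {"note": "hello"})
book.active[("u-1", "t-9")]["status"] = "in_progress"  # the request moves on
retried = book.submit("u-1", "t-9", {"note": "hello"})
changed = book.submit("u-1", "t-9", {"note": "hello again"})
```

Each branch corresponds to one sentence, so a reviewer can check the code against the promise without reading it as infrastructure.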

The principal-engineer lens on all of this is user trust. Users rarely perceive duplicate effects as distributed systems trivia. To them, duplicates feel like broken promises.

Closing takeaway

Do not start idempotency design with a key. Start by defining what one user intent means, which effects count, and what repeated attempts must promise.