The Architecture Review That Actually Works

A review format that tests assumptions, failure modes, and operating ownership instead of style.

Many architecture reviews are too late, too broad, and too focused on artifacts that are easy to argue about. Diagrams get polished. Naming gets debated. People perform confidence. The review ends with comments, but the hardest risks remain unchanged.

A useful architecture review is not a design beauty contest. It is a stress test for assumptions.

The thesis

Architecture review works when it tests whether the system can be built, operated, changed, and recovered by the organization that actually exists.

That means the review must examine failure modes, ownership, sequencing, and reversibility with the same seriousness as components and interfaces.

The production pattern

A team brings a design to review after weeks of work. The document describes the happy path well. It includes a diagram, API shape, storage choice, and rollout plan. Reviewers ask questions. Some are useful. Some are preference. The team leaves with a list of comments and a vague sense that the design is approved.

Then implementation starts and the real problems emerge. A migration cannot be rolled back. A dependency has no owner. A queue hides ordering requirements. A dashboard shows symptoms but not causality. A team that was assumed to adopt the interface has different priorities. A fallback path was never tested.

The architecture review did not fail because people were careless. It failed because it reviewed the picture, not the operating reality.

The model

I use an architecture review format built around seven questions.

What decision is being made? A review without a decision becomes a discussion club. Is the team choosing between options, seeking risk review, requesting approval to proceed, or asking for input while the design is still fluid?

What are the non-negotiable constraints? These can include correctness, latency, privacy, cost, compatibility, staffing, regulatory boundaries, migration windows, or user promises. If constraints are not explicit, reviewers will invent their own.

What are the rejected alternatives? A design is easier to trust when I can see what it chose not to do. Rejected options reveal the team's risk model.

What fails, and how do we know? This includes dependency failure, partial writes, retries, backpressure, stale data, overload, bad input, deploy rollback, and operator error. I care less about exhaustive doom and more about whether the team has named the most likely failure classes.

Who owns each boundary after launch? Ownership includes code, operations, data repair, dashboards, costs, support, and future changes. Shared ownership is often no ownership with better language.

How does the system evolve? Compatibility, migrations, deprecations, versioning, and cleanup should be part of architecture, not follow-up chores.

What would make us reverse or revisit this decision? Good architecture is not always permanent. It should have triggers for review when assumptions change.
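One way to keep the seven questions from drifting into a free-form discussion is to treat them as a checklist that the design document must answer before review. The following is a minimal sketch, not a prescribed tool: the keys and the `unanswered` helper are hypothetical names chosen for illustration, and "answered" here only means "non-blank", which says nothing about the quality of the answer.

```python
# The seven review questions from the text, keyed by hypothetical short names.
REVIEW_QUESTIONS = {
    "decision": "What decision is being made?",
    "constraints": "What are the non-negotiable constraints?",
    "alternatives": "What are the rejected alternatives?",
    "failures": "What fails, and how do we know?",
    "ownership": "Who owns each boundary after launch?",
    "evolution": "How does the system evolve?",
    "revisit": "What would make us reverse or revisit this decision?",
}


def unanswered(answers: dict[str, str]) -> list[str]:
    """Return the questions whose answer is missing or blank."""
    return [
        question
        for key, question in REVIEW_QUESTIONS.items()
        if not answers.get(key, "").strip()
    ]
```

A check like this catches only omissions. Whether the named failure classes are the likely ones, or whether the rejected alternatives were seriously considered, still has to be judged by reviewers.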

Where this goes wrong

The counterpoint is that reviews can become governance theater. Too many required reviewers, too heavy a template, and too little decision authority will train teams to treat architecture review as paperwork.

The review should be proportional to blast radius. A local refactor does not need the same process as a cross-team platform boundary. A reversible choice needs less ceremony than a data model migration that will live for years.

Another failure mode is reviewer vanity. Senior people can derail reviews by optimizing for their preferred patterns rather than the team's constraints. The reviewer's job is to improve the decision, not to imprint a style.

What I do now

I ask for architecture reviews early enough that options still exist. If a team is already emotionally committed to one design, the review becomes a negotiation. Early review is less polished, but more useful.

I prefer a short decision document over a long encyclopedia. The best documents make assumptions easy to attack. They include diagrams only where diagrams clarify ownership, flow, or failure. They include rollout and rollback because architecture without sequence is incomplete.

During review, I watch for unanswered ownership. If nobody owns reconciliation, cleanup, cost, or the fallback path, the design is not ready. I also watch for unpriced dependencies: teams, services, data sources, and human processes being treated as always available.

After review, I want a decision record. Not minutes. A decision: what we chose, why, what we rejected, what must be true, who owns it, and when to revisit.

I also prefer reviews that end with one of three outcomes: proceed, proceed after named changes, or revisit with missing evidence. Ambiguous approval is dangerous because everyone leaves with a different memory. If the design has a material unresolved risk, name the owner and the evidence required. If the risk is accepted, name who accepted it and why. If the choice is reversible, say what signal will cause reversal.
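The decision record and the three outcomes can be written down as a small data structure, which makes ambiguous approval easy to spot. This is a sketch under stated assumptions rather than a standard format: every class, field, and method name below is hypothetical, and the rules in `gaps` are one reading of the paragraphs above (an unresolved risk needs either an owner plus required evidence, or a named acceptor plus a reason).

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    """The three review outcomes named in the text."""
    PROCEED = "proceed"
    PROCEED_AFTER_CHANGES = "proceed after named changes"
    REVISIT = "revisit with missing evidence"


@dataclass
class Risk:
    description: str
    owner: str = ""               # who is responsible for resolving the risk
    evidence_required: str = ""   # what evidence would resolve it
    accepted_by: str = ""         # filled in only when the risk is accepted
    acceptance_reason: str = ""


@dataclass
class DecisionRecord:
    chosen: str               # what we chose
    why: str                  # why we chose it
    rejected: list[str]       # what we rejected
    must_be_true: list[str]   # assumptions the decision depends on
    owner: str                # who owns it
    revisit_when: list[str]   # signals that trigger a revisit or reversal
    outcome: Outcome
    named_changes: list[str] = field(default_factory=list)
    risks: list[Risk] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Return the reasons this record still counts as ambiguous."""
        problems = []
        if not self.owner:
            problems.append("no owner named")
        if not self.revisit_when:
            problems.append("no revisit trigger")
        if self.outcome is Outcome.PROCEED_AFTER_CHANGES and not self.named_changes:
            problems.append("outcome requires named changes")
        for risk in self.risks:
            accepted = bool(risk.accepted_by and risk.acceptance_reason)
            tracked = bool(risk.owner and risk.evidence_required)
            if not (accepted or tracked):
                problems.append(f"risk not owned or accepted: {risk.description}")
        return problems


# Illustrative values only; none of these come from a real review.
example = DecisionRecord(
    chosen="option A",
    why="meets the latency constraint",
    rejected=["option B"],
    must_be_true=["downstream team adopts the interface"],
    owner="team X",
    revisit_when=["adoption stalls"],
    outcome=Outcome.PROCEED,
    risks=[Risk(description="fallback path untested")],
)
print(example.gaps())
# ['risk not owned or accepted: fallback path untested']
```

An empty list from `gaps` does not mean the decision is good. It means only that everyone leaves the room with the same written record of what was decided, who owns it, and what would reopen it.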

This makes architecture review feel less like judgment by seniority and more like a disciplined way to buy down uncertainty. That distinction matters. Teams will bring designs earlier when they believe the review will sharpen decisions instead of merely grade them.

Closing takeaway

An architecture review works when it converts a proposed design into explicit assumptions, owned risks, sequenced decisions, and revisit triggers.