Engineering Productivity Metrics Should Start With Friction

A measurement frame that starts with wait time, rework, build pain, and cognitive load.

Engineering productivity conversations often start with output: how many changes shipped, how many tickets closed, how much code merged, how many projects completed. Output matters, but starting there creates a measurement problem: the organization sees motion before it understands the drag.

The thesis

Productivity metrics become harmful when they measure output before they measure friction.

A team can ship many small changes while spending most of its energy waiting, rebuilding, re-reviewing, re-explaining, and navigating unclear ownership. Another team can produce fewer visible artifacts while removing constraints that make future delivery cheaper. If the measurement frame cannot see friction, it will reward local throughput and miss systemic waste.

The principal-engineer question is not "are engineers busy?" They usually are. The better question is "where does engineering effort lose energy before it becomes reliable product change?"

The production pattern

The pattern shows up when leadership wants a clearer view of engineering speed. The request is reasonable. Delivery feels uneven. Planning is noisy. Some projects take longer than expected. Teams give different explanations. The organization wants objective measures instead of anecdotes.

The first instinct is to count visible outputs: pull requests, commits, story points, deployments, cycle time, project milestones. These can be useful signals, but they are easy to misread. A high pull-request count may mean good flow, or it may mean fragmented work. Fast merge time may mean healthy review, or it may mean weak review. More deployments may mean confidence, or they may mean repeated fixes. Story points may reflect planning discipline, or they may reflect negotiation.

Meanwhile, the expensive problems hide in the negative space. Engineers wait for environments. Builds take too long. Tests fail for unrelated reasons. Reviews arrive after context has faded. Requirements churn after implementation begins. Ownership is unclear. Debugging requires too many tabs and too much oral history. Every change is possible, but every change feels heavier than it should.

The trap

The trap is turning productivity measurement into performance surveillance.

Once metrics are attached to individual output, engineers adapt. They split work into smaller visible units, avoid risky cleanups, prefer changes with obvious credit, and underreport the coordination work that keeps systems healthy. The organization may get cleaner charts and worse engineering behavior.

Another trap is treating subjective friction as unserious. Engineers saying "the build is painful" or "reviews are slow" can sound anecdotal compared with a numeric dashboard. But subjective friction is often an early warning that the system of work is taxing cognition, attention, and confidence. The right response is not to accept every complaint as fact. The right response is to turn friction into inspectable evidence.

The deepest trap is measuring the last step of delivery while ignoring the path. Shipping is the visible event. Productivity is shaped by everything that happens before shipping: finding context, validating assumptions, changing code, proving safety, getting review, deploying, observing, and recovering.

The model

I start with six friction dimensions: wait time, review latency, build time, deploy confidence, rework rate, and cognitive load.

Wait time: where does work sit idle? Waiting for access, environments, decisions, reviews, approvals, dependency changes, test results, or production windows all count. Wait time is often organizational debt disguised as calendar time.

Review latency: how long does it take to get useful feedback, and how often does review arrive after context has gone cold? The important measure is not only first response. It is the time to decision-quality feedback.

Build time: how long does it take to know whether a change is technically viable? Slow builds, flaky tests, overloaded pipelines, and local setup pain create a tax on every learning loop.

Deploy confidence: how safe does it feel to release, roll back, observe, and repair? Low deploy confidence creates batching, batching creates risk, and risk creates more process. The metric should include rollback clarity, alert quality, ownership, and post-deploy visibility.

Rework rate: how often does completed work reopen because assumptions were wrong, contracts were unclear, requirements changed, or review found late architectural disagreement? Rework is not always waste, but unexplained rework is a signal.

Cognitive load: how much context must an engineer hold to make a safe change? Too many systems, inconsistent patterns, unclear ownership, and hidden invariants make ordinary work expensive. Cognitive load is harder to count, but it can be sampled through surveys, onboarding time, incident diagnosis, and repeated questions.

This model changes the first conversation. Instead of asking "how do we make engineers ship more?" it asks "which constraints consume engineering attention before shipping happens?"
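Two of the dimensions above, wait time and review latency, can be estimated from the timeline of a single change. The following is a minimal sketch under stated assumptions, not a definitive implementation: the state names in `IDLE_STATES`, the `Event` and `ReviewComment` types, and the `decision_quality` flag (a human judgment about whether the author could act on the feedback) are all hypothetical and would need to be mapped onto whatever a real tracker or review tool records.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical state names; a real tracker will use its own vocabulary.
IDLE_STATES = {"waiting_review", "waiting_env", "waiting_approval"}


@dataclass(frozen=True)
class Event:
    at: datetime
    state: str  # the state the change enters at this moment


@dataclass(frozen=True)
class ReviewComment:
    at: datetime
    decision_quality: bool  # assumed human label: could the author act on it?


def wait_time(events: list[Event]) -> timedelta:
    """Sum the time the change spent in idle states."""
    ordered = sorted(events, key=lambda e: e.at)
    total = timedelta()
    # Each state lasts until the next event begins.
    for current, following in zip(ordered, ordered[1:]):
        if current.state in IDLE_STATES:
            total += following.at - current.at
    return total


def wait_share(events: list[Event]) -> float:
    """Fraction of elapsed calendar time spent idle (0.0 if undefined)."""
    if len(events) < 2:
        return 0.0
    ordered = sorted(events, key=lambda e: e.at)
    elapsed = ordered[-1].at - ordered[0].at
    if elapsed == timedelta():
        return 0.0
    return wait_time(events) / elapsed


def review_latencies(
    requested_at: datetime, comments: list[ReviewComment]
) -> tuple[Optional[timedelta], Optional[timedelta]]:
    """Return (time to first response, time to decision-quality feedback)."""
    ordered = sorted(comments, key=lambda c: c.at)
    first = ordered[0].at - requested_at if ordered else None
    useful = next(
        (c.at - requested_at for c in ordered if c.decision_quality), None
    )
    return first, useful
```

Keeping the two review latencies separate is deliberate: a quick acknowledgement followed by a day-long wait for actionable feedback looks healthy on a first-response chart and unhealthy on the second number.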

Where this model breaks

Not all friction is waste.

Some review latency is the cost of rigor. Security-sensitive changes, data migrations, public contracts, and cross-boundary architecture need slower thinking than a local refactor. Some rework is discovery. Some cognitive load reflects a genuinely complex domain. Some waiting prevents unsafe launches. A low-friction process that lets bad changes move quickly is not productivity.

The model also breaks if every pain point becomes a platform project. Engineering organizations can spend too much time polishing internal experience while neglecting product outcomes. Friction work needs prioritization by impact, frequency, and risk. A painful workflow used once a quarter may matter less than a mildly annoying workflow hit hundreds of times a week.
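The quarterly-versus-weekly comparison can be made concrete with a rough cost estimate. This is a sketch with invented numbers, not measured data: the `FrictionItem` type, the `risk_multiplier` weighting, and both example workflows are hypothetical, and multiplying the three factors is only one of several reasonable ways to combine impact, frequency, and risk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrictionItem:
    name: str
    hits_per_week: float         # frequency: how often engineers run into it
    minutes_lost_per_hit: float  # impact: time lost each time
    risk_multiplier: float = 1.0  # assumed weight, >1 if it also raises delivery risk

    @property
    def weekly_cost_minutes(self) -> float:
        return self.hits_per_week * self.minutes_lost_per_hit * self.risk_multiplier


# Illustrative values only.
items = [
    FrictionItem("painful quarterly export", hits_per_week=1 / 13,
                 minutes_lost_per_hit=240),
    FrictionItem("mildly annoying test retry", hits_per_week=300,
                 minutes_lost_per_hit=6, risk_multiplier=1.5),
]

ranked = sorted(items, key=lambda item: item.weekly_cost_minutes, reverse=True)
for item in ranked:
    print(f"{item.name}: {item.weekly_cost_minutes:.0f} min/week")
```

With these made-up inputs the four-hour quarterly task costs about 18 minutes a week, while the six-minute retry costs 2,700. The score is a way to start the prioritization conversation, not a substitute for it.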

There is also a measurement caveat. Friction metrics can be gamed if they become targets without interpretation. Reducing review latency by weakening review is easy. Reducing rework by discouraging feedback is easy. Reducing cognitive load by hiding complexity behind leaky abstractions is easy. The point is not lower numbers. The point is better flow with maintained safety.

What I do now

I begin productivity reviews with friction mapping. For a representative change, trace the path from idea to production learning. Where did it wait? Where did context get lost? Where did the engineer need private knowledge? Where did automation fail to answer a question? Where did review happen too late?

Then I separate local friction from systemic friction. Local friction belongs to a project or codebase. Systemic friction repeats across teams: slow pipelines, inconsistent release patterns, unclear ownership, overloaded reviewers, brittle environments, or review standards that live only in senior engineers' heads.
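One simple way to draw that line is to count how many distinct teams report the same kind of friction. The sketch below assumes friction reports have already been tagged with a team and a category; the `classify_friction` helper, the category names, and the `min_teams` threshold of three are hypothetical choices for illustration.

```python
from collections import defaultdict


def classify_friction(
    reports: list[tuple[str, str]], min_teams: int = 3
) -> dict[str, str]:
    """Label each friction category 'systemic' or 'local'.

    `reports` holds (team, category) pairs. A category counts as systemic
    when at least `min_teams` distinct teams report it; the threshold is
    an assumption and should be tuned to the size of the organization.
    """
    teams_by_category: dict[str, set[str]] = defaultdict(set)
    for team, category in reports:
        teams_by_category[category].add(team)
    return {
        category: "systemic" if len(teams) >= min_teams else "local"
        for category, teams in teams_by_category.items()
    }


reports = [
    ("payments", "slow pipeline"),
    ("search", "slow pipeline"),
    ("mobile", "slow pipeline"),
    ("payments", "legacy ledger schema"),
    ("payments", "legacy ledger schema"),  # repeated reports from one team
]
print(classify_friction(reports))
```

Counting distinct teams rather than raw reports keeps one vocal team from turning a local problem into an apparent systemic one.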

I pair metrics with qualitative evidence. A dashboard might show review time. Interviews explain whether the delay came from reviewer overload, unclear design, too-large changes, missing tests, or ownership confusion. A survey might flag build pain. Pipeline data explains whether the pain is compile time, flaky tests, queueing, or setup drift.
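As an illustration of how pipeline data might separate queueing from compile time and surface flaky tests, here is a minimal sketch. The `PipelineRun` record and its field names are hypothetical, and the flakiness heuristic assumes that every run of a commit executes the same tests, so a test missing from `failed_tests` is treated as having passed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineRun:
    commit: str
    queue_s: float    # seconds spent waiting for a runner
    compile_s: float  # seconds spent compiling
    test_s: float     # seconds spent running tests
    failed_tests: frozenset = frozenset()


def phase_shares(runs: list[PipelineRun]) -> dict[str, float]:
    """Share of total pipeline time spent in each phase."""
    totals = {
        "queue": sum(r.queue_s for r in runs),
        "compile": sum(r.compile_s for r in runs),
        "test": sum(r.test_s for r in runs),
    }
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {phase: 0.0 for phase in totals}
    return {phase: seconds / grand_total for phase, seconds in totals.items()}


def flaky_candidates(runs: list[PipelineRun]) -> set[str]:
    """Tests that failed in some, but not all, runs of the same commit."""
    failures_by_commit: dict[str, list[frozenset]] = defaultdict(list)
    for run in runs:
        failures_by_commit[run.commit].append(run.failed_tests)

    flaky: set[str] = set()
    for failures in failures_by_commit.values():
        if len(failures) < 2:
            continue  # one run cannot show inconsistent results
        failed_ever = set().union(*failures)
        failed_always = set(failures[0]).intersection(*failures[1:])
        flaky |= failed_ever - failed_always
    return flaky
```

If most of the time sits in the queue phase, the survey's "build pain" is a capacity problem rather than a compiler problem, which points to a different fix.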

I also ask whether reducing friction will change behavior. Faster builds should encourage smaller changes. Better deploy confidence should reduce batching. Clearer ownership should shorten decision loops. Good defaults should reduce repeated review comments. If a metric improves but behavior does not, the metric may be ornamental.

Most importantly, I avoid individual scorekeeping. Productivity is largely a property of the system of work. Principal engineers should use metrics to expose constraints, not to rank people performing inside those constraints.

Closing takeaway

Measure friction before output: the best productivity work removes wait, rework, build pain, deploy fear, and unnecessary cognitive load while preserving the rigor that keeps systems safe.