Back to archive

Engineering

Agentic Coding Changes the Bottleneck, Not the Responsibility

Why AI-assisted development moves senior responsibility toward framing, review, and integration.

Agentic Coding Changes the Bottleneck, Not the Responsibility

The first time a coding agent feels fast, the wrong conclusion is tempting: implementation has become cheap, so engineering judgment matters less.

I think the opposite is true. Agentic coding makes weak engineering judgment more expensive because the system can now produce more plausible work before anyone notices the framing was wrong.

The thesis

AI-assisted development moves the bottleneck from typing code to defining, constraining, reviewing, and integrating change.

That sounds like a productivity story, but it is really an accountability story. The person responsible for the system is still responsible for the behavior, even when the first draft came from a machine.

The production pattern

A pattern I keep seeing is that agents are strongest when the problem boundary is crisp and weakest when the work requires latent product judgment.

They can follow a local convention, update repetitive call sites, generate tests around a known contract, or explore an unfamiliar codebase quickly. They struggle when the important question is not "what code satisfies this prompt?" but "what behavior should this system have after the change?"

That difference matters because production software is mostly boundary work:

  • What invariants must survive?
  • Which users or operators rely on today's odd behavior?
  • What data shape is allowed to pass through this layer?
  • What happens during partial deploy, retry, rollback, and migration?
  • Who owns the new failure mode after the diff lands?

The agent can accelerate many of the moves inside that boundary. It cannot be accountable for choosing the boundary.

The model

I use a four-part frame before giving an agent meaningful write access.

First, define the job. A good task is not "fix this module." It is "make this parser reject invalid timestamps while preserving the existing accepted format and adding regression coverage for empty, malformed, and timezone-bearing inputs."

Second, define the blast radius. The agent should know which files are in scope, which public contracts cannot change, and which adjacent areas are read-only unless explicitly justified.

Third, define the evidence. The output is not done when the code compiles. It is done when the tests, static checks, manual review notes, and any migration reasoning make the change credible.

Fourth, define the integration path. The larger the change, the more important it is to decide whether the agent should produce a patch, a plan, a focused test, a mechanical refactor, or a throwaway exploration.

The checklist I use is:

  • Intent: Can I state the behavior change in one sentence?
  • Contract: Which public interfaces, data formats, or operational assumptions are touched?
  • Scope: What files may change, and which files are context only?
  • Verification: What would prove the change works and did not drift?
  • Rollback: If this is wrong, how do we recover?
  • Ownership: Who will understand and maintain the result after the agent is gone?

If I cannot answer those questions, the agent is not blocked by lack of prompt quality. It is blocked by lack of engineering clarity.

Where this goes wrong

The common failure is treating agentic coding as a throughput multiplier without adding a review multiplier.

That creates a queue of work that looks finished. The code has comments, tests, and confident structure. But the hidden risk is in the premise: a missed invariant, a subtly changed API shape, a test that asserts the implementation instead of the requirement, or a migration that works only from a clean state.

There is a counterpoint. For small, local, reversible work, heavy framing can be waste. If the task is a documentation typo, a generated type update, or a narrow mechanical rename, the right move is to keep the ceremony small. A principal engineer should not turn every agent interaction into an architecture review.

The judgment is in matching control to risk. Low-risk changes need speed. High-risk changes need boundaries.

What I do now

I split agentic work into four modes.

Exploration mode is read-heavy. The agent maps files, explains call paths, and identifies likely change points. I do not treat its conclusions as truth until I check the evidence.

Draft mode produces a candidate patch. The useful output is not just the diff. It is the reasoning trail: assumptions, changed contracts, and tests.

Mechanical mode is for repetitive edits where the contract is already settled. The main review is coverage of all call sites and absence of unrelated edits.

Verification mode asks the agent to attack the change: find missing cases, run checks, compare behavior, and describe residual risk.

This mode separation prevents the worst habit: asking an agent to discover the problem, choose the design, implement the fix, and declare success in one uninterrupted loop.

Closing takeaway

Use agents to increase implementation bandwidth, but spend the saved time on sharper framing, narrower scope, and more serious review.