Back to archive

Engineering

Prompting Is Not the Hard Part of Agentic Development

Why context design, verification, decomposition, and ownership matter more than clever prompts.

Prompting Is Not the Hard Part of Agentic Development

Prompting matters, but it is not the hard part of agentic development. A clever prompt cannot rescue a task with unclear ownership, missing context, no verification path, and an ambiguous definition of done.

The hard part is designing the work so a fast, literal, confident collaborator can help without quietly changing the system's meaning.

The thesis

Agentic development is mostly context design, decomposition, verification, and ownership.

Prompting is the visible interface. The operating discipline around the prompt is the system.

The production pattern

When agentic work disappoints, the postmortem often blames the prompt. It was not specific enough. It did not mention a file. It did not forbid a pattern. Sometimes that is true.

But the deeper issue is usually that the task was not engineered.

The agent was asked to solve a problem before the team had separated diagnosis from implementation. It was given a repository when it needed a module. It was asked to produce code when the missing artifact was a test, a decision record, or a migration plan. It was allowed to declare success without running the checks that matter.

Better wording helps at the margin. Better task architecture changes the outcome.

The model

I use a four-layer model for agentic development.

Context design is deciding what the agent needs to know and what it should ignore. Too little context causes shallow work. Too much context causes drift. The useful context includes the target behavior, relevant files, local conventions, constraints, and examples of acceptable output.

Decomposition is deciding the size and order of work. Agents do better with tasks that have visible boundaries: inspect, plan, write tests, implement, verify, summarize. Combining those phases can work for small changes, but it becomes risky as ambiguity grows.

Verification is deciding what counts as evidence. A passing unit test may be enough for a pure function. It is not enough for a migration, permission change, distributed workflow, or user-visible semantic change.

Ownership is deciding who absorbs the consequences. If nobody can explain the patch without referring to the model, the work is not owned.

Before I start, I ask:

  • What decision has already been made, and what decision is still open?
  • What should the agent read before writing?
  • What files or behaviors are out of scope?
  • What should fail if the implementation is wrong?
  • What human review is required before the change becomes real?

Where this goes wrong

One failure mode is prompt theater. Teams spend energy creating elaborate prompt templates while leaving test gaps, unclear contracts, and sprawling tasks untouched.

The counterpoint is that prompt quality still matters. A vague prompt wastes cycles and can produce dangerous confidence. For repeated workflows, a good prompt template can encode useful discipline.

But a prompt template should be the last mile of an operating model, not a substitute for one.

What I do now

I write prompts as if I am writing a work order for a strong engineer who has no background context and no right to guess.

I include the target behavior, constraints, scope, verification commands, and output expectations. I ask the agent to state assumptions before large edits. I split exploration from mutation when the system is unfamiliar. I treat the final summary as a review artifact, not as proof.

Most importantly, I decide what kind of output I want. Sometimes the right output is code. Sometimes it is a failing test. Sometimes it is a risk list. Sometimes it is a plan I will hand to another human.

The task design checklist

The checklist I use before a serious agent session is deliberately boring.

I start with the unit of change. Is this a behavior change, a mechanical refactor, an investigation, a test gap, or an operating decision? Mixing those into one prompt makes the agent look productive while hiding the real state of the work.

Then I name the evidence. For a parser, evidence might be table-driven tests around malformed input. For an authorization change, it might be negative tests and a review of existing policy callers. For a migration, it might be compatibility checks, rollback notes, and a dry run on representative data. "Run tests" is not enough unless the tests are actually connected to the risk.

Then I name the stop conditions. The agent should stop when it discovers a larger design question, a conflicting convention, missing credentials, unclear ownership, or a verification command it cannot run. A good stop is not failure. It is the system refusing to turn uncertainty into code.

Finally, I decide how much autonomy the task deserves. Exploration can be open ended. Mutation should be bounded. Integration should be explicit.

This is why prompting alone is a weak lever. A prompt can phrase the request. Task design decides whether the request is worth executing.

Closing takeaway

Do not ask how to write a better prompt until you have designed the task, the context, the evidence, and the owner.