Agents Are Unreliable Planners Attached to Real Tools

An agent can be wrong in a very ordinary way: it can misunderstand the goal, carry stale context, skip a precondition, or keep following a plan after the world has changed. The unusual part is not the mistake. The unusual part is that the planner is connected to tools that can write files, send messages, create tickets, rotate keys, merge pull requests, and change production state.

The thesis

The model is not the system of record for intent. It is a fallible planning component attached to capabilities with real side effects.

That distinction changes the engineering problem. If the agent only talks, a bad plan is a bad answer. If the agent can act, a bad plan is a proposed write, and sometimes an executed write. Production agent systems need the same suspicion we already apply to distributed systems: state can be stale, ownership can be ambiguous, retries can duplicate work, and success messages can be false.

The design target is not a smarter prompt that never proposes the wrong action. The design target is a system where a wrong plan is visible, bounded, denied when necessary, and recoverable when it gets partly executed.

The production pattern

A common workflow starts safely. The user asks an agent to "clean up the deployment issue." The agent reads a ticket, checks a log, searches the repo, and forms a plan. It decides the fix is to update an environment variable and restart a service. So far, this is analysis.

Then the plan touches tools.

It proposes a write to deploy/prod.yaml. It asks for permission to run a command. It opens a pull request. It comments on the incident channel. It may even trigger a deployment pipeline if the tool surface allows it. Each step has a different blast radius, but the conversational interface can make them look like the same kind of action: "I will fix it."

The planner can be unreliable at each boundary. It may have read a stale incident comment. It may have confused staging with production because both appeared in context. It may propose a write that was correct before another engineer landed a fix. It may obtain approval for "restart worker" and then, after re-planning, spend that approval on restarting a different worker.

These are not exotic failures. They are ordinary distributed-systems failures wearing a conversational interface.

The model

I use a simple model: every agent action is a transition from uncertain intent to real effect.

The first object is the request: what the user asked for, who asked, and what resources are in scope. The second object is the plan: what the agent currently believes should happen. The third object is the proposed action: the exact tool call, resource, parameters, and expected state change. The fourth object is the authorization: the policy or human grant that permits that proposed action. The fifth object is execution evidence: what actually happened. The sixth object is verification: how the system proves the intended state exists.

The important part is that these are separate objects. A plan is not an authorization. An authorization is not execution. Execution is not verification. A sentence from the model is not durable state.
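A minimal sketch of that separation, assuming Python dataclasses. The class names, field names, and stage labels are illustrative assumptions, not a prescribed schema. The point is only that each object has its own type, so one cannot be passed where another is expected.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Request:
    requested_by: str
    text: str
    resources_in_scope: Tuple[str, ...]


@dataclass(frozen=True)
class Plan:
    summary: str  # the agent's current belief, nothing more


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    resource: str
    parameters: Tuple[Tuple[str, str], ...]  # (name, value) pairs
    expected_change: str


@dataclass(frozen=True)
class Authorization:
    granted_by: str  # a policy name or a human identity


@dataclass(frozen=True)
class ExecutionEvidence:
    tool_output: str


@dataclass(frozen=True)
class Verification:
    observed_state: str
    matches_expected: bool


@dataclass
class ActionRecord:
    """Durable record of one action; every field after `request` is optional."""
    request: Request
    plan: Optional[Plan] = None
    proposed: Optional[ProposedAction] = None
    authorization: Optional[Authorization] = None
    evidence: Optional[ExecutionEvidence] = None
    verification: Optional[Verification] = None

    def stage(self) -> str:
        # Report the furthest object that has actually been recorded.
        if self.verification is not None and self.verification.matches_expected:
            return "verified"
        if self.evidence is not None:
            return "executed"
        if self.authorization is not None:
            return "authorized"
        if self.proposed is not None:
            return "proposed"
        if self.plan is not None:
            return "planned"
        return "requested"


record = ActionRecord(
    request=Request("alice", "clean up the deployment issue", ("checkout-worker",))
)
record.plan = Plan("update an environment variable and restart the service")
print(record.stage())  # a plan exists; nothing is authorized or executed yet
```

Keeping the stage derived from recorded objects, rather than from what the model says it did, is the design choice that matters here.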

For example, "update the timeout" is not enough. A production proposal should say: write PAYMENT_CAPTURE_DEADLINE_MS=45000 to service checkout-worker in staging, based on config version abc123, before 2026-05-22T10:30:00Z, with no production deploy permission attached. That proposal can be reviewed. It can expire. It can be compared with current state before execution. It can be denied if the config has moved past the version it was based on.
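Here is how that proposal might look as data, as a sketch under stated assumptions. The values come from the example above. The class name, field names, and the denial_reasons helper are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class ConfigWriteProposal:
    service: str
    environment: str
    key: str
    value: str
    base_config_version: str  # the version the agent actually read
    expires_at: datetime
    allows_production_deploy: bool = False


def denial_reasons(
    p: ConfigWriteProposal, current_config_version: str, now: datetime
) -> List[str]:
    """Return every reason to refuse; an empty list means the write may proceed."""
    reasons = []
    if now >= p.expires_at:
        reasons.append("proposal expired")
    if current_config_version != p.base_config_version:
        reasons.append("config changed since the proposal was written")
    if p.environment == "production" and not p.allows_production_deploy:
        reasons.append("no production permission attached")
    return reasons


proposal = ConfigWriteProposal(
    service="checkout-worker",
    environment="staging",
    key="PAYMENT_CAPTURE_DEADLINE_MS",
    value="45000",
    base_config_version="abc123",
    expires_at=datetime(2026, 5, 22, 10, 30, tzinfo=timezone.utc),
)

before_deadline = datetime(2026, 5, 22, 10, 0, tzinfo=timezone.utc)
print(denial_reasons(proposal, "abc123", before_deadline))
print(denial_reasons(proposal, "def456", before_deadline))
```

Returning a list of reasons instead of a boolean keeps the refusal reviewable: a human or a later audit can see why the write was stopped.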

The same model applies to smaller actions. "Post a comment" should include the destination, body, thread, identity, and reason. "Create a ticket" should include the project, labels, title, body, and whether duplicates were checked. "Run a shell command" should include the working directory, allowed command prefix, timeout, expected writes, and output handling.
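For the shell case, a rough sketch using only the Python standard library (shlex and subprocess). The ShellProposal type and its field names are assumptions made for illustration. The command is split into an argument list and run without a shell, so characters such as ";" are not interpreted as command separators.

```python
import shlex
import subprocess
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ShellProposal:
    command: str
    working_dir: str
    allowed_prefix: Tuple[str, ...]   # e.g. ("git", "status")
    timeout_s: float
    expected_writes: Tuple[str, ...]  # paths the command is expected to touch


def argv_if_allowed(p: ShellProposal) -> List[str]:
    """Split the command and refuse it unless it starts with the allowed prefix."""
    if not p.allowed_prefix:
        raise PermissionError("an empty prefix would allow any command")
    argv = shlex.split(p.command)
    if tuple(argv[: len(p.allowed_prefix)]) != p.allowed_prefix:
        raise PermissionError(f"command is outside the allowed prefix: {p.command!r}")
    return argv


def run(p: ShellProposal) -> subprocess.CompletedProcess:
    # No shell=True: the argument list goes straight to the program.
    return subprocess.run(
        argv_if_allowed(p),
        cwd=p.working_dir,
        timeout=p.timeout_s,
        capture_output=True,  # output is captured for the record, not streamed
        text=True,
        check=False,
    )


safe = ShellProposal("git status --short", ".", ("git", "status"), 10.0, ())
print(argv_if_allowed(safe))
```

A prefix check is a coarse control. It says nothing about which flags are safe, so a real deployment would likely need a stricter allowlist per command.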

Where this goes wrong

The most common mistake is treating the agent plan as if it were a transaction. The model says it will do five things, so the system lets it perform five tool calls. But the plan was only a momentary belief. After the first call, the world changed. After the second call, the agent observed new output. By the third call, it may be following a different plan while still borrowing trust from the first approval.

Another failure is hiding side effects behind friendly tool names. A function called fix_pr_feedback might push commits, resolve review threads, rerun CI, and post comments. From the model's point of view, that looks like one tool. From the production system's point of view, it is a bundle of writes across multiple control planes. If the function is too broad, there is no useful place to apply policy.
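One way to see the cost of a broad tool is to list its effects and evaluate policy per effect. The sketch below is hypothetical: the effect names, the policy table, and the rule that a bundle takes its most restrictive decision are all assumptions chosen for illustration.

```python
from enum import Enum
from typing import Iterable


class Decision(Enum):
    ALLOW = 0
    ASK_HUMAN = 1
    DENY = 2


# Hypothetical per-effect policy. Unknown effects are denied by default.
POLICY = {
    "rerun_ci": Decision.ALLOW,
    "post_comment": Decision.ALLOW,
    "resolve_review_thread": Decision.ASK_HUMAN,
    "push_commits": Decision.ASK_HUMAN,
}


def decide(effects: Iterable[str]) -> Decision:
    """A bundle is only as permissive as its most restricted effect."""
    decisions = [POLICY.get(effect, Decision.DENY) for effect in effects]
    return max(decisions, key=lambda d: d.value, default=Decision.DENY)


# The broad tool: one call, four effects, one coarse decision.
FIX_PR_FEEDBACK = ["push_commits", "resolve_review_thread", "rerun_ci", "post_comment"]

print(decide(FIX_PR_FEEDBACK))   # the whole bundle needs a human
print(decide(["rerun_ci"]))      # the narrow tool can be allowed on its own
```

Splitting the bundle into one tool per effect lets the harmless steps proceed automatically while the risky ones wait for review.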

Verification is also often replaced with narration. The agent says, "I updated the config and the service is healthy." That is not evidence. Evidence is the actual diff, the accepted commit, the deployment status, the health check result, and the fact that those observations were captured after the write.
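The "captured after the write" condition is easy to state and easy to forget. A small sketch, assuming each observation carries a timestamp; the Observation type and the kind labels are illustrative names, not an existing API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class Observation:
    kind: str          # e.g. "diff", "commit", "deploy_status", "health_check"
    value: str
    observed_at: datetime


def missing_evidence(
    required_kinds: Iterable[str],
    observations: Iterable[Observation],
    write_finished_at: datetime,
) -> List[str]:
    """Return the required kinds that were not observed after the write."""
    fresh = {o.kind for o in observations if o.observed_at > write_finished_at}
    return [kind for kind in required_kinds if kind not in fresh]


write_done = datetime(2026, 5, 22, 10, 0, tzinfo=timezone.utc)
observations = [
    Observation("diff", "+PAYMENT_CAPTURE_DEADLINE_MS=45000", write_done + timedelta(seconds=5)),
    # A health check from before the write proves nothing about the new state.
    Observation("health_check", "ok", write_done - timedelta(minutes=3)),
]
print(missing_evidence(["diff", "health_check"], observations, write_done))
```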

The counterpoint is that not every agent needs a heavy workflow engine. A local drafting assistant that writes to a scratch file does not need the same machinery as an agent that can change billing policy. But once a tool can affect shared state, customer-visible behavior, money, credentials, or production operations, the action needs a lifecycle outside the model.

What I do now

I start by classifying tools by effect. Read tools can be broad, though still audited when they touch sensitive context. Write tools are narrow and explicit. Any tool that can affect production, external users, security posture, money, or irreversible state gets a proposal object before execution.
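A sketch of that classification as a small registry. The three effect classes and the tool names are assumptions for illustration. The rule that every non-read tool is audited is also my assumption; the text above only requires auditing for sensitive reads.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class Effect(Enum):
    READ = "read"
    WRITE = "write"
    # Production, external users, security posture, money, or irreversible state.
    HIGH_IMPACT = "high_impact"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    effect: Effect
    touches_sensitive_context: bool = False


def gate(tool: ToolSpec) -> Dict[str, bool]:
    """Derive the controls a tool needs from its declared effect."""
    return {
        "audit": tool.effect is not Effect.READ or tool.touches_sensitive_context,
        "proposal_required": tool.effect is Effect.HIGH_IMPACT,
    }


# Hypothetical tools, one per row of the classification.
TOOLS = [
    ToolSpec("search_repo", Effect.READ),
    ToolSpec("read_incident_notes", Effect.READ, touches_sensitive_context=True),
    ToolSpec("write_scratch_file", Effect.WRITE),
    ToolSpec("rotate_key", Effect.HIGH_IMPACT),
]

for tool in TOOLS:
    print(tool.name, gate(tool))
```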

I make stale context a first-class reason to refuse a write. If the agent read a file version, ticket timestamp, policy revision, or deployment state, the proposed write carries that precondition. If the precondition no longer holds, the system stops and asks for a refreshed plan.
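For a file, the precondition can be a content hash taken when the agent read it. The following is a minimal sketch using hashlib and a temporary directory; StaleContext and write_if_unchanged are hypothetical names. The check and the write are not atomic here, so a real system would need a lock or a compare-and-swap in the backing store.

```python
import hashlib
import tempfile
from pathlib import Path


class StaleContext(Exception):
    """The resource changed after the agent read it."""


def fingerprint(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_if_unchanged(path: Path, seen_fingerprint: str, new_text: str) -> None:
    # Refuse the write when the file no longer matches what the agent read.
    if fingerprint(path) != seen_fingerprint:
        raise StaleContext(f"{path.name} changed since it was read; refresh the plan")
    path.write_text(new_text, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "staging.yaml"
    cfg.write_text("timeout_ms: 30000\n", encoding="utf-8")

    seen = fingerprint(cfg)  # the agent reads the file and plans against it
    cfg.write_text("timeout_ms: 40000\n", encoding="utf-8")  # someone else lands a fix

    try:
        write_if_unchanged(cfg, seen, "timeout_ms: 45000\n")
        refused = False
    except StaleContext:
        refused = True

    final_text = cfg.read_text(encoding="utf-8")

print(refused, repr(final_text))
```

The other engineer's change survives, and the agent is forced back to planning with current state.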

I keep human approval tied to the proposal, not the conversation. "Yes, do it" approves a specific action with specific parameters for a limited time. It does not approve whatever the agent decides next after another tool call changes its plan.
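One way to bind an approval to a proposal is to approve a digest of the exact parameters. This is a sketch, assuming the proposal is JSON-serializable; the function names and the ten-minute lifetime are illustrative choices.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


def proposal_digest(proposal: dict) -> str:
    # Canonical JSON so that key order does not change the digest.
    canonical = json.dumps(proposal, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Approval:
    digest: str
    approver: str
    expires_at: datetime


def approve(proposal: dict, approver: str, now: datetime,
            ttl: timedelta = timedelta(minutes=10)) -> Approval:
    return Approval(proposal_digest(proposal), approver, now + ttl)


def covers(approval: Approval, proposal: dict, now: datetime) -> bool:
    """True only for the exact proposal that was approved, and only until expiry."""
    return approval.digest == proposal_digest(proposal) and now < approval.expires_at


now = datetime(2026, 5, 22, 10, 0, tzinfo=timezone.utc)
approved = {"tool": "restart_worker", "target": "worker-a", "environment": "staging"}
replanned = {"tool": "restart_worker", "target": "worker-b", "environment": "staging"}

grant = approve(approved, "alice", now)
print(covers(grant, approved, now + timedelta(minutes=1)))
print(covers(grant, replanned, now + timedelta(minutes=1)))
print(covers(grant, approved, now + timedelta(minutes=11)))
```

Because the grant names a digest and not a conversation, a re-planned action has to earn its own approval.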

I require verification to be independent of the model's claim. If the action creates a pull request, fetch the pull request. If it updates a file, read the file back. If it sends a message, record the message id. If verification fails, the next step is not another improvised write. The next step is reconciliation.
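A sketch of verification that ignores the model's narration entirely. The verify function takes a read-back callable and decides from what it returns. The in-memory store and the flaky_write helper are stand-ins invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Outcome:
    status: str    # "verified" or "needs_reconciliation"
    expected: str
    observed: str


def verify(expected: str, read_back: Callable[[], str]) -> Outcome:
    """Decide from an independent read, never from what the agent reported."""
    observed = read_back()
    status = "verified" if observed == expected else "needs_reconciliation"
    return Outcome(status, expected, observed)


def next_step(outcome: Outcome) -> str:
    # There is deliberately no "retry the write" branch.
    return "done" if outcome.status == "verified" else "reconcile"


# Stand-in for a real backing store, for illustration only.
store: Dict[str, str] = {"deploy/staging.yaml": "timeout_ms: 30000"}


def flaky_write(path: str, text: str) -> str:
    """Simulates a tool that reports success without changing anything."""
    return "ok"


tool_reply = flaky_write("deploy/staging.yaml", "timeout_ms: 45000")
outcome = verify("timeout_ms: 45000", lambda: store["deploy/staging.yaml"])
print(tool_reply, outcome.status, next_step(outcome))
```

The tool said "ok" and the state did not change, which is exactly the case that narration hides and a read-back catches.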

Closing takeaway

Treat the model as a planner, not an owner. The owner is the system that records intent, gates capabilities, executes writes, verifies state, and knows when to stop.