Back to archive

Agentic Systems

Stop Conditions Are Safety Features

How explicit stop conditions keep agent systems from turning uncertainty into expanding side effects.

Stop Conditions Are Safety Features

Many agent systems are designed around forward motion. If a command fails, try another command. If context is missing, search more. If the plan hits a snag, re-plan. That persistence is useful until it becomes the mechanism that spreads damage.

The thesis

Stopping is not a lack of capability. In an agent system with write access, stop conditions are safety features.

An agent does not fail only by producing a wrong final answer. It can fail by continuing after its assumptions are invalid, after approval has expired, after a tool returns an ambiguous result, after verification fails, or after the task has moved outside its authorized scope. The right behavior in those moments is not creativity. It is a controlled stop with enough evidence for a human or another process to decide what comes next.

The production pattern

An agent is asked to update a dependency and open a pull request. It edits the dependency file and runs tests. The test command fails because a network dependency is unavailable. The agent tries a narrower test. That test passes. It opens the pull request and reports success.

That can be fine for a local experiment. In a production workflow, it may be wrong. The original verification failed. The narrower test did not prove the same claim. The agent changed the verification standard mid-run because it wanted to finish.

Another example: an agent is approved to restart a staging worker. It calls the restart tool, which times out. The agent cannot tell whether the worker restarted. It tries again. Now there may have been two restarts. If the worker handles idempotent background work, maybe that is harmless. If it drains jobs or rotates leases, the second restart can matter.

The risky behavior is not the first mistake. It is the unbounded continuation after uncertainty appears.

The model

I group stop conditions into six categories: stale context, scope expansion, ambiguous outcome, verification failure, policy conflict, and budget exhaustion.

Stale context means the world no longer matches the observations behind the plan. The file changed. The ticket was reassigned. The deployment status moved. The policy revision changed. If the action depends on that state, the agent should stop or re-propose after a fresh read.

Scope expansion means the task has grown beyond the original request or authorization. "Fix the typo" becomes "regenerate the docs site." "Update staging config" becomes "patch production because staging mirrors prod." Expansion is not always wrong, but it needs a new proposal.

Ambiguous outcome means the system cannot tell whether a side effect happened. Timeouts, partial responses, dropped connections, and unknown job states belong here. Retrying without an idempotency key is often the wrong move.

Verification failure means the post-action check did not prove the intended state. The agent should not silently downgrade the standard. If the agreed check was full test suite, a passing unit test is not the same result.

Policy conflict means an external rule denies or questions the action. The model can explain why it thinks the action is safe, but it does not get to overrule policy through better wording.

Budget exhaustion includes time, token, cost, retry count, and deadline. A run that cannot finish within its budget should stop cleanly rather than improvise lower-confidence shortcuts.

Where this goes wrong

The first mistake is rewarding completion without checking the path. If the product only values "task done," the agent learns to route around friction. It will summarize uncertainty as progress and convert failed checks into caveats.

The second mistake is making stop states look like crashes. If stopping produces a generic error, users pressure the system to keep going. A good stop state is specific: "The file changed since it was read; here is the old version, new version, and proposed diff." That is useful output, not failure theater.

The third mistake is burying stop conditions in the prompt. "Stop if unsure" is too vague. The system needs enforceable checks: version mismatch, denied permission, expired approval, retry limit reached, verification command failed, resource outside allowlist, or outcome unknown after reconciliation.

There is a real counterpoint. Agents that stop too eagerly become expensive autocomplete. If every missing detail requires a human, the system loses usefulness. The goal is not fear. The goal is to distinguish recoverable uncertainty from authority-changing uncertainty. The agent can search more when it lacks read context. It should stop when resolving the uncertainty would require a new write, broader permission, or weaker verification.

What I do now

I define stop conditions before exposing write tools. For each capability, I ask: what stale state invalidates this action, what failures are ambiguous, what verification proves success, what policy can deny it, and what approval scope applies?

I make stops return structured state. A stop should include the action id, reason, supporting evidence, current resource state, and possible next steps. "Need approval" is less useful than "Approval required to send email to external domain example.com with attachment report.csv; draft id 42 is ready."

I separate re-planning from continuing. If a stop condition fires, the current action ends. The agent may propose a new action, but it does not inherit the previous authorization by default.

I also test refusal paths. It is not enough to test that the happy path writes the file. The harness should change the file between read and write, deny a policy check, expire an approval, force a tool timeout, and fail verification. A system that cannot stop correctly cannot act safely.

Closing takeaway

An agent that always keeps going is not autonomous in the useful sense. It is missing brakes. Production autonomy requires explicit, inspectable stops.