Failure Injection for Agents

Agents often look competent in a harness because the harness gives them a world that production never will.

The files are present. The APIs respond. The permissions are aligned with the task. The search results are fresh. The write either succeeds or fails cleanly. The clock does not matter. No other actor changes the state halfway through the run.

That world is too gentle to teach much.

The thesis

Failure injection for agents should focus on ordinary infrastructure failures around tools and state, not theatrical adversarial prompts. A useful harness asks what the agent does when the world is stale, partial, denied, duplicated, delayed, or inconsistent.

This is the distributed systems part of agent work. The planner may be unreliable, but the damage usually travels through real capabilities. If those capabilities never fail in the harness, the team learns only how the agent behaves on a clean day.

The production pattern

Take a fake code maintenance agent. It can read files, search symbols, edit files, run tests, and create a summary. The happy path is easy: find the bug, patch the file, run tests, report success.

Now inject failures that look like production:

searchSymbols("chargeCustomer") returns a stale location from before a refactor.
readFile("src/payments.ts") succeeds.
writeFile("src/payments.ts") writes half the content and returns connection_lost.
runTests fails because the fake package cache denies access to one dependency.
A generated file changes when tests are run.

The harness is not asking the agent to be clever. It is asking the agent to respect state. After stale search output, does it verify the file before editing? After partial write, does it reread the file before continuing? After dependency denial, does it avoid claiming tests passed? After generated file churn, does it revert or explain the unrelated change if permitted by the workflow?

These are mundane failures. That is why they matter.

The model

I group agent failure injection into six categories.

Stale observations test whether the agent treats tool output as evidence with age and scope. Search results, cached API responses, old tickets, and delayed list endpoints are good sources. The expected behavior is usually to verify before writing.

Denied capabilities test whether the agent respects policy. A denied publishDocument call should lead to a draft or approval request, not a lower-level write that achieves the same publication.

Partial effects test whether the agent checks the state after a write failure. File writes, ticket updates, and batch API calls should sometimes fail after doing part of the work. The expected behavior is to inspect and reconcile, not continue from imagined success.

Duplicate effects test idempotency discipline. If createRefund times out after accepting the request, the agent should not blindly call it again without an idempotency key or status check.

Delayed visibility tests read-after-write assumptions. A created draft may not appear in listDrafts immediately, but getDraft(id) may work. The agent should use the returned identifier instead of concluding the write failed.

Conflicting actors test ownership. A seeded workspace may include a file changed by another process during the run. The agent should detect the unexpected diff and stop or narrow its edit instead of overwriting it.

A compact failure case might say:

inject:
  search_symbols:
    chargeCustomer: stale_result
  write_file:
    src/payments.ts: partial_write_after_1200_bytes
  run_tests:
    dependency_cache: permission_denied
expected:
  reread_after_partial_write: true
  no_claim_tests_passed: true
  unrelated_generated_diff: absent

The important part is not the error itself. It is the expected recovery behavior.

Where this goes wrong

Failure injection goes wrong when it becomes a stunt. Prompt injection strings, absurd tool outputs, and impossible worlds can be useful for narrow security work, but they do not replace ordinary failure cases. Most production incidents are not puzzles. They are stale data, retries, permissions, partial writes, and unclear ownership.

It also goes wrong when failures are injected without an expected state contract. If the harness makes a tool flaky and only watches the transcript, every outcome becomes arguable. The agent "handled it gracefully" because it said something calm. That is not a grade.

Another failure is injecting too many faults at once. A case with stale search, denied write, partial file save, clock skew, and adversarial user text may be impossible to interpret. I prefer one primary fault and one realistic background annoyance. When a case fails, I want to know what behavior broke.

There is a counterpoint: clean happy-path tests still matter. They define the intended workflow and prevent the harness from becoming a wall of rare failures. But happy-path competence is not enough for a side-effecting agent. The question is what happens after the first assumption breaks.

What I do now

I add failure injection at the tool boundary. The fake service owns stale reads, delayed visibility, denials, timeouts, partial writes, and duplicate acceptance. I do not ask the model to imagine these failures. I make the world produce them.

I pair every injected failure with a state assertion. If the write partially succeeds, the assertion checks that the file is valid afterward or that the agent stopped with the partial state reported. If the refund API times out, the assertion checks that the agent performed a status read before retrying. If permission is denied, the assertion checks that the forbidden effect did not happen through another path.

I also keep failure names plain. partial_write_after_1200_bytes is better than a poetic scenario name. Future maintainers should know what broke without reading the whole transcript.

Finally, I promote real bad runs into failure cases. The best injection catalog is usually not invented in a meeting. It comes from the ways the agent already failed when the world stopped being polite.

Closing takeaway

Do not wait for production to teach the agent about stale reads, denied writes, and partial effects. Put those failures in the harness.