Agentic Systems

Watcher Agents

Watcher agents are read-only harness components that notice unsafe drift and propose interventions without taking ownership of the work.

A watcher agent is useful only if it stays boring.

The tempting version is an assistant that notices a problem and fixes it. That is also the dangerous version. Once the watcher can write, approve, retry, or mutate state, it stops being a watcher and becomes another actor in the system. Now the harness has two planners, two sets of side effects, and a much harder incident report.

The production version is narrower: observe the run, compare it to the contract, and propose a reviewable intervention.

The thesis

A watcher agent should be read-only by design.

It can say: this task is drifting, this tool call does not match the approved intent, this lease is close to expiry, this output no longer matches the requested boundary, this run should be paused for review.

It cannot say: I will patch the file, rerun the deploy, approve the exception, close the ticket, or widen the permission.

That separation is not ceremony. It is the thing that keeps a monitoring component from becoming an untracked control path.

The production pattern

The recurring pattern is a long-running harness task with enough autonomy to get into trouble slowly.

An implementation agent is modifying code. A migration assistant is preparing a rollout plan. A data repair harness is walking records. A documentation agent is allowed to open pull requests. The individual tool calls may look reasonable, but the run as a whole can drift away from the original approval.

A watcher sits beside the run and reads:

  • the original user intent
  • the approved scope
  • the action lifecycle
  • tool inputs and outputs
  • leases, deadlines, and stop conditions
  • state changes proposed by the worker
  • prior warnings or retries

Consider a code-editing harness asked to update a metrics wrapper in one service. Halfway through, the worker discovers a shared configuration file and decides the wrapper should be renamed globally. The watcher sees the proposed diff crossing into a protected directory. It also sees that the approval was limited to one service. The watcher emits a finding: pause before write, scope expansion requires review, affected files listed below.

The watcher proposes. It does not write. It does not roll back. It does not open a second pull request to "help."
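The scope check in that example can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the names ScopeFinding, is_within, and check_scope are hypothetical, and it assumes the worker's proposed diff is available as a list of repository-relative POSIX paths that are already normalized.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class ScopeFinding:
    """A read-only observation. The watcher reports it and does nothing else."""
    proposed_action: str
    reason: str
    affected_files: tuple[str, ...]


def is_within(path: str, root: str) -> bool:
    """True if `path` is `root` itself or lies underneath it.

    Compares path components rather than string prefixes, so
    "services/billing2" is not treated as inside "services/billing".
    Assumes paths are already normalized (no ".." segments).
    """
    p, r = PurePosixPath(path), PurePosixPath(root)
    return p == r or r in p.parents


def check_scope(diff_paths, approved_roots):
    """Return a finding if any proposed write falls outside the approval."""
    outside = tuple(
        p for p in diff_paths
        if not any(is_within(p, root) for root in approved_roots)
    )
    if not outside:
        return None
    return ScopeFinding(
        proposed_action="pause_before_write",
        reason="scope expansion requires review",
        affected_files=outside,
    )


finding = check_scope(
    ["services/billing/metrics.py", "config/shared/metrics.yaml"],
    approved_roots=["services/billing"],
)
```

Note that check_scope only returns a value. There is no code path here that touches the diff, the branch, or the approval record.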

The model

I use five fields when designing a watcher.

Inputs: immutable run metadata, current harness state, tool call records, policy decisions already made, and selected snapshots of the target state. A watcher should not depend on conversational memory. It should read the same trace a human reviewer would read.

Permissions: read-only access to traces, plans, diffs, logs, queue state, and approval records. The watcher may request a pause from the supervisor, but it never performs the pause itself. The pause takes effect only after the request passes a deterministic gate.

Outputs: small findings with evidence. A good watcher output names the violated boundary, the observed event, the relevant approval or policy, and the proposed next step. "This looks risky" is not an output. "The worker requested write access to infra/prod after an approval scoped to services/billing; require human approval before continuing" is an output.

Failure modes: false positives that train operators to ignore it, false negatives caused by missing state, shared-context failures where the watcher inherits the worker's bad assumptions, and alert storms where one bad run produces twenty indistinguishable findings.

Review path: findings must land somewhere owned. That can be a harness pause reason, a pull request comment, an audit queue item, or an incident note. The watcher is not successful because it generated text. It is successful when the system knows who must decide.
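Of the failure modes above, alert storms are the easiest to address mechanically. One possible approach, sketched here under assumptions, is to collapse findings that share a run, a type, and a violated boundary before they reach the review path. The RawFinding fields and the grouping key are illustrative choices, not a fixed schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawFinding:
    run_id: str
    finding_type: str
    boundary: str   # hypothetical: the boundary the event violated
    evidence: str   # hypothetical: pointer to the trace event


def collapse(findings):
    """Group findings sharing (run_id, finding_type, boundary).

    Groups keep first-seen order because dicts preserve insertion order.
    Each group carries a count and every piece of evidence, so nothing
    is dropped; the reviewer just sees one item instead of twenty.
    """
    groups = {}
    for f in findings:
        key = (f.run_id, f.finding_type, f.boundary)
        groups.setdefault(key, []).append(f.evidence)
    return [
        {
            "run_id": run_id,
            "finding_type": finding_type,
            "boundary": boundary,
            "count": len(evidence),
            "evidence": evidence,
        }
        for (run_id, finding_type, boundary), evidence in groups.items()
    ]
```

Collapsing keeps the evidence list intact on purpose. Deduplication that discards events would trade an alert storm for a false negative.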

Where this goes wrong

The first failure is giving the watcher a write path because "it already knows what is wrong." That creates a race between the worker and the watcher. The worker proposes a change, the watcher patches around it, and now neither trace explains the final state cleanly.

The second failure is watching tokens instead of decisions. Token streams can be interesting during research, but production review needs events: capability requested, capability granted, tool called, state observed, state changed, deadline crossed, stop condition ignored. A watcher that summarizes conversation is weaker than a watcher that follows the action lifecycle.
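To make "events, not tokens" concrete, here is a small sketch of a watcher rule that walks a typed event trace. The event kinds mirror the list above; the Event fields and the rule itself (a tool call that uses a capability never granted earlier in the trace) are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    CAPABILITY_REQUESTED = "capability_requested"
    CAPABILITY_GRANTED = "capability_granted"
    TOOL_CALLED = "tool_called"
    STATE_OBSERVED = "state_observed"
    STATE_CHANGED = "state_changed"
    DEADLINE_CROSSED = "deadline_crossed"
    STOP_CONDITION_IGNORED = "stop_condition_ignored"


@dataclass(frozen=True)
class Event:
    seq: int               # hypothetical: position in the run trace
    kind: EventKind
    capability: str = ""   # hypothetical: capability the event refers to


def ungranted_tool_calls(events):
    """Return tool calls whose capability was not granted earlier in the trace."""
    granted = set()
    flagged = []
    for ev in sorted(events, key=lambda e: e.seq):
        if ev.kind is EventKind.CAPABILITY_GRANTED:
            granted.add(ev.capability)
        elif ev.kind is EventKind.TOOL_CALLED and ev.capability not in granted:
            flagged.append(ev)
    return flagged


trace = [
    Event(1, EventKind.CAPABILITY_REQUESTED, "write:services/billing"),
    Event(2, EventKind.CAPABILITY_GRANTED, "write:services/billing"),
    Event(3, EventKind.TOOL_CALLED, "write:services/billing"),
    Event(4, EventKind.TOOL_CALLED, "write:infra/prod"),
]
flagged = ungranted_tool_calls(trace)  # contains only the event with seq == 4
```

The rule never reads the worker's conversation. It needs only the ordered lifecycle events, which is the same record a human reviewer would use.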

The third failure is making the watcher share all context with the worker. If the worker was misled by a poisoned document, the watcher may be misled too. A watcher needs provenance. It should know which instruction came from the user, which came from a retrieved document, which came from a tool response, and which came from the model's own plan.
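A provenance tag can be as small as an enum attached to every instruction the watcher reads. The sketch below assumes each proposed action records the instructions that justify it; the class names and the single rule (flag an action that cites nothing from the user) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    USER = "user"
    RETRIEVED_DOCUMENT = "retrieved_document"
    TOOL_RESPONSE = "tool_response"
    MODEL_PLAN = "model_plan"


@dataclass(frozen=True)
class Instruction:
    text: str
    source: Source


@dataclass(frozen=True)
class ProposedAction:
    description: str
    justified_by: tuple[Instruction, ...]


def lacks_user_basis(action: ProposedAction) -> bool:
    """True when nothing the user said is cited as justification."""
    return not any(i.source is Source.USER for i in action.justified_by)


from_doc = Instruction("rename the wrapper everywhere", Source.RETRIEVED_DOCUMENT)
from_user = Instruction("update the metrics wrapper in billing", Source.USER)

global_rename = ProposedAction("rename wrapper globally", (from_doc,))
local_edit = ProposedAction("edit billing wrapper", (from_user, from_doc))
```

Here lacks_user_basis(global_rename) is true and lacks_user_basis(local_edit) is false. The check does not decide whether the retrieved document was poisoned. It only makes visible that the action rests on something other than the user's request.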

The fourth failure is no owner. A watcher finding that nobody reads is worse than no watcher, because it creates a false sense of coverage.

What I do now

I start by choosing the decisions worth watching. I do not ask a watcher to judge everything.

For most harnesses, the useful decisions are:

  • scope expansion
  • permission escalation
  • repeated retries
  • ignored deadlines
  • destructive operations
  • writes outside the expected boundary
  • mismatch between approval and action
  • unexpected changes in queue age or run duration

Then I define the output contract before I write the watcher prompt. Every finding needs a type, evidence, severity, proposed action, and review destination. If the watcher cannot fill those fields, it should stay silent or emit an explicit "insufficient evidence" finding for later tuning.
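One way to pin that contract down before any prompt exists is a small typed record plus a constructor that refuses partial findings. This is a sketch under assumptions: the severity levels, the "watcher_tuning" destination, and the function name build_finding are placeholders, and it takes the second option from the paragraph above (emit "insufficient evidence") rather than staying silent.

```python
from dataclasses import dataclass

SEVERITIES = ("low", "medium", "high")  # assumed levels, adjust to taste


@dataclass(frozen=True)
class Finding:
    finding_type: str
    evidence: tuple[str, ...]
    severity: str
    proposed_action: str
    review_destination: str


def build_finding(finding_type, evidence, severity, proposed_action, review_destination):
    """Return a complete finding, or an explicit 'insufficient_evidence' one."""
    complete = (
        bool(finding_type)
        and bool(evidence)
        and severity in SEVERITIES
        and bool(proposed_action)
        and bool(review_destination)
    )
    if complete:
        return Finding(
            finding_type, tuple(evidence), severity, proposed_action, review_destination
        )
    # Incomplete findings are routed to tuning instead of to an operator.
    return Finding(
        finding_type="insufficient_evidence",
        evidence=tuple(evidence),
        severity="low",
        proposed_action="none",
        review_destination="watcher_tuning",  # hypothetical queue name
    )


ok = build_finding(
    "scope_expansion",
    ["write requested to infra/prod", "approval scoped to services/billing"],
    "high",
    "require human approval before continuing",
    "harness_pause_reason",
)
```

Because the constructor is plain code, a model output that cannot be parsed into all five fields never reaches a reviewer looking like a real finding.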

I also prefer the watcher to be late and precise rather than early and noisy. A watcher that flags every unfamiliar action will be bypassed. A watcher that flags a write only once the run has actually crossed an approved boundary earns attention.

Finally, I keep the enforcement outside the model. The watcher can recommend a pause. The supervisor checks whether the recommendation matches a configured stop rule. The reviewer decides whether to continue. That chain is slower than letting the watcher fix things, but it leaves a trace that can be audited.
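The supervisor side of that chain can be a plain lookup with no model call in it. The sketch below is one possible shape, with assumptions labeled: the stop rule names, the PauseRecommendation fields, and the decision strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PauseRecommendation:
    run_id: str
    rule: str                  # the stop rule the watcher claims was hit
    evidence: tuple[str, ...]  # pointers into the trace


# Assumption: stop rules are static configuration owned by the supervisor,
# not something the watcher or the worker can edit.
STOP_RULES = frozenset(
    {"scope_expansion", "permission_escalation", "destructive_operation"}
)


def gate(rec: PauseRecommendation, stop_rules=STOP_RULES) -> dict:
    """Deterministic gate: the same recommendation always yields the same decision."""
    if rec.rule in stop_rules:
        decision = "pause"      # run halts and waits for a human reviewer
    else:
        decision = "log_only"   # recorded for audit, run continues
    return {
        "run_id": rec.run_id,
        "rule": rec.rule,
        "decision": decision,
        "evidence": list(rec.evidence),
    }


record = gate(
    PauseRecommendation("run-1", "scope_expansion", ("diff touches infra/prod",))
)
```

The returned record is the audit trail: it names the run, the rule that matched, and the evidence the watcher supplied, and the decision to resume still belongs to the reviewer.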

Closing takeaway

A watcher agent is not a junior operator. It is a read-only circuit for noticing drift. If it can mutate the system it watches, it has joined the failure domain it was supposed to illuminate.