Human Approval Is a Protocol

Putting a human in the loop is not automatically a safety mechanism. A vague approval can be worse than no approval because it creates the appearance of control while leaving the actual authority undefined.

The thesis

Human approval is a protocol, not a chat message. It must bind a person, proposal, scope, time window, and verification expectation.

"Looks good" is useful feedback. It is not a production authorization by itself. "Yes, send it" might approve a draft email, or it might approve the idea of sending an email after edits. "Go ahead and fix the issue" might approve a code patch, a command, a deployment, or only further investigation. If the system cannot tell which one, the approval cannot safely govern a write.

Agents make this problem sharper because they re-plan. A human approves one proposed action. The agent observes a tool result, changes the plan, and continues as if the approval still applies. That is approval drift.

The production pattern

An agent proposes to update a feature flag in staging. The human says yes. The tool call fails because the staging flag name has changed. The agent searches, finds a similarly named production flag, and asks itself whether this is equivalent. The conversation still contains a yes, but the action on the table is no longer the one the human approved.

Another case: an agent drafts a message to an external partner. The human says, "Yes, send after adding the deployment time." The agent edits the message, adds a guessed time from an old calendar entry, and sends it. Did the human approve the final body? Did the approval include external delivery? Did it expire if the deployment time changed?

These failures are not about bad intentions. They happen because approval was treated as a natural-language sentiment instead of a state transition.

The model

A useful approval protocol has six pieces: proposal, reviewer, decision, scope, expiry, and postcondition.

The proposal is the exact action under review. For a file write, it is the diff, target path, branch, base version, and expected result. For an email, it is recipients, subject, body, attachments, sending identity, and delivery time. For an operations action, it is command, target environment, allowed parameters, and expected state change.

The reviewer is the person or policy principal allowed to approve. Not every user can approve every action. A product owner may approve copy. An on-call engineer may approve a restart. A security reviewer may approve access to sensitive data. The protocol should know which role is required.

The decision is approve, deny, request changes, or defer. Request changes is especially important because it prevents "almost approved" from becoming approved after the agent edits the proposal.
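A minimal sketch of the decision as an explicit state rather than a sentiment. The names Decision and may_execute, and the idea of comparing proposal identifiers, are my own illustration, not a reference to any particular framework.

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    DENY = "deny"
    REQUEST_CHANGES = "request_changes"
    DEFER = "defer"


def may_execute(decision: Decision, reviewed_id: str, current_id: str) -> bool:
    """Allow a write only on an explicit approval of the exact proposal.

    reviewed_id identifies the proposal the human saw; current_id identifies
    the proposal the agent is about to execute (both hypothetical identifiers).
    """
    return decision is Decision.APPROVE and reviewed_id == current_id
```

Under this shape, a request for changes on proposal p1 never authorizes the edited proposal p2. The edited proposal needs its own decision.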

The scope defines what the approval covers. It should include resource, action type, maximum effect, and any exclusions. "Approve this one comment" is different from "approve all ticket updates in this run."

The expiry prevents stale approval. A grant may expire after a set time window, after a resource version changes, after the agent observes new conflicting context, or after one execution attempt.
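The four expiry triggers can be checked in one place. This is a sketch under assumptions: the Grant fields and the expiry_reason function are hypothetical names, and how the agent detects "conflicting context" is left as a flag set elsewhere.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Grant:
    expires_at: datetime            # end of the approved time window
    resource_version: str           # version of the resource the reviewer saw
    max_attempts: int = 1           # one execution attempt by default
    attempts_used: int = 0
    conflicting_context_seen: bool = False  # set by the agent runtime (assumed)


def expiry_reason(grant: Grant, now: datetime, current_version: str) -> Optional[str]:
    """Return why the grant no longer applies, or None if it is still live."""
    if now >= grant.expires_at:
        return "time window elapsed"
    if current_version != grant.resource_version:
        return "resource version changed"
    if grant.conflicting_context_seen:
        return "conflicting context observed"
    if grant.attempts_used >= grant.max_attempts:
        return "execution attempts exhausted"
    return None


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = Grant(expires_at=now + timedelta(minutes=15), resource_version="v7")
```

Returning a reason instead of a bare boolean makes it easier to tell the human why a new approval is being requested.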

The postcondition says what must be verified after execution. If the approved action is to open a pull request, the system should fetch the pull request and record its id. If the action is to change a setting, it should read the setting back.
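A postcondition check can be as small as a write followed by a read-back. The sketch below assumes the caller supplies both operations as plain callables. The function name apply_and_verify and the in-memory settings dictionary are illustrative stand-ins for a real settings service.

```python
from typing import Callable, Dict


def apply_and_verify(
    write: Callable[[], None],
    read_back: Callable[[], object],
    expected: object,
) -> Dict[str, object]:
    """Run the approved write, then read the state back and compare."""
    write()
    observed = read_back()
    return {"expected": expected, "observed": observed, "verified": observed == expected}


# Stand-in for a real settings store (assumption for the example).
settings = {"new_checkout": False}
result = apply_and_verify(
    write=lambda: settings.update(new_checkout=True),
    read_back=lambda: settings["new_checkout"],
    expected=True,
)
```

The returned record is what gets stored next to the approval, so the audit trail shows both what was authorized and what was actually observed afterward.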

Where this goes wrong

The first failure is approving summaries instead of proposals. "This patch fixes the issue" is not enough. The reviewer needs the actual patch or a structured representation of the effect. Summaries are useful, but they are not the thing being authorized.

The second failure is approval reuse. A human approves a command. The command fails. The agent changes flags and reruns it. That might be reasonable, but it is a different action unless the approval explicitly allowed that retry pattern.

The third failure is unchecked reviewer authority. If any user in the chat can approve a production deploy, the approval protocol is only theater. The system needs to check that the approver has authority for the action.
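An authority check can be a lookup from action type to required role. The role and action names below are hypothetical. In practice they would come from whatever identity or policy system is already in place.

```python
from typing import Dict, Set

# Hypothetical mapping from action type to the role allowed to approve it.
REQUIRED_ROLE: Dict[str, str] = {
    "edit_copy": "product_owner",
    "restart_service": "on_call_engineer",
    "read_sensitive_data": "security_reviewer",
}


def can_approve(reviewer_roles: Set[str], action_type: str) -> bool:
    """Deny by default: unknown action types have no valid approver."""
    required = REQUIRED_ROLE.get(action_type)
    return required is not None and required in reviewer_roles
```

The deny-by-default branch matters: an action type nobody has classified should not be approvable by whoever happens to be in the chat.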

The fourth failure is unclear denial. If a human says "not yet," the agent should record a denial or deferral, not keep asking in different words until it gets a yes. Repeated approval attempts can become social retry loops.
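One way to stop social retry loops is to key denials on the effect rather than on the wording of the request. The sketch below assumes an effect can be identified by an action type and a target resource. The DecisionLedger class and the human_reopened flag are illustrative names of my own.

```python
from typing import Dict, Tuple

EffectKey = Tuple[str, str]  # (action type, target resource)


class DecisionLedger:
    """Remembers denials and deferrals so rewording cannot bypass them."""

    def __init__(self) -> None:
        self._blocked: Dict[EffectKey, str] = {}

    def record(self, effect: EffectKey, decision: str) -> None:
        if decision in ("deny", "defer"):
            self._blocked[effect] = decision
        elif decision == "approve":
            self._blocked.pop(effect, None)

    def may_request_approval(self, effect: EffectKey, human_reopened: bool = False) -> bool:
        # The agent may ask again only if the human reopened the question.
        return human_reopened or effect not in self._blocked
```

Because the key is the effect, asking "shall I proceed with the deploy?" after "not yet" maps to the same blocked entry as the original request.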

The counterpoint is that approval protocols can become exhausting. If every small write requires a detailed form, users will rubber-stamp or avoid the agent. The answer is to tier approvals by effect. Low-risk, reversible, in-scope writes can use lightweight approval. High-impact writes need precision.
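Tiering can start as a small classification function. The sketch assumes each proposed write has already been labeled with three booleans. The Effect fields and tier names are hypothetical, and sending anything not clearly low-risk to the precise tier is my own conservative default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Effect:
    reversible: bool   # can the write be undone cheaply?
    in_scope: bool     # does it fall inside the task the user asked for?
    high_impact: bool  # does it touch production, customers, or money?


def approval_tier(effect: Effect) -> str:
    """Return "lightweight" only for low-risk, reversible, in-scope writes."""
    if effect.high_impact or not effect.reversible or not effect.in_scope:
        return "precise"
    return "lightweight"
```

The labels themselves are the hard part. A write that is mislabeled as reversible gets the wrong tier no matter how careful the function is.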

What I do now

I make the approval request show the exact effect. For code, show the diff. For messages, show the final body. For ticket changes, show field updates. For commands, show command, target, working directory, and expected writes.

I bind the approval to a proposal hash and resource version. If the proposal changes or the resource changes, the approval no longer applies. This is one of the simplest ways to prevent approval drift.
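A minimal sketch of that binding, assuming the proposal can be expressed as a JSON-serializable dictionary. The field names, the grant layout, and the choice of SHA-256 over canonical JSON are assumptions for illustration, not a prescribed format.

```python
import hashlib
import json


def proposal_hash(proposal: dict) -> str:
    """Hash a canonical JSON form so key order does not change the result."""
    canonical = json.dumps(proposal, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_applies(grant: dict, proposal: dict, resource_version: str) -> bool:
    """The grant covers only the exact proposal and resource version reviewed."""
    return (
        grant["proposal_hash"] == proposal_hash(proposal)
        and grant["resource_version"] == resource_version
    )


# Hypothetical proposal: set a feature flag in staging.
proposal = {"action": "set_flag", "environment": "staging", "flag": "new_checkout", "value": True}
grant = {"proposal_hash": proposal_hash(proposal), "resource_version": "v7"}

# The agent re-plans and targets production instead.
drifted = dict(proposal, environment="production")
```

Here approval_applies(grant, proposal, "v7") holds, while the drifted proposal or a changed resource version fails the check and forces a new approval request.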

I make denial and expiry first-class. A denied action should not be retried with cosmetic wording. An expired approval should not be revived from transcript memory. The agent can ask for a new approval, but it should be clear that this is a new decision.

I record approvals outside the model. The transcript can include the conversation, but the enforcement layer needs a structured grant: who approved, what they approved, when, under what conditions, and what happened afterward.
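A structured grant can be a small record appended to a log that the enforcement layer owns. The sketch below uses hypothetical field names and writes JSON lines to any text stream. The in-memory StringIO stands in for a durable store.

```python
import io
import json
from dataclasses import asdict, dataclass
from typing import IO, Dict, Optional


@dataclass
class GrantRecord:
    reviewer: str                  # who approved
    proposal_hash: str             # what they approved
    decision: str                  # approve, deny, request_changes, or defer
    decided_at: str                # when, as an ISO 8601 timestamp
    conditions: Dict[str, str]     # under what conditions
    outcome: Optional[str] = None  # what happened afterward


def append_record(log: IO[str], record: GrantRecord) -> None:
    """Append one JSON line per decision; existing lines are never rewritten."""
    log.write(json.dumps(asdict(record), sort_keys=True) + "\n")


log = io.StringIO()  # stand-in for a file or database owned by the enforcement layer
append_record(log, GrantRecord(
    reviewer="alice",
    proposal_hash="example-hash",
    decision="approve",
    decided_at="2024-01-01T12:00:00+00:00",
    conditions={"resource_version": "v7", "max_attempts": "1"},
))
```

The point of the separate log is that the enforcement layer reads grants from here, never from the transcript, so a persuasive paraphrase in the conversation cannot create authority.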

Closing takeaway

A human in the loop is only useful if the loop has a protocol. Approval must attach to a specific action, not to the agent's general desire to make progress.