Why AI Agents Need Smaller Interfaces Than Humans
Humans can work across messy interfaces because we carry memory, social context, and a fear of consequences. Agents have no memory you should fully trust and no consequences they can personally own.
That is why agent interfaces should usually be smaller than human interfaces.
The thesis
The safest and most useful agent systems are built around narrow tools, explicit contracts, and constrained write scopes.
The point is not to make agents weak. The point is to make their work legible, reversible, and reviewable.
The production pattern
A human engineer handed a broad task like "clean up this service" can ask follow-up questions, notice sensitive files, infer deployment risk, and stop when the work starts changing shape.
An agent can simulate parts of that behavior, but the failure mode is different. It may continue confidently across a boundary that a human would treat as political, operational, or architectural.
Broad interfaces invite broad mistakes:
- A tool can read secrets when it only needed schemas.
- A command can mutate state when it only needed diagnostics.
- A refactor can touch unrelated files because they were nearby.
- A generated fix can change product semantics while making a local test pass.
Smaller interfaces reduce the number of things the reviewer has to notice after the fact.
The model
I think about agent interface design with four controls.
Capability controls define what the agent can do. Reading files, editing files, running tests, opening network connections, creating tickets, and deploying code are different risk classes. They should not be bundled casually.
Scope controls define where the capability applies. A write tool that can edit one package is very different from a write tool that can edit the repository.
Contract controls define the shape of valid work. This includes required inputs, expected outputs, idempotency, error behavior, and whether the tool returns enough evidence for review.
Audit controls define what a human can reconstruct later. The system should expose the prompt, tool calls, changed files, command output, and final assumptions in a form that supports review.
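One way to make the four controls concrete is to write them down as fields on a tool definition. The sketch below is only an illustration: `ToolSpec`, `Capability`, and `validate_call` are hypothetical names, not part of any agent framework, and the contract check covers required inputs only.

```python
from dataclasses import dataclass
from enum import Enum


class Capability(Enum):
    # Separate risk classes, deliberately not bundled into one "do anything" value.
    READ_FILE = "read_file"
    EDIT_FILE = "edit_file"
    RUN_TESTS = "run_tests"
    NETWORK = "network"
    CREATE_TICKET = "create_ticket"
    DEPLOY = "deploy"


@dataclass(frozen=True)
class ToolSpec:
    name: str
    capability: Capability            # capability control: what the tool can do
    scope: tuple[str, ...]            # scope control: directories it may touch
    required_inputs: tuple[str, ...]  # contract control: inputs every call must carry
    idempotent: bool                  # contract control: is a retry safe?
    audit_fields: tuple[str, ...]     # audit control: what each call records


def validate_call(spec: ToolSpec, args: dict) -> list[str]:
    """Return contract violations for a call; an empty list means well-formed."""
    return [f"missing input: {key}" for key in spec.required_inputs if key not in args]


# Hypothetical example: a patch tool limited to one package.
apply_patch = ToolSpec(
    name="apply_patch",
    capability=Capability.EDIT_FILE,
    scope=("services/billing/",),
    required_inputs=("path", "patch"),
    idempotent=False,
    audit_fields=("prompt", "changed_files", "command_output"),
)
```

Keeping all four controls on one record means a reviewer can read a tool's risk profile without reading its implementation.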
The design checklist:
- Can this tool do one job with a clearly named outcome?
- Does it expose the minimum data needed for that job?
- Can its writes be previewed before commit?
- Are errors explicit instead of silently patched over?
- Is every side effect logged in human-readable form?
- Can a reviewer tell whether the agent stayed inside scope?
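The last question on the checklist can often be answered mechanically. Here is a minimal sketch, assuming the reviewer has a list of changed paths and a list of directories the agent was allowed to write to. The function name `out_of_scope` is hypothetical, and the check is purely lexical: it does not resolve symlinks.

```python
import posixpath


def out_of_scope(changed_files, allowed_dirs):
    """Return the changed paths that fall outside every allowed directory."""
    allowed = [posixpath.normpath(d) for d in allowed_dirs]
    violations = []
    for path in changed_files:
        # Normalize first so "pkg/a/../b" cannot pass as "pkg/a".
        normalized = posixpath.normpath(path)
        inside = any(
            normalized == d or normalized.startswith(d + "/") for d in allowed
        )
        if not inside:
            violations.append(path)
    return violations


changed = ["pkg/a/x.py", "pkg/a/../b/y.py", "pkg/ab/z.py"]
print(out_of_scope(changed, ["pkg/a"]))
```

An empty result is evidence the agent stayed inside scope. A non-empty one tells the reviewer exactly where to look.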
Where this goes wrong
There is a real counterpoint: overly narrow tools can make agents useless. If every useful task requires twenty approvals and ten brittle wrappers, the human spends more time operating the system than doing engineering.
The answer is not maximum restriction. It is risk-shaped restriction.
Read-only exploration can be broad. Writes should be narrower. Production side effects should be narrower still. Anything involving credentials, user-visible state, billing, access control, or data deletion deserves a separate bar.
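The ordering above can be sketched as a small classifier. The tier names, the tag vocabulary, and the `classify` function are assumptions made for illustration. A real system would decide for itself what counts as sensitive and what each tier requires.

```python
from enum import IntEnum


class Tier(IntEnum):
    # Higher value means a narrower interface and a higher bar.
    READ = 0
    WRITE = 1
    PRODUCTION = 2
    SENSITIVE = 3


# Hypothetical tags for the categories that deserve a separate bar.
SENSITIVE_TAGS = {
    "credentials",
    "user_visible_state",
    "billing",
    "access_control",
    "data_deletion",
}


def classify(writes: bool, touches_production: bool, tags: set[str]) -> Tier:
    """Pick the strictest tier that applies to an operation."""
    if tags & SENSITIVE_TAGS:
        return Tier.SENSITIVE
    if touches_production:
        return Tier.PRODUCTION
    if writes:
        return Tier.WRITE
    return Tier.READ
```

Note that a read-only operation tagged `billing` still lands in the strictest tier, which matches the idea that restriction follows risk rather than the read/write distinction alone.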
What I do now
I avoid giving agents tools named after human workflows, such as "fix issue" or "update system." I prefer tools named after constrained operations: "search repository," "read file," "apply patch," "run test command," "summarize diff."
I also like explicit handoff points. An agent can propose a patch, but the merge step is a different interface. An agent can inspect logs, but changing alert thresholds is a different interface. An agent can draft a migration, but applying it is a different interface.
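A handoff point can be expressed as two functions that share a data type but not a permission. In this sketch, `propose_patch` is the agent-facing interface and `merge` is a separate one that refuses unapproved work. All names are hypothetical, and `merge` only returns a message instead of touching a repository.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PatchProposal:
    diff: str
    rationale: str
    approved_by: Optional[str] = None  # set only by the human-facing step


def propose_patch(diff: str, rationale: str) -> PatchProposal:
    """Agent-facing: builds a proposal and has no side effects."""
    return PatchProposal(diff=diff, rationale=rationale)


def approve(proposal: PatchProposal, reviewer: str) -> PatchProposal:
    """Human-facing: records who accepted accountability for the change."""
    return replace(proposal, approved_by=reviewer)


def merge(proposal: PatchProposal) -> str:
    """Separate interface: refuses to act on unapproved proposals."""
    if proposal.approved_by is None:
        raise PermissionError("proposal has not been approved by a human")
    return f"merged (approved by {proposal.approved_by})"
```

The agent is never handed `approve` or `merge`, so the accountable step always has a named human attached to it.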
This keeps responsibility visible. A human can delegate work without delegating accountability into an opaque loop.
The interface budget
One practical way to make this concrete is to treat agent capability as a budget, not a binary permission.
Every interface spends budget along four dimensions: what the agent can observe, what it can change, how far the change can propagate, and how quickly a human can reverse it. A broad read over source code may spend little budget if secrets and production data are excluded. A tiny write against an access policy may spend a lot because the consequence is immediate and hard to notice.
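As a rough sketch, the four dimensions can be scored and summed. The 0 to 3 scale, the additive total, and the example scores below are all invented for illustration. They are not measurements, and a team might reasonably weight the dimensions differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceCost:
    # Each field is a hypothetical 0-3 score; higher means more budget spent.
    observe: int      # what the agent can see
    change: int       # what the agent can modify
    propagation: int  # how far a change can spread
    reversal: int     # how slow or hard it is for a human to undo

    def total(self) -> int:
        return self.observe + self.change + self.propagation + self.reversal


def within_budget(cost: InterfaceCost, budget: int) -> bool:
    return cost.total() <= budget


# Broad read over source code, with secrets and production data excluded.
source_read = InterfaceCost(observe=1, change=0, propagation=0, reversal=0)
# Tiny write against an access policy: small change, large consequence.
policy_write = InterfaceCost(observe=0, change=1, propagation=3, reversal=3)
```

With these made-up scores the broad read costs 1 and the tiny policy write costs 7, which is the asymmetry the budget framing is meant to surface.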
This matters when teams try to scale agent use through shared tools. The tempting move is to create one powerful workflow that can diagnose, edit, test, open a pull request, and update a ticket. That may be convenient, but it collapses different risk classes into one interface. A safer design exposes the same workflow as separable capabilities with clear handoff points.
I like granting capability in stages. Start with search and read. Add patch proposal. Add test execution. Add narrow writes only after the tool produces useful audit records. Add production side effects only when rollback, approval, and ownership are explicit.
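The staging above can be sketched as an ordered list with evidence gates. The stage and evidence names are hypothetical labels for the steps in the previous paragraph. The key property is that a later stage is never granted while an earlier gate is unmet.

```python
# Stages in the order they are granted.
STAGES = [
    "search_and_read",
    "patch_proposal",
    "test_execution",
    "narrow_writes",
    "production_side_effects",
]

# Evidence that must exist before a stage is unlocked (hypothetical names).
GATES = {
    "narrow_writes": {"useful_audit_records"},
    "production_side_effects": {"rollback", "approval", "ownership"},
}


def granted_stages(evidence: set[str]) -> list[str]:
    """Return the stages unlocked by the evidence, stopping at the first unmet gate."""
    granted = []
    for stage in STAGES:
        required = GATES.get(stage, set())
        if not required <= evidence:
            break
        granted.append(stage)
    return granted


print(granted_stages({"useful_audit_records"}))
```

Having rollback, approval, and ownership in place does not unlock production side effects if audit records are still missing, because the loop stops at the first unmet gate.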
The budget can grow, but it should grow because the evidence got better, not because the demo was impressive.
Closing takeaway
Give agents small interfaces with clear contracts, then increase capability only where review evidence and rollback are strong.