The Agent Action Lifecycle
Most agent failures do not happen because the model says one strange thing. They happen because a fuzzy intention becomes a real action without enough transitions in between. The agent thinks, speaks, calls a tool, sees output, changes its mind, and keeps moving. The production system sees only a blur.
The thesis
Agent actions need a lifecycle. Without one, planning, authorization, execution, and verification collapse into a single conversational moment.
That collapse is where side effects escape review. A model can say, "I will update the config," then call a file tool, then discover the config moved, then update a different file, then report success. Each step may be locally understandable, but the overall action is no longer governed by the original intent.
The fix is not to slow every agent to a crawl. The fix is to define the states an action must pass through before it changes shared reality.
The production pattern
Consider an agent asked to address a failing build. It reads the failure log, identifies a missing dependency, edits a package file, runs tests, opens a pull request, and comments on the ticket. That is a familiar developer workflow.
Now look at the action boundaries. Editing the package file is a write. Running tests may write caches, artifacts, or generated files. Opening a pull request writes to a remote repository. Commenting on the ticket writes to a collaboration system. If the test command fails, the agent might revise the edit and try again. If a human approves the first proposed change, the agent may already have moved on to a later change by the time it executes, so the approval no longer matches what actually runs.
Without a lifecycle, the only durable record is the transcript and the final artifact. That is not enough to answer production questions. What did the agent intend? What was approved? Which file versions did it observe? Which command failed? Which retry changed the proposed write? Why did it decide the verification was sufficient?
The model
I use seven action states: proposed, checked, authorized, reserved, executed, verified, and recorded.
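The seven states can be sketched as a small state machine. This is a minimal illustration, not a prescribed implementation: the names ActionState and advance, and the needs_reservation flag, are assumptions introduced here. The flag is there only because not every action needs a reservation.

```python
from enum import Enum


class ActionState(Enum):
    PROPOSED = "proposed"
    CHECKED = "checked"
    AUTHORIZED = "authorized"
    RESERVED = "reserved"
    EXECUTED = "executed"
    VERIFIED = "verified"
    RECORDED = "recorded"


# Lifecycle order; iterating an Enum preserves definition order.
_ORDER = list(ActionState)


def advance(current: ActionState, target: ActionState, *,
            needs_reservation: bool = True) -> ActionState:
    """Return target if it is a legal next state, otherwise raise ValueError."""
    nxt = _ORDER.index(current) + 1
    if nxt >= len(_ORDER):
        raise ValueError(f"{current.value} is terminal")
    allowed = {_ORDER[nxt]}
    # Assumption: RESERVED is the only state that may be skipped.
    if current is ActionState.AUTHORIZED and not needs_reservation:
        allowed.add(ActionState.EXECUTED)
    if target not in allowed:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

With this shape, a tool call that tries to jump from proposed straight to executed raises an error instead of silently succeeding.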
Proposed means the action is explicit. It has an action id, actor, resource, parameters, expected effect, and reason. "Change config" is not proposed. "Set retry_count from 2 to 4 in worker.yaml on branch agent/fix-build, based on blob abc123" is proposed.
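One way to make a proposal explicit is a small immutable object that refuses to exist with empty fields. The sketch below is an assumption about shape, not a fixed schema: ProposedAction, the "build-agent" actor, the parameter keys, and the reason text are all hypothetical, while the worker.yaml values come from the example above.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class ProposedAction:
    actor: str
    resource: str
    parameters: Mapping[str, Any]
    expected_effect: str
    reason: str
    action_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self) -> None:
        # A vague request such as "Change config" fails here: every field is required.
        for name in ("actor", "resource", "expected_effect", "reason"):
            if not getattr(self, name).strip():
                raise ValueError(f"proposal is missing {name}")
        if not self.parameters:
            raise ValueError("proposal has no parameters")


proposal = ProposedAction(
    actor="build-agent",  # hypothetical actor name
    resource="worker.yaml",
    parameters={
        "branch": "agent/fix-build",
        "base_blob": "abc123",
        "key": "retry_count",
        "old": 2,
        "new": 4,
    },
    expected_effect="retry_count is 4 in worker.yaml on agent/fix-build",
    reason="placeholder reason supplied by the agent",  # illustrative only
)
```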
Checked means policy and preconditions were evaluated before execution. Is the actor allowed to write this file? Is the file still at the version the agent read? Is the branch protected? Does this action require human approval? Is the requested command inside the allowed prefix set?
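Most of those questions can be evaluated as plain predicates that return reasons rather than a bare yes or no. The following is a rough sketch under assumptions: WriteRequest, Policy, and check are hypothetical names, the policy is an in-memory object rather than a real policy engine, and the "requires human approval" question is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WriteRequest:
    actor: str
    path: str
    branch: str
    base_blob: str                 # file version the agent read
    command: Optional[str] = None  # optional command to run


@dataclass(frozen=True)
class Policy:
    writers: frozenset
    protected_branches: frozenset
    allowed_command_prefixes: tuple


def check(req: WriteRequest, policy: Policy, current_blob: str) -> list:
    """Return a list of failure reasons; an empty list means every check passed."""
    failures = []
    if req.actor not in policy.writers:
        failures.append("actor not allowed to write")
    if req.base_blob != current_blob:
        failures.append("stale read: file changed since the agent observed it")
    if req.branch in policy.protected_branches:
        failures.append("branch is protected")
    # str.startswith accepts a tuple of prefixes.
    if req.command is not None and not req.command.startswith(policy.allowed_command_prefixes):
        failures.append("command outside allowed prefix set")
    return failures


policy = Policy(
    writers=frozenset({"build-agent"}),
    protected_branches=frozenset({"main"}),
    allowed_command_prefixes=("pytest", "npm test"),
)
request = WriteRequest("build-agent", "worker.yaml", "agent/fix-build", "abc123", "pytest -q")
failures = check(request, policy, current_blob="abc123")
```

Returning the reasons, not just a boolean, means the check result can be stored alongside the proposal and read later.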
Authorized means a policy decision or human approval is attached to that exact proposal. The authorization has scope and expiry. It does not float around the conversation as a reusable yes.
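Scope and expiry can be modeled directly on the authorization record. This is a minimal sketch, assuming scope means "exactly one action id"; the Authorization class, the covers method, and the approver name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Authorization:
    action_id: str        # scope: exactly one proposal
    approver: str
    expires_at: datetime

    def covers(self, action_id: str, now: datetime) -> bool:
        """True only for the approved proposal, and only before expiry."""
        return action_id == self.action_id and now < self.expires_at


granted_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
auth = Authorization(
    action_id="act-1",
    approver="alice",  # hypothetical approver
    expires_at=granted_at + timedelta(minutes=15),
)
```

A different action id, or the same id after the fifteen-minute window, gets False, so the "yes" cannot be reused elsewhere in the conversation.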
Reserved means the system has claimed the right to execute without conflicting with another actor. This can be as simple as a compare-and-swap on a file version, a lease on a ticket, or a lock around a deployment lane. Not every action needs a reservation, but shared resources often do.
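The compare-and-swap variant is the simplest to illustrate. The sketch below uses an in-memory stand-in for whatever actually holds the resource versions; VersionedStore and its methods are hypothetical names, and a real system would rely on the atomic primitive its storage layer provides.

```python
import threading
from typing import Dict, Optional


class VersionedStore:
    """In-memory stand-in that tracks one version token per resource."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._versions: Dict[str, str] = {}

    def set_version(self, resource: str, version: str) -> None:
        with self._lock:
            self._versions[resource] = version

    def current(self, resource: str) -> Optional[str]:
        with self._lock:
            return self._versions.get(resource)

    def compare_and_swap(self, resource: str, expected: str, new: str) -> bool:
        """Move resource to new only if it is still at expected."""
        with self._lock:
            if self._versions.get(resource) != expected:
                return False
            self._versions[resource] = new
            return True


store = VersionedStore()
store.set_version("worker.yaml", "abc123")

# Two actors both read abc123; only the first claim succeeds.
first = store.compare_and_swap("worker.yaml", expected="abc123", new="def456")
second = store.compare_and_swap("worker.yaml", expected="abc123", new="0a1b2c")
```

The losing actor has to re-read and re-propose, which is the behavior the reservation state is meant to force.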
Executed means the tool ran and returned evidence. The evidence is a commit sha, diff, job id, message id, ticket update id, or command result. It is not just a natural language summary.
Verified means a separate observation checked the intended state after execution. Read the file back. Fetch the pull request. Query the job status. Confirm the ticket comment exists. If verification fails, the action is not complete.
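Verification can be a very small function, as long as the observation comes from a separate read and not from the write call's own return value. This is a sketch under assumptions: verify is a hypothetical helper, and the dictionary stands in for a real file or API read.

```python
from typing import Any, Callable, Dict


def verify(read_back: Callable[[], Any], expected: Any) -> Dict[str, Any]:
    """Observe state through a separate read and compare it with the intended state."""
    observed = read_back()
    return {"ok": observed == expected, "expected": expected, "observed": observed}


# Stand-in for the real file system: a dict keyed by path.
files = {"worker.yaml": "retry_count: 4\n"}

good = verify(lambda: files.get("worker.yaml"), "retry_count: 4\n")
missing = verify(lambda: files.get("other.yaml"), "retry_count: 4\n")
```

The returned dictionary keeps both the expected and the observed value, so a failed verification carries its own evidence.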
Recorded means the system stores enough history to replay the decision later: context references, policy results, approval records, tool inputs, tool outputs, and verification results.
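A record with those six parts can be as plain as one JSON line per action. The sketch below assumes a list of strings as the log; ActionRecord, append_record, and every field value shown are hypothetical placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


@dataclass
class ActionRecord:
    action_id: str
    context_refs: List[str]
    policy_results: Dict[str, bool]
    approvals: List[str]
    tool_input: Dict[str, Any]
    tool_output: Dict[str, Any]
    verification: Dict[str, Any]


def append_record(log: List[str], record: ActionRecord) -> None:
    """Serialize the record as one JSON line and append it to the log."""
    log.append(json.dumps(asdict(record), sort_keys=True))


log: List[str] = []
append_record(log, ActionRecord(
    action_id="act-1",
    context_refs=["blob:abc123"],
    policy_results={"actor_can_write": True, "branch_unprotected": True},
    approvals=["approval-1"],
    tool_input={"path": "worker.yaml", "key": "retry_count", "new": 4},
    tool_output={"commit": "placeholder-sha"},
    verification={"ok": True},
))
```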
Where this goes wrong
The first mistake is skipping proposal. If the agent can call a write tool directly from a private chain of reasoning, reviewers see the result but not the decision point. With no explicit proposal, there is nothing for a policy or a person to deny before the write happens.
The second mistake is treating authorization as a mood. A human says, "Yes, go ahead," after reading a summary. The agent then changes the patch after a test failure and still treats the approval as valid. In a lifecycle, the changed patch creates a new proposal. It may be similar, but it is not the same authorized action.
The third mistake is retrying execution without idempotency. If create_ticket times out, did it fail or did the ticket get created? A blind retry can create duplicates. If send_message returns a network error, the message may still have been delivered. The lifecycle needs action ids and reconciliation hooks so ambiguous outcomes do not become repeated side effects.
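The create_ticket case can be sketched with an action id used as an idempotency key plus a lookup for reconciliation. Everything here is an assumption for illustration: TicketService is a hypothetical in-memory backend, find_by_action is the assumed reconciliation hook, and flaky_once simulates a response that is lost after the server has already acted.

```python
from typing import Callable, Dict, Optional


class TicketService:
    """Hypothetical backend that deduplicates on a caller-supplied action id."""

    def __init__(self) -> None:
        self._by_action: Dict[str, str] = {}

    def create_ticket(self, action_id: str, title: str) -> str:
        if action_id in self._by_action:
            return self._by_action[action_id]  # same action, same ticket
        ticket_id = f"T-{len(self._by_action) + 1}"
        self._by_action[action_id] = ticket_id
        return ticket_id

    def find_by_action(self, action_id: str) -> Optional[str]:
        return self._by_action.get(action_id)

    def count(self) -> int:
        return len(self._by_action)


def flaky_once(service: TicketService) -> Callable[[str, str], str]:
    """Simulate a call whose first response is lost after the ticket was created."""
    state = {"failed": False}

    def call(action_id: str, title: str) -> str:
        ticket_id = service.create_ticket(action_id, title)
        if not state["failed"]:
            state["failed"] = True
            raise TimeoutError("response lost")
        return ticket_id

    return call


def create_with_reconcile(service: TicketService, action_id: str, title: str,
                          call: Callable[[str, str], str]) -> str:
    try:
        return call(action_id, title)
    except TimeoutError:
        # Ambiguous outcome: look before retrying.
        existing = service.find_by_action(action_id)
        if existing is not None:
            return existing
        return call(action_id, title)  # retry carries the same action id


service = TicketService()
ticket_id = create_with_reconcile(service, "act-1", "Build is failing", flaky_once(service))
```

After the timeout, the reconciliation lookup finds the ticket that was already created, so the service ends up with one ticket rather than two.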
The counterpoint is that internal scratch work should not be over-modeled. An agent can revise notes, search docs, and run local read-only analysis without making every thought a state transition. The lifecycle becomes mandatory when the action affects shared state, external systems, or future automation.
What I do now
I require proposed writes to be explicit objects before execution. They can be stored in a database, a run log, or even a structured file for smaller systems. The storage medium matters less than the separation between "the agent thinks this should happen" and "the system allowed this to happen."
I bind approvals to proposal hashes. If the target, parameters, diff, command, environment, or expected effect changes, the old approval no longer applies. This prevents approval drift during iterative planning.
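A proposal hash can be computed from a canonical serialization of the fields that matter. This is a minimal sketch, assuming the proposal is a JSON-serializable dictionary; proposal_hash and approval_applies are hypothetical helper names, and the field values are placeholders.

```python
import hashlib
import json
from typing import Any, Dict


def proposal_hash(proposal: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON form, so key order does not affect the hash."""
    canonical = json.dumps(proposal, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def approval_applies(approved_hash: str, proposal: Dict[str, Any]) -> bool:
    return approved_hash == proposal_hash(proposal)


original = {
    "target": "worker.yaml",
    "parameters": {"retry_count": 4},
    "diff": "-retry_count: 2\n+retry_count: 4\n",
    "command": None,
    "environment": "ci",  # placeholder value
    "expected_effect": "retry_count is 4",
}
approved_hash = proposal_hash(original)

# The agent revises the patch after a test failure.
revised = dict(original, parameters={"retry_count": 5},
               diff="-retry_count: 2\n+retry_count: 5\n")
```

Here approval_applies(approved_hash, original) holds, while the revised patch produces a different hash and has to go back through approval.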
I make verification boring and mechanical. Each effect tool returns a handle, and each handle has a read path. The agent may interpret the result, but the system captures the observation. If a verification check fails, the action enters reconciliation rather than being papered over by a confident sentence.
I also review lifecycle logs during incidents. They reveal whether the agent made a bad decision, used stale context, crossed a policy boundary, or executed correctly against a bad human approval. Each of those calls for a different fix.
Closing takeaway
An agent action is not a tool call. It is a governed transition from proposed intent to verified state.