AI Agent Planning and Reasoning: ReAct Loops, Task Decomposition, and Stop Conditions

Choose the Smallest Planning Pattern That Works

Planning helps an agent transform a broad goal into actions that can be executed, observed, and verified. It is useful when tasks have dependencies, uncertain information, or several possible routes.

Not every task needs a long plan. Many successful agents use a short next-action loop: inspect state, select one action, observe the result, and continue. Larger plans are valuable when coordination or human review requires visibility before work begins.

Reasoning must be bounded by budgets and evidence. A plan is a proposal, not permission to execute tools, and the runtime must decide whether the result is sufficient or the run should stop.

A next-action loop works well for interactive search, troubleshooting, and tool use where each result changes the next decision. A plan-and-execute pattern works better when a task has known dependencies or several workers must coordinate.

For predictable tasks, skip model planning and use a deterministic workflow. Agentic planning adds value only when the route genuinely depends on interpretation or newly discovered information.

Direct response: no tools or plan are needed.
Next-action loop: choose one action after each observation.
Plan and execute: create milestones, execute them, then replan if needed.
Router and workers: classify work and send it to specialized handlers.
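
One way to make this choice explicit is a small deterministic selector that runs before any model call. The sketch below is illustrative only: `TaskProfile`, its fields, and `select_pattern` are hypothetical names, and the order of the checks is an assumption, not a rule taken from any framework.

```python
from dataclasses import dataclass
from enum import Enum


class Pattern(Enum):
    DIRECT_RESPONSE = "direct_response"
    NEXT_ACTION_LOOP = "next_action_loop"
    PLAN_AND_EXECUTE = "plan_and_execute"
    ROUTER_AND_WORKERS = "router_and_workers"


@dataclass
class TaskProfile:
    # Hypothetical signals an application might compute before planning.
    needs_tools: bool
    has_known_dependencies: bool
    needs_specialized_handlers: bool


def select_pattern(task: TaskProfile) -> Pattern:
    """Return the smallest pattern that covers the task profile."""
    if not task.needs_tools:
        return Pattern.DIRECT_RESPONSE
    if task.needs_specialized_handlers:
        return Pattern.ROUTER_AND_WORKERS
    if task.has_known_dependencies:
        return Pattern.PLAN_AND_EXECUTE
    # Default for tool use where each result changes the next decision.
    return Pattern.NEXT_ACTION_LOOP
```

Keeping the selector deterministic means the cheapest pattern is chosen without spending tokens on the decision itself.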

Make Plans Executable and Observable

A useful plan contains concrete steps with completion conditions. Vague steps such as "research the issue" are hard to evaluate. Better steps name the source to inspect, the information to extract, and the condition that marks the step complete.

Store compact plan state: pending steps, current step, completed evidence, blockers, and revision count. Avoid storing unlimited internal reasoning text.
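
A compact plan state can be as small as one record. The following is a minimal sketch, assuming steps are identified by short strings and evidence is stored as a reference (for example a record id) instead of raw text. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanState:
    pending_steps: list
    current_step: Optional[str] = None
    # Maps a completed step to a short evidence reference, not a transcript.
    completed_evidence: dict = field(default_factory=dict)
    blockers: list = field(default_factory=list)
    revision_count: int = 0

    def start_next(self) -> Optional[str]:
        """Move the next pending step into progress, if any remain."""
        self.current_step = self.pending_steps.pop(0) if self.pending_steps else None
        return self.current_step

    def complete_current(self, evidence_ref: str) -> None:
        """Record evidence for the step in progress and clear it."""
        if self.current_step is None:
            raise ValueError("no step in progress")
        self.completed_evidence[self.current_step] = evidence_ref
        self.current_step = None
```

Because the record holds only step names, references, and counters, it stays small enough to persist after every step.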

Replan Only When Evidence Requires It

Tool errors, missing data, contradictory evidence, or changed user requirements may invalidate a plan. Replanning should respond to one of those signals, not happen after every successful step.

Set a revision limit. Repeated replanning often indicates an unclear goal, inadequate tools, or a task that needs human clarification.

Retry transient failures with a bounded policy.
Choose an alternate tool when the first source is unavailable.
Ask the user when a required business decision is ambiguous.
Stop and summarize partial progress when the budget is exhausted.
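
These responses can be encoded as a small decision function so that recovery does not depend on the model's judgment alone. This is a sketch under stated assumptions: the signal labels, the `MAX_RETRIES` and `MAX_REVISIONS` values, and the function name are all hypothetical.

```python
from enum import Enum


class Recovery(Enum):
    RETRY = "retry"
    ALTERNATE_TOOL = "alternate_tool"
    ASK_USER = "ask_user"
    STOP_WITH_SUMMARY = "stop_with_summary"


MAX_RETRIES = 2    # assumed bound on retries for one step
MAX_REVISIONS = 3  # assumed revision limit for one run


def choose_recovery(signal: str, retries: int, revisions: int, budget_left: bool) -> Recovery:
    """Map a failure signal to one bounded recovery action."""
    if not budget_left or revisions >= MAX_REVISIONS:
        return Recovery.STOP_WITH_SUMMARY
    if signal == "transient_error":
        # Retry a bounded number of times, then fall back to another tool.
        return Recovery.RETRY if retries < MAX_RETRIES else Recovery.ALTERNATE_TOOL
    if signal == "source_unavailable":
        return Recovery.ALTERNATE_TOOL
    if signal == "ambiguous_requirement":
        return Recovery.ASK_USER
    # Unknown signals stop the run instead of triggering open-ended replanning.
    return Recovery.STOP_WITH_SUMMARY
```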

Define Completion Before the Loop Starts

Agents frequently overwork because "done" is not defined. Completion might mean every requested field is present, evidence meets a confidence threshold, a test suite passes, or a reviewer approves the proposed action.

Use several stop conditions together: success criteria, maximum steps, time limit, cost budget, repeated-action detection, and cancellation.
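
Combining stop conditions is easiest when one function returns the first reason that applies. The sketch below assumes the runtime already tracks step count, start time, spend, and a cancellation flag. The `Budget` fields and reason strings are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_steps: int
    max_seconds: float
    max_cost_usd: float


def check_stop(
    *,
    success: bool,
    steps: int,
    started_at: float,
    cost_usd: float,
    repeated_action: bool,
    cancelled: bool,
    budget: Budget,
    now: Optional[float] = None,
) -> Optional[str]:
    """Return a stop reason, or None if the loop may continue."""
    # `now` is injectable so the check can be tested without waiting.
    now = time.monotonic() if now is None else now
    if cancelled:
        return "cancelled"
    if success:
        return "success"
    if repeated_action:
        return "repeated_action"
    if steps >= budget.max_steps:
        return "max_steps"
    if now - started_at >= budget.max_seconds:
        return "time_limit"
    if cost_usd >= budget.max_cost_usd:
        return "cost_budget"
    return None
```

Calling this once per iteration gives every run exactly one recorded stop reason.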

Verify Results Instead of Trusting Confidence

A model saying it is finished is not proof. Verify with deterministic checks when possible: schema validation, database constraints, test execution, citation coverage, calculation checks, or comparison against expected records.

Use model-based judging only where deterministic checks cannot capture quality, and calibrate those judges against human-reviewed examples.
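
A deterministic check can be only a few lines long. The sketch below combines a field and type check with a calculation check on a hypothetical invoice summary. The field names and the rounding tolerance are assumptions made for illustration.

```python
def verify_invoice_summary(result: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    required = {"invoice_id": str, "line_totals": list, "total": float}

    # Schema-style check: each required field is present with the expected type.
    for name, expected_type in required.items():
        if name not in result:
            problems.append(f"missing field: {name}")
        elif not isinstance(result[name], expected_type):
            problems.append(f"wrong type for {name}")

    # Calculation check: only meaningful once the structure is valid.
    if not problems:
        if abs(sum(result["line_totals"]) - result["total"]) > 0.005:
            problems.append("total does not match line totals")

    return problems
```

The runtime can treat any non-empty result as "not finished", regardless of what the model reports.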

Planning Is a Runtime Strategy

Planning is not a guarantee that the model is reasoning correctly. It is a runtime strategy for decomposing work, choosing actions, and deciding when enough progress has been made. Some tasks need no plan. Some need a short checklist. Some need iterative ReAct-style tool use. Some need a planner-executor-reviewer pattern.

Choose planning depth based on uncertainty and risk. A simple classification should not spend tokens building a multi-step plan. A research task may need an explicit plan because the agent must search, compare, cite, and revise. A write-capable task may need a plan plus approval because the consequences of a mistake are greater.

Plans should be inspectable and updateable. If the agent discovers missing evidence, a failed tool, or a policy restriction, it should revise the plan rather than continue blindly. The runtime should store the current plan, completed steps, open questions, and stop reason.

The application should validate plans before risky execution. A model-generated plan that includes "email the customer" or "delete duplicate records" should pass through policy and approval checks before any tool call happens.

Use minimal planning for low-uncertainty tasks.
Use explicit plans for research, multi-tool, or high-risk tasks.
Store plan state so progress is visible and resumable.
Revise plans when observations contradict assumptions.
Validate risky plan steps with deterministic policy.
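
Plan validation can run as a plain function between the planner and the executor. In this sketch the action names, the `RISKY_ACTIONS` set, and the decision labels are hypothetical. A real policy would come from application configuration.

```python
# Assumed policy list: actions that must never run without approval.
RISKY_ACTIONS = {"send_email", "delete_records"}


def validate_plan(steps: list, approved_actions: set) -> list:
    """Label each planned step as allowed or needing approval before any tool call."""
    decisions = []
    for step in steps:
        action = step["action"]
        if action in RISKY_ACTIONS and action not in approved_actions:
            decisions.append({"action": action, "decision": "needs_approval"})
        else:
            decisions.append({"action": action, "decision": "allowed"})
    return decisions


plan = [{"action": "lookup_customer"}, {"action": "send_email"}]
decisions = validate_plan(plan, approved_actions=set())
```

Here the lookup is allowed and the email step is held for approval, so the executor never sees an unapproved risky step.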

Reasoning Failure Modes and Stop Conditions

Agent reasoning fails in recognizable patterns. The model may over-plan, repeat the same tool, chase irrelevant evidence, invent a missing observation, ignore a failed tool, or keep working after the answer is already good enough. These are runtime problems as much as model problems.

Stop conditions turn vague autonomy into controlled autonomy. Define success checks, maximum iterations, repeated-action detection, evidence thresholds, confidence thresholds, and escalation triggers. When the agent stops, it should explain whether it completed the task, needs user input, hit a budget, or found insufficient evidence.

For complex tasks, add a verification step. The verifier should inspect the answer against the goal, evidence, policy, and trace. Verification can be model-assisted, deterministic, or human-reviewed depending on risk. The important point is that the same component that generated the answer should not be the only judge of quality.

Planning quality should be evaluated from traces. Do not score only the final answer. Score whether the plan was appropriate, whether tool calls were necessary, whether the agent recovered from errors, and whether it stopped for the right reason.

Detect repeated actions and unproductive loops.
Require evidence before factual conclusions.
Escalate when policy, confidence, or missing data demands it.
Use separate verification for important outputs.
Track stop reasons as a quality metric.
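
Tracking stop reasons as a metric needs little more than a tally over finished runs. The sketch below assumes each run record carries a `stop_reason` string. The record shape and the reason labels are hypothetical.

```python
from collections import Counter


def stop_reason_rates(runs: list) -> dict:
    """Return the share of runs that ended for each stop reason."""
    counts = Counter(run["stop_reason"] for run in runs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reason: count / total for reason, count in counts.items()}


runs = [
    {"stop_reason": "completed"},
    {"stop_reason": "completed"},
    {"stop_reason": "budget_exhausted"},
    {"stop_reason": "needs_user_input"},
]
rates = stop_reason_rates(runs)
```

A rising share of budget or repeated-action stops is a signal to inspect traces before changing prompts.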

Planning Evaluation Exercise

Evaluate planning by comparing the plan to the task, not by admiring how detailed it looks. A good plan is short enough to execute, specific enough to inspect, and flexible enough to change when observations arrive. A long plan that ignores evidence is worse than no plan.

For each test task, record the initial plan, tool calls, revised plan, stop reason, and final outcome. This reveals whether the agent is actually adapting or merely producing planning text before improvising. It also shows whether planning consumes more cost than it saves.

Include tasks where the best behavior is to ask a question, refuse, or stop early. Planning systems often fail by continuing to act when uncertainty should trigger clarification or escalation.

Score plan relevance, not plan length.
Track revisions after tool observations.
Measure unnecessary steps and repeated actions.
Reward correct clarification and early stopping.
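
Two of these measures can be computed directly from a recorded trace. The sketch below assumes each trace entry has a `tool` name and an `args` dict, and that the evaluator supplies the set of tools the task actually required. Both are assumptions about the trace format.

```python
def score_trace(trace: list, needed_tools: set) -> dict:
    """Count repeated and unnecessary tool calls in one recorded run."""
    seen = set()
    repeated = 0
    unnecessary = 0
    for call in trace:
        # Sort the arguments so that key order does not hide a repeat.
        signature = (call["tool"], repr(sorted(call["args"].items())))
        if signature in seen:
            repeated += 1
        seen.add(signature)
        if call["tool"] not in needed_tools:
            unnecessary += 1
    return {"calls": len(trace), "repeated": repeated, "unnecessary": unnecessary}


trace = [
    {"tool": "search_order", "args": {"order_id": "ORD-1"}},
    {"tool": "search_order", "args": {"order_id": "ORD-1"}},
    {"tool": "get_weather", "args": {}},
]
scores = score_trace(trace, needed_tools={"search_order"})
```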

Bounded Planning Loop

A planning loop alternates between deciding, acting, observing, and checking whether the goal is satisfied. Keep the plan in compact structured state rather than exposing or storing private internal reasoning. The system needs action rationale, evidence references, and stop signals, not an unrestricted hidden-thought transcript.

Decompose only when a task benefits from separate verification or tools. Overly detailed plans create stale steps, extra model calls, and more failure points. Replan after material observations, but cap turns, tool calls, elapsed time, spend, repeated errors, and no-progress cycles. Escalate when the limits are reached.

A stop condition should be externally testable: required fields are present, cited evidence supports the answer, the backend confirms the action, or a reviewer accepted the proposal. “The model says done” is not enough. Record the final stop reason so evaluations can distinguish success, refusal, timeout, budget exhaustion, and human escalation.

Store concise plans and observable rationale, not private reasoning traces.
Decompose only where separate action or verification adds value.
Detect repeated actions and no-progress loops.
End on application evidence or explicit escalation.
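
A no-progress cycle differs from a repeated action: the agent may call different tools every turn and still learn nothing new. One rough way to detect this, assuming the runtime records how many evidence references exist after each turn, is sketched below. The function name and the default window are hypothetical.

```python
def no_progress(evidence_counts: list, window: int = 3) -> bool:
    """True if the evidence count has not grown across the last `window` turns."""
    # Not enough history yet to judge.
    if len(evidence_counts) <= window:
        return False
    return evidence_counts[-1] <= evidence_counts[-1 - window]
```

For example, counts of `[0, 1, 2, 2, 2, 2]` show three turns without new evidence, which would be a reason to stop or escalate.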

Planning Loop Examples

Bounded Next-Action Loop

The runtime executes one proposed action at a time and stops on success, repetition, or budget exhaustion.

def choose_next_action(state: dict) -> dict:
    """Stand-in for the model call: search first, then finish."""
    if not state["observations"]:
        return {"name": "search_order", "args": {"order_id": state["order_id"]}}
    return {"name": "finish", "args": {}}

def run_agent(order_id: str) -> dict:
    state = {"order_id": order_id, "observations": [], "actions": []}

    # Hard step limit: the loop never runs more than four iterations.
    for _ in range(4):
        action = choose_next_action(state)
        # A comparable signature lets the runtime spot an identical repeated call.
        signature = (action["name"], repr(action["args"]))

        if signature in state["actions"]:
            return {"status": "blocked", "reason": "repeated action", "state": state}

        state["actions"].append(signature)

        if action["name"] == "finish":
            return {"status": "completed", "state": state}

        if action["name"] == "search_order":
            # Stubbed tool result; a real runtime would call the order service here.
            state["observations"].append({"status": "shipped"})

    return {"status": "budget_exhausted", "state": state}

print(run_agent("ORD-1042")["status"])  # prints "completed"

The loop has a hard step limit.
Repeated actions are detected before another tool call.
The final status distinguishes success from controlled failure.

Plan with Explicit Completion Tests

Each step states what evidence marks it complete.

plan = [
    {
        "step": "load_invoice",
        "complete_when": "invoice total and vendor id are present",
    },
    {
        "step": "load_purchase_order",
        "complete_when": "matching order is found",
    },
    {
        "step": "compare_amounts",
        "complete_when": "difference is calculated",
    },
    {
        "step": "request_review",
        "complete_when": "reviewer approves or rejects the mismatch",
    },
]

for item in plan:
    print(f"{item['step']}: {item['complete_when']}")

Completion criteria make progress measurable.
Human review is part of the workflow, not an afterthought.
A production executor would store status and evidence for each step.
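
As a rough illustration of storing status and evidence per step, the sketch below replaces the text criteria with callable checks. It assumes each step provides a `run` function that returns evidence and an `is_complete` function that tests it. These names and the stubbed values are hypothetical.

```python
def run_plan(steps: list, context: dict) -> list:
    """Execute steps in order and record status and evidence for each one."""
    records = []
    for step in steps:
        evidence = step["run"](context)
        done = step["is_complete"](evidence)
        records.append({
            "step": step["name"],
            "status": "complete" if done else "blocked",
            "evidence": evidence,
        })
        if not done:
            break  # Later steps depend on this one, so stop here.
        context.update(evidence)
    return records


steps = [
    {
        "name": "load_invoice",
        # Stubbed loader; a real step would read from the invoice system.
        "run": lambda ctx: {"invoice_total": 120.0, "vendor_id": "V-7"},
        "is_complete": lambda ev: "invoice_total" in ev and "vendor_id" in ev,
    },
    {
        "name": "load_purchase_order",
        # Stub that finds no matching order, so the step stays blocked.
        "run": lambda ctx: {},
        "is_complete": lambda ev: "po_total" in ev,
    },
    {
        "name": "compare_amounts",
        "run": lambda ctx: {"difference": ctx["invoice_total"] - ctx["po_total"]},
        "is_complete": lambda ev: "difference" in ev,
    },
]

records = run_plan(steps, {})
```

The run stops at the blocked purchase-order step, and the stored records show which step halted progress and what evidence was available at that point.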
