Tutorials Logic, IN info@tutorialslogic.com

AI Agent Planning and Reasoning: ReAct Loops, Task Decomposition, and Stop Conditions

Planning helps an agent transform a broad goal into actions that can be executed, observed, and verified. It is useful when tasks have dependencies, uncertain information, or several possible routes.

Not every task needs a long plan. Many successful agents use a short next-action loop: inspect state, select one action, observe the result, and continue. Larger plans are valuable when coordination or human review requires visibility before work begins.

Reasoning must be bounded by budgets and evidence. A plan is a proposal, not permission to execute tools. The runtime, not the model, decides when the result is sufficient and when the run should stop.

Mental Model

Planning chooses a route; execution discovers the terrain. A reliable agent plans only as far as useful, observes every action, and revises the route when reality disagrees.

Choose the Smallest Planning Pattern That Works

A next-action loop works well for interactive search, troubleshooting, and tool use where each result changes the next decision. A plan-and-execute pattern works better when a task has known dependencies or several workers must coordinate.

For predictable tasks, skip model planning and use a deterministic workflow. Agentic planning adds value only when the route genuinely depends on interpretation or newly discovered information.

  • Direct response: no tools or plan are needed.
  • Next-action loop: choose one action after each observation.
  • Plan and execute: create milestones, execute them, then re-plan if needed.
  • Router and workers: classify work and send it to specialized handlers.
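
The choice among these four patterns can be sketched as a small deterministic function. This is a minimal illustration only: the `Pattern` enum, the `select_pattern` function, and its three boolean inputs are hypothetical names, and a real application would derive those signals from its own task metadata.

```python
from enum import Enum


class Pattern(Enum):
    DIRECT = "direct_response"
    NEXT_ACTION = "next_action_loop"
    PLAN_EXECUTE = "plan_and_execute"
    ROUTER = "router_and_workers"


def select_pattern(needs_tools: bool, known_dependencies: bool,
                   specialized_handlers: bool) -> Pattern:
    """Pick the smallest pattern that fits (illustrative rules, not a standard)."""
    if not needs_tools:
        return Pattern.DIRECT          # no tools or plan are needed
    if specialized_handlers:
        return Pattern.ROUTER          # classify and hand off to a worker
    if known_dependencies:
        return Pattern.PLAN_EXECUTE    # milestones first, re-plan if needed
    return Pattern.NEXT_ACTION         # one action after each observation


print(select_pattern(needs_tools=True, known_dependencies=False,
                     specialized_handlers=False).value)
```

Keeping this decision in ordinary code means the planning pattern itself is chosen predictably and can be tested without a model call.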

Make Plans Executable and Observable

A useful plan contains concrete steps with completion conditions. Vague steps such as "research the issue" are hard to evaluate. Better steps name the source to inspect, the information to extract, and the condition that marks the step complete.

Store compact plan state: pending steps, current step, completed evidence, blockers, and revision count. Avoid storing unlimited internal reasoning text.
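
One possible shape for that compact state is a small dataclass. The sketch below assumes a hypothetical `PlanState` class with helper methods `start_next` and `complete_current`; the field names mirror the list above, and nothing here stores free-form reasoning text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanState:
    """Compact, inspectable plan state (hypothetical structure)."""
    pending_steps: list
    current_step: Optional[str] = None
    completed_evidence: dict = field(default_factory=dict)
    blockers: list = field(default_factory=list)
    revision_count: int = 0

    def start_next(self) -> Optional[str]:
        # Move the first pending step into the "current" slot.
        self.current_step = self.pending_steps.pop(0) if self.pending_steps else None
        return self.current_step

    def complete_current(self, evidence: str) -> None:
        # A step is only marked complete together with its evidence.
        if self.current_step is None:
            raise ValueError("no step in progress")
        self.completed_evidence[self.current_step] = evidence
        self.current_step = None


state = PlanState(pending_steps=["load_invoice", "compare_amounts"])
state.start_next()
state.complete_current("invoice total and vendor id are present")
print(state.completed_evidence)
```

Because every field is plain data, the state can be serialized between runs, which is what makes progress visible and resumable.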

Replan Only When Evidence Requires It

Tool errors, missing data, contradictory evidence, or changed user requirements may invalidate a plan. Replanning should respond to one of those signals, not happen after every successful step.

Set a revision limit. Repeated replanning often indicates an unclear goal, inadequate tools, or a task that needs human clarification.

  • Retry transient failures with a bounded policy.
  • Choose an alternate tool when the first source is unavailable.
  • Ask the user when a required business decision is ambiguous.
  • Stop and summarize partial progress when the budget is exhausted.
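
The first two recovery options above, bounded retries and an alternate tool, might be combined as follows. This is a sketch under stated assumptions: `TransientError`, `call_with_fallback`, `flaky_primary`, and `backup_source` are hypothetical names, and real tools would raise their own exception types.

```python
class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_fallback(tools, max_attempts=2):
    """Try each tool in order, retrying transient failures a bounded number of times."""
    errors = []
    for tool in tools:
        for attempt in range(1, max_attempts + 1):
            try:
                return {"status": "ok", "value": tool(), "errors": errors}
            except TransientError as exc:
                errors.append(f"{tool.__name__} attempt {attempt}: {exc}")
    # Every source failed within its retry budget: hand off instead of looping.
    return {"status": "escalate", "value": None, "errors": errors}


def flaky_primary():
    raise TransientError("timeout")   # simulated outage


def backup_source():
    return "shipped"                  # simulated alternate tool


result = call_with_fallback([flaky_primary, backup_source])
print(result["status"], result["value"], len(result["errors"]))
```

The recorded errors double as evidence for the replanning decision: they show which blocker triggered the fallback rather than leaving the agent to guess.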

Define Completion Before the Loop Starts

Agents frequently overwork because "done" is not defined. Completion might mean every requested field is present, evidence meets a confidence threshold, a test suite passes, or a reviewer approves the proposed action.

Use several stop conditions together: success criteria, maximum steps, time limit, cost budget, repeated-action detection, and cancellation.
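
A combined check might look like the sketch below. The `RunBudget` class, the `check_stop` function, and the default limits are assumptions chosen for illustration; repeated-action detection is left out here to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunBudget:
    """Hypothetical limits; tune them per task."""
    max_steps: int = 8
    max_seconds: float = 30.0
    max_cost_usd: float = 0.50


def check_stop(budget: RunBudget, *, steps: int, elapsed_seconds: float,
               cost_usd: float, success: bool = False,
               cancelled: bool = False) -> Optional[str]:
    """Return the first stop reason that applies, or None to keep going."""
    if cancelled:
        return "cancelled"
    if success:
        return "success"
    if steps >= budget.max_steps:
        return "max_steps"
    if elapsed_seconds >= budget.max_seconds:
        return "time_limit"
    if cost_usd >= budget.max_cost_usd:
        return "cost_budget"
    return None


budget = RunBudget(max_steps=4, max_seconds=10.0, max_cost_usd=0.10)
print(check_stop(budget, steps=4, elapsed_seconds=2.0, cost_usd=0.01))
```

The runtime would call `check_stop` before each action, measuring elapsed time with something like `time.monotonic()`. Returning a named reason instead of a boolean lets the final report say why the run ended.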

Verify Results Instead of Trusting Confidence

A model saying it is finished is not proof. Verify with deterministic checks when possible: schema validation, database constraints, test execution, citation coverage, calculation checks, or comparison against expected records.

Use model-based judging only where deterministic checks cannot capture quality, and calibrate those judges against human-reviewed examples.
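
As an illustration of deterministic checks, the sketch below validates a hypothetical order summary with a required-field check, an allowed-value check, and a calculation check. The field names, `ALLOWED_STATUSES`, and `verify_order_summary` are assumptions made for this example.

```python
ALLOWED_STATUSES = {"pending", "shipped", "delivered"}  # hypothetical values


def verify_order_summary(result: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # Schema-style check: every required field must be present.
    for name in ("order_id", "status", "total"):
        if name not in result:
            problems.append(f"missing field: {name}")

    # Constraint check: status must be one of the known values.
    status = result.get("status")
    if status is not None and status not in ALLOWED_STATUSES:
        problems.append(f"unexpected status: {status}")

    # Calculation check: the total must match the line items, if both exist.
    total = result.get("total")
    line_items = result.get("line_items")
    if total is not None and line_items is not None:
        if abs(sum(line_items) - total) > 0.005:
            problems.append("total does not match line items")

    return problems


print(verify_order_summary({"order_id": "ORD-1042", "status": "done"}))
```

A run counts as complete only when the problem list is empty, regardless of what the model reports about its own confidence.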

Planning Is a Runtime Strategy

Planning is not a guarantee that the model is reasoning correctly. It is a runtime strategy for decomposing work, choosing actions, and deciding when enough progress has been made. Some tasks need no plan. Some need a short checklist. Some need iterative ReAct-style tool use. Some need a planner-executor-reviewer pattern.

Choose planning depth based on uncertainty and risk. A simple classification should not spend tokens building a multi-step plan. A research task may need an explicit plan because the agent must search, compare, cite, and revise. A write-capable task may need a plan plus approval because the consequence is higher.

Plans should be inspectable and updateable. If the agent discovers missing evidence, a failed tool, or a policy restriction, it should revise the plan rather than continue blindly. The runtime should store the current plan, completed steps, open questions, and stop reason.

The application should validate plans before risky execution. A model-generated plan that includes "email the customer" or "delete duplicate records" should pass through policy and approval checks before any tool call happens.

  • Use minimal planning for low-uncertainty tasks.
  • Use explicit plans for research, multi-tool, or high-risk tasks.
  • Store plan state so progress is visible and resumable.
  • Revise plans when observations contradict assumptions.
  • Validate risky plan steps with deterministic policy.
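
The last point can be sketched as a policy gate that runs before any tool call. The action names, the `RISKY_ACTIONS` set, and the `validate_plan` and `can_execute` functions are hypothetical; a real policy would come from the application's own rules.

```python
RISKY_ACTIONS = {"send_email", "delete_records"}  # hypothetical policy list


def validate_plan(plan: list, approved: set) -> list:
    """Label each step as allowed or needing approval, without executing anything."""
    decisions = []
    for step in plan:
        action = step["action"]
        if action in RISKY_ACTIONS and action not in approved:
            decisions.append((action, "needs_approval"))
        else:
            decisions.append((action, "allowed"))
    return decisions


def can_execute(decisions: list) -> bool:
    # The plan may run only when no step is still waiting for approval.
    return all(decision == "allowed" for _, decision in decisions)


plan = [{"action": "load_records"}, {"action": "delete_records"}]
decisions = validate_plan(plan, approved=set())
print(decisions, can_execute(decisions))
```

Because the check is plain code rather than a prompt instruction, a persuasive model-generated plan cannot talk its way past it.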

Reasoning Failure Modes and Stop Conditions

Agent reasoning fails in recognizable patterns. The model may over-plan, repeat the same tool, chase irrelevant evidence, invent a missing observation, ignore a failed tool, or keep working after the answer is already good enough. These are runtime problems as much as model problems.

Stop conditions turn vague autonomy into controlled autonomy. Define success checks, maximum iterations, repeated-action detection, evidence thresholds, confidence thresholds, and escalation triggers. When the agent stops, it should explain whether it completed the task, needs user input, hit a budget, or found insufficient evidence.

For complex tasks, add a verification step. The verifier should inspect the answer against the goal, evidence, policy, and trace. Verification can be model-assisted, deterministic, or human-reviewed depending on risk. The important point is that the same component that generated the answer should not be the only judge of quality.

Planning quality should be evaluated from traces. Do not only score the final answer. Score whether the plan was appropriate, whether tool calls were necessary, whether the agent recovered from errors, and whether it stopped for the right reason.

  • Detect repeated actions and unproductive loops.
  • Require evidence before factual conclusions.
  • Escalate when policy, confidence, or missing data demands it.
  • Use separate verification for important outputs.
  • Track stop reasons as a quality metric.
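
Tracking stop reasons as a metric can be as simple as counting them across runs. The sketch below assumes each run record carries a `stop_reason` string; the record shape and the `stop_reason_rates` function are hypothetical.

```python
from collections import Counter


def stop_reason_rates(runs: list) -> dict:
    """Return the share of runs that ended for each stop reason."""
    counts = Counter(run["stop_reason"] for run in runs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {reason: count / total for reason, count in counts.items()}


runs = [
    {"stop_reason": "completed"},
    {"stop_reason": "completed"},
    {"stop_reason": "budget_exhausted"},
    {"stop_reason": "needs_user_input"},
]
print(stop_reason_rates(runs))
```

A rising share of budget or repeated-action stops is a signal to inspect traces before the final-answer scores start to drop.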

Planning Evaluation Exercise

Evaluate planning by comparing the plan to the task, not by admiring how detailed it looks. A good plan is short enough to execute, specific enough to inspect, and flexible enough to change when observations arrive. A long plan that ignores evidence is worse than no plan.

For each test task, record the initial plan, tool calls, revised plan, stop reason, and final outcome. This reveals whether the agent is actually adapting or merely producing planning text before improvising. It also shows whether planning consumes more cost than it saves.

Include tasks where the best behavior is to ask a question, refuse, or stop early. Planning systems often fail by continuing to act when uncertainty should trigger clarification or escalation.

  • Score plan relevance, not plan length.
  • Track revisions after tool observations.
  • Measure unnecessary steps and repeated actions.
  • Reward correct clarification and early stopping.
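
These scoring points can be computed directly from a recorded trace. The sketch below assumes a hypothetical trace format with `tool_calls`, `plans`, `stop_reason`, and `expected_stop_reason` keys; it measures repetition and revisions, and leaves plan relevance to a human or a calibrated judge.

```python
def score_trace(trace: dict) -> dict:
    """Summarize a recorded run without looking at the final answer."""
    # Two calls count as the same action when name and arguments match.
    signatures = [
        (call["name"], repr(sorted(call["args"].items())))
        for call in trace["tool_calls"]
    ]
    repeated = len(signatures) - len(set(signatures))
    return {
        "tool_calls": len(signatures),
        "repeated_actions": repeated,
        # The first plan is the initial one; every later plan is a revision.
        "plan_revisions": max(len(trace["plans"]) - 1, 0),
        "stopped_for_expected_reason":
            trace["stop_reason"] == trace["expected_stop_reason"],
    }


trace = {
    "tool_calls": [
        {"name": "search_order", "args": {"order_id": "ORD-1042"}},
        {"name": "search_order", "args": {"order_id": "ORD-1042"}},
        {"name": "get_invoice", "args": {"order_id": "ORD-1042"}},
    ],
    "plans": [["search", "compare"], ["search", "ask_user"]],
    "stop_reason": "needs_user_input",
    "expected_stop_reason": "needs_user_input",
}
print(score_trace(trace))
```

Including `expected_stop_reason` in each test case is what allows the evaluation to reward a correct clarification or early stop instead of treating it as a failure.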

Bounded Next-Action Loop

The runtime executes one proposed action at a time and stops on success, repetition, or budget exhaustion.

def choose_next_action(state: dict) -> dict:
    # Stand-in for the model call: look up the order first, then finish.
    if not state["observations"]:
        return {"name": "search_order", "args": {"order_id": state["order_id"]}}
    return {"name": "finish", "args": {}}

def run_agent(order_id: str) -> dict:
    state = {"order_id": order_id, "observations": [], "actions": []}

    # Hard step limit: at most four proposed actions per run.
    for _ in range(4):
        action = choose_next_action(state)
        signature = (action["name"], repr(action["args"]))

        # Repeated-action detection: refuse to run the same call twice.
        if signature in state["actions"]:
            return {"status": "blocked", "reason": "repeated action", "state": state}

        state["actions"].append(signature)

        if action["name"] == "finish":
            return {"status": "completed", "state": state}

        if action["name"] == "search_order":
            # Simulated tool result; a real runtime would call the tool here.
            state["observations"].append({"status": "shipped"})

    return {"status": "budget_exhausted", "state": state}

print(run_agent("ORD-1042")["status"])  # prints "completed"

  • The loop has a hard step limit.
  • Repeated actions are detected before another tool call.
  • The final status distinguishes success from controlled failure.

Plan with Explicit Completion Tests

Each step states what evidence marks it complete.

plan = [
    {
        "step": "load_invoice",
        "complete_when": "invoice total and vendor id are present",
    },
    {
        "step": "load_purchase_order",
        "complete_when": "matching order is found",
    },
    {
        "step": "compare_amounts",
        "complete_when": "difference is calculated",
    },
    {
        "step": "request_review",
        "complete_when": "reviewer approves or rejects the mismatch",
    },
]

for item in plan:
    print(f"{item['step']}: {item['complete_when']}")

  • Completion criteria make progress measurable.
  • Human review is part of the workflow, not an afterthought.
  • A production executor would store status and evidence for each step.

Key Takeaways

  • Use deterministic workflows when the route is already known.
  • Keep plans concrete, short, and tied to completion evidence.
  • Observe every tool result before selecting the next action.
  • Limit steps, time, cost, retries, and plan revisions.
  • Verify completion with code or external evidence where possible.

Common Mistakes to Avoid

  • Creating long plans for tasks that need only one tool call.
  • Treating the generated plan as authorization to execute sensitive actions.
  • Replanning repeatedly without identifying the actual blocker.
  • Stopping only when the model claims success.

Practice Tasks

  • Implement repeated-action detection in a tool loop.
  • Write completion criteria for an invoice reconciliation plan.
  • Add timeout and cost-budget stop conditions.
  • Design a fallback path for missing or contradictory evidence.

Frequently Asked Questions

Planning does not require exposing the model's private reasoning. A plan is an application-visible representation of intended work, and it can be concise and useful without revealing hidden reasoning.

Create a full plan up front only when it improves coordination, review, or dependency management. Many tasks work better with one-step-at-a-time decisions.

To keep an agent from running indefinitely, combine hard budgets with repeated-action detection, clear success criteria, bounded retries, and a graceful escalation path.
