LangGraph Tool Calling Agents: Controlled Tool Loops, Validation, and Safety Boundaries

The Canonical Agent-Tool Loop

Tool use is where many teams first feel the difference between a demo agent and an engineered system. Once an agent can search, query databases, send messages, or mutate records, the orchestration around those actions matters as much as the model itself.

LangGraph is well-suited to tool-calling because it keeps the loop visible. The graph can decide when to call tools, when to validate tool arguments, when to stop, and when to escalate to a human instead of blindly continuing.

This lesson treats tool use as an operational workflow, not as a magical extension of prompting. The main idea is simple: the model may suggest actions, but the application owns whether those actions are allowed, how they are executed, and when the run is complete.

A common shape is agent node -> tool node -> agent node. The agent examines state and decides whether it needs tools. The tool node executes only approved tools and writes observations back to state. The loop continues until the agent indicates it can finish.

The key design insight is that the graph runtime controls the loop, not the model alone. That is what makes the system inspectable and governable.

This pattern matters because many agent bugs are really loop bugs. The model keeps asking for tools with slightly different wording, repeats the same search, or tries to recover from one failure by making three worse requests. When the loop is encoded in the graph, you can add stop conditions, counters, and alternate paths without rewriting the whole application.

Agent decides whether a tool is needed.
Tool executor performs the allowed action.
Observation returns to state.
Router decides continue versus END.

Separate Decision, Validation, and Execution

Do not let the same node both decide on a tool and perform irreversible side effects unless the workflow is extremely low risk. A safer design places validation between model output and tool execution.

That validation layer can enforce argument shape, business permissions, redaction, rate limits, and explicit human approval before sensitive actions.

In practice, this means you should think of tool use as three separate responsibilities. First, the model proposes an action. Second, deterministic code checks whether the proposal is well-formed and allowed. Third, a narrowly scoped execution node performs the action and captures the result. Splitting these roles makes failures easier to diagnose and risky behavior much easier to contain.

Model judgment should propose actions.
Deterministic code should validate action safety.
Execution nodes should be narrow and auditable.
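
The three responsibilities above can be sketched in plain Python, with no model or graph runtime involved. This is only an illustration under stated assumptions: `Proposal`, `propose_action`, `validate_action`, `execute_action`, `handle`, the `ALLOWED_BY_ROLE` table, and the `issue_refund` tool are hypothetical names invented for this example, not LangGraph APIs.

```python
from dataclasses import dataclass, field

# Hypothetical business rule: which roles may use which tools.
ALLOWED_BY_ROLE = {
    "support": {"lookup_order"},
    "admin": {"lookup_order", "issue_refund"},
}


@dataclass
class Proposal:
    tool_name: str
    tool_args: dict = field(default_factory=dict)


def propose_action(question: str) -> Proposal:
    """Stand-in for the model: it only suggests an action."""
    return Proposal("issue_refund", {"order_id": "A-100"})


def validate_action(proposal: Proposal, role: str) -> list:
    """Deterministic checks; an empty list means the proposal is allowed."""
    problems = []
    if proposal.tool_name not in ALLOWED_BY_ROLE.get(role, set()):
        problems.append(f"role '{role}' may not call {proposal.tool_name}")
    if not isinstance(proposal.tool_args.get("order_id"), str):
        problems.append("order_id must be a string")
    return problems


def execute_action(proposal: Proposal) -> dict:
    """Narrow executor: runs one already-validated action and reports it."""
    # Assumption: a real implementation would call an external service here.
    order_id = proposal.tool_args["order_id"]
    return {"status": "ok", "observation": f"{proposal.tool_name} done for {order_id}"}


def handle(question: str, role: str) -> dict:
    proposal = propose_action(question)
    problems = validate_action(proposal, role)
    if problems:
        # Nothing is executed when validation fails.
        return {"status": "rejected", "problems": problems}
    return execute_action(proposal)


as_support = handle("Please refund order A-100", role="support")
as_admin = handle("Please refund order A-100", role="admin")
```

The same proposal is rejected for the `support` role and executed for the `admin` role, and in the rejected case the executor is never reached. In a graph, each of the three functions would typically become its own node so that a rejection shows up as a distinct path in the trace.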

State Design for Tool Work

Tool-oriented graphs usually need fields such as `messages`, `tool_requests`, `tool_results`, `current_step`, `errors`, and `final_answer`. Those fields make it possible to replay exactly why the agent made a choice.

Keep raw tool outputs manageable. Summarize very large payloads before writing them to state, and store the full payload externally if it may be needed for later retrieval.

A healthy state distinguishes between what the model needs for the next reasoning step and what operators need for traceability. Sometimes those overlap, but often they do not. For example, the model may only need a short observation like "3 matching orders found" while your logs may still record latency, status code, tool name, and a sanitized argument summary.

Track requested tool name and arguments.
Record success, failure, and latency of each tool call.
Preserve a human-readable observation trail for debugging.
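
One way to keep model-facing observations apart from operator-facing trace data is to write them to different state fields. The sketch below is an assumption about how such a state could look, not a prescribed schema: `ToolCallRecord`, `ToolState`, `record_tool_call`, `find_orders`, and the `MAX_OBSERVATION_CHARS` limit are illustrative names, and it uses the standard library `typing.TypedDict` so it runs without extra packages.

```python
import time
from typing import Callable, TypedDict

MAX_OBSERVATION_CHARS = 200  # assumed cap on what the model gets to see


class ToolCallRecord(TypedDict):
    tool_name: str
    args_summary: str
    ok: bool
    latency_ms: float
    error: str


class ToolState(TypedDict):
    observations: list[str]              # short, model-facing text
    tool_results: list[ToolCallRecord]   # operator-facing trace


def record_tool_call(state: ToolState, tool_name: str, args: dict,
                     tool: Callable[..., str]) -> ToolState:
    """Run one tool and return a new state with observation and trace appended."""
    started = time.perf_counter()
    try:
        output, ok, error = tool(**args), True, ""
    except Exception as exc:  # failures are recorded instead of raised
        output, ok, error = f"{tool_name} failed", False, str(exc)
    latency_ms = (time.perf_counter() - started) * 1000

    record: ToolCallRecord = {
        "tool_name": tool_name,
        # Log argument names only, so argument values never reach the trace.
        "args_summary": ", ".join(sorted(args)),
        "ok": ok,
        "latency_ms": latency_ms,
        "error": error,
    }
    return {
        # Truncation is a crude stand-in for real summarization.
        "observations": state["observations"] + [output[:MAX_OBSERVATION_CHARS]],
        "tool_results": state["tool_results"] + [record],
    }


def find_orders(customer_id: str) -> str:
    """Hypothetical read-only tool."""
    return "3 matching orders found"


state: ToolState = {"observations": [], "tool_results": []}
state = record_tool_call(state, "find_orders", {"customer_id": "c-42"}, find_orders)
state = record_tool_call(state, "find_orders", {}, find_orders)  # missing argument
```

After these two calls, `observations` holds one short line per call for the next reasoning step, while `tool_results` keeps the success flag, latency, and error text that an operator would want when replaying the run.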

Tool Selection Should Be Explicit, Not Open-Ended

A tool-calling agent is strongest when it operates inside a clear capability boundary. The model should be able to choose among a limited set of named tools with stable contracts, not invent arbitrary functions or pass uncontrolled blobs into your infrastructure.

This is one of the biggest differences between a tutorial toy and a production service. In production, every tool should have a clear purpose, a well-defined schema, and an explanation for why the model is allowed to access it at all.

Prefer a small whitelist over a broad registry.
Give each tool one clear job with predictable arguments.
Normalize tool outputs into a format the next node can reason about.
Remove tools the agent does not genuinely need.
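
A small explicit registry is one way to express that capability boundary in code. The following is a minimal sketch, assuming a single read-only tool; `ToolSpec`, `TOOLS`, `call_tool`, and the result envelope with `tool`, `ok`, and `content` keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str              # why the agent is allowed to use this tool
    required_args: frozenset  # the exact argument names the tool accepts
    handler: Callable[..., object]


def search_docs(query: str) -> list:
    """Hypothetical read-only tool returning raw hits."""
    return [{"title": "Refund policy", "snippet": "Requests allowed within 30 days."}]


# The whitelist: only tools listed here can ever be called.
TOOLS = {
    spec.name: spec
    for spec in [
        ToolSpec("search_docs", "Answer policy questions from internal docs",
                 frozenset({"query"}), search_docs),
    ]
}


def call_tool(name: str, args: dict) -> dict:
    """Look up a whitelisted tool, check its arguments, normalize its output."""
    spec = TOOLS.get(name)
    if spec is None:
        return {"tool": name, "ok": False, "content": "unknown tool"}
    if set(args) != spec.required_args:
        return {"tool": name, "ok": False, "content": "unexpected arguments"}
    raw = spec.handler(**args)
    # Every result has the same shape, whatever the handler returned.
    return {"tool": name, "ok": True, "content": str(raw)}
```

Because unknown tool names and unexpected arguments produce the same envelope as successful calls, the next node can reason about every outcome in one format, and adding or removing a capability is a one-line change to `TOOLS`.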

Safety Rules for External Actions

Read-only tools like search or weather are relatively low risk. Write tools like deleting data, issuing refunds, or sending emails should almost always go through stronger controls.

The graph is a great place to encode those controls because they become part of the visible workflow instead of hidden middleware that teammates forget exists.

A useful mental split is read, recommend, and write. Read tools gather information. Recommend tools produce candidate actions such as draft replies or proposed changes. Write tools alter an external system. The more you move from read toward write, the more explicit your checks should become.

Whitelist tools explicitly.
Validate arguments before execution.
Add approval for write actions or high-cost operations.
Log tool name, arguments summary, result, and error state.
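
The read, recommend, and write split can be made concrete with a small gate function that runs before any tool executes. This is a sketch under assumptions, not a LangGraph feature: the `Risk` enum, the `TOOL_RISK` catalogue, the `gate` function, and the three tool names are all hypothetical.

```python
import logging
from enum import Enum

logger = logging.getLogger("tool_audit")


class Risk(Enum):
    READ = "read"
    RECOMMEND = "recommend"
    WRITE = "write"


# Hypothetical tool catalogue with a risk tier per tool.
TOOL_RISK = {
    "search_docs": Risk.READ,
    "draft_reply": Risk.RECOMMEND,
    "send_email": Risk.WRITE,
}


def gate(tool_name: str, args: dict, approved: bool = False) -> str:
    """Return 'run', 'needs_approval', or 'blocked' for a requested tool call."""
    risk = TOOL_RISK.get(tool_name)
    if risk is None:
        decision = "blocked"            # not on the whitelist
    elif risk is Risk.WRITE and not approved:
        decision = "needs_approval"     # write actions wait for a human
    else:
        decision = "run"
    # Log argument names only; values may contain personal data.
    logger.info("tool=%s args=%s decision=%s", tool_name, sorted(args), decision)
    return decision
```

In a graph, the returned string could serve directly as the label a conditional edge routes on, so the approval path is part of the visible workflow.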

Loop Termination and Retry Discipline

A tool loop must know how it ends. Some runs end because the model says it has enough information. Others end because a validator rejects the request, a retry limit is reached, or a human review step takes over. If none of those endings are explicit, the graph becomes vulnerable to wasted tokens and confusing repeated actions.

A practical design is to store a step counter or tool call counter in state and route away from the loop when it exceeds a limit. That gives the system a controlled failure mode, such as returning a partial answer, requesting clarification, or escalating to support staff.

Count tool iterations explicitly.
Differentiate retrying the same tool from choosing a different tool.
Route repeated failures to fallback behavior instead of infinite loops.
Prefer graceful degradation over silent spinning.
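
The counters described above can be kept in state and consulted by a router. The sketch below assumes an overall budget and a per-tool retry limit; `MAX_TOOL_CALLS`, `MAX_RETRIES_PER_TOOL`, `record_result`, `route`, and the state keys are hypothetical, and the specific limits are arbitrary.

```python
from typing import Optional

MAX_TOOL_CALLS = 5          # assumed overall budget per run
MAX_RETRIES_PER_TOOL = 2    # assumed retries of one tool after its first failure


def record_result(state: dict, tool_name: str, ok: bool) -> dict:
    """Return updated counters after one tool call."""
    failures = dict(state["failures"])
    # A success resets that tool's failure streak; a failure extends it.
    failures[tool_name] = 0 if ok else failures.get(tool_name, 0) + 1
    return {**state, "tool_calls": state["tool_calls"] + 1, "failures": failures}


def route(state: dict, next_tool: Optional[str]) -> str:
    """Decide the next step: 'finish', 'run_tool', or 'fallback'."""
    if next_tool is None:
        return "finish"      # the agent has enough information
    if state["tool_calls"] >= MAX_TOOL_CALLS:
        return "fallback"    # overall budget exhausted
    if state["failures"].get(next_tool, 0) > MAX_RETRIES_PER_TOOL:
        return "fallback"    # this particular tool keeps failing
    return "run_tool"


state = {"tool_calls": 0, "failures": {}}
for _ in range(3):
    state = record_result(state, "search_docs", ok=False)
```

After three consecutive failures of `search_docs`, the router sends another `search_docs` request to the fallback path but still allows a different tool, which is the distinction between retrying the same tool and choosing a new one. The fallback node is where a partial answer, a clarification request, or an escalation would be produced.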

Execution Flow Analysis for a Search Agent

Start with a user question. The agent node decides whether it can answer directly. If not, it requests search. The tool node runs search and stores observations. The agent reads those observations and either requests another tool or drafts the final answer.

This flow is much easier to reason about than a hidden agent loop because each cycle leaves named state behind.

Notice what the graph makes visible: why search was chosen, what query was sent, what came back, whether the result was sufficient, and how the final answer used that evidence. When a user says "the agent hallucinated" or "the tool was never called," this visibility turns a vague complaint into a debuggable execution trace.

Start state: question, messages
Agent node adds tool request
Tool node adds search results
Agent node either loops or drafts final answer
END returns traceable final state

When to Use Prebuilt Agents Versus Custom Graphs

Prebuilt agent helpers are a good starting point when your main goal is to get a standard tool loop working quickly. They reduce boilerplate and help you learn the expected message flow.

Custom graphs become the better choice when you need stricter validation, custom routing, mixed deterministic and agentic steps, domain-specific approval policies, or richer operational state. If your workflow needs to be explained in an incident review, a custom graph often pays for itself quickly.

Use prebuilt patterns for fast prototypes and standard loops.
Use custom graphs when governance or observability matters.
Do not hide business-critical decisions inside generic agent wrappers.

Tool Loop Examples

Beginner Example: Read-Only Search Tool Pattern

This example shows the concept without a model provider: one node expresses a tool request and a separate node fulfills it.

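from typing_extensions import TypedDict

class SearchState(TypedDict):
    question: str
    tool_request: str
    observation: str
    answer: str

# Decision node: only names the tool it wants; it performs no I/O itself.
def decide(state: SearchState) -> dict:
    return {"tool_request": "search_docs"}

# Tool node: fulfills the request and writes the observation back to state.
# The search result is hard-coded so the example runs without external services.
def search_docs(state: SearchState) -> dict:
    return {"observation": "Found refund policy: requests allowed within 30 days."}

# Final node: builds the answer from the observation already stored in state.
def finalize(state: SearchState) -> dict:
    return {"answer": f"Using search result: {state['observation']}"}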

The tool execution is separate from the decision to use it.
Observations come back into state for later reasoning.
This explicitness is the basis of reliable agent loops.

Intermediate Example: Validate Tool Arguments Before Execution

A validation node is a clean place to reject malformed or unauthorized tool requests before any side effect happens.

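from typing_extensions import TypedDict, Literal

class ActionState(TypedDict):
    tool_name: str
    tool_args: dict
    allowed: bool

# Whitelisted tools mapped to the argument names each one requires.
# The argument names are illustrative assumptions.
REQUIRED_ARGS = {"search_docs": {"query"}, "lookup_order": {"order_id"}}

def validate_tool_request(state: ActionState) -> dict:
    required = REQUIRED_ARGS.get(state["tool_name"])
    args = state["tool_args"]
    # Reject unknown tools, non-dict arguments, and missing required arguments.
    allowed = required is not None and isinstance(args, dict) and required <= set(args)
    return {"allowed": allowed}

def route_validation(state: ActionState) -> Literal["run_tool", "reject_request"]:
    # Routing reads only the validator's verdict, never the raw model output.
    return "run_tool" if state["allowed"] else "reject_request"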

Validation belongs in deterministic code.
The graph can block unsafe requests before execution.
This pattern scales well for governance-heavy systems.

Detailed Example: Multi-Step Research Agent with Loop Guard

This example shows a fuller pattern: the graph tracks whether a tool is needed, executes a read-only search tool, stores observations, and stops after a bounded number of iterations.

from typing_extensions import TypedDict, Literal
from langgraph.graph import StateGraph, START, END

class ResearchState(TypedDict):
    question: str
    needs_tool: bool
    tool_name: str
    tool_args: dict
    observation: str
    answer: str
    tool_calls: int

def agent_step(state: ResearchState) -> dict:
    if state["observation"]:
        return {
            "needs_tool": False,
            "answer": f"Based on the search result: {state['observation']}"
        }

    return {
        "needs_tool": True,
        "tool_name": "search_docs",
        "tool_args": {"query": state["question"]}
    }

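def route_after_agent(state: ResearchState) -> Literal["run_tool", "finish", "too_many_steps"]:
    # Check for completion first so an answer produced on the last allowed
    # step is not overwritten by the limit handler.
    if not state["needs_tool"]:
        return "finish"
    if state["tool_calls"] >= 3:
        return "too_many_steps"
    return "run_tool"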

def run_tool(state: ResearchState) -> dict:
    query = state["tool_args"].get("query", "")
    return {
        "observation": f"Search result for '{query}': refunds are allowed within 30 days.",
        "tool_calls": state["tool_calls"] + 1,
    }

def too_many_steps(state: ResearchState) -> dict:
    return {
        "answer": "I could not complete this reliably within the tool limit. Please refine the question or escalate."
    }

builder = StateGraph(ResearchState)
builder.add_node("agent_step", agent_step)
builder.add_node("run_tool", run_tool)
builder.add_node("too_many_steps", too_many_steps)

builder.add_edge(START, "agent_step")
builder.add_conditional_edges(
    "agent_step",
    route_after_agent,
    {
        "run_tool": "run_tool",
        "finish": END,
        "too_many_steps": "too_many_steps",
    }
)
builder.add_edge("run_tool", "agent_step")
builder.add_edge("too_many_steps", END)

graph = builder.compile()

result = graph.invoke({
    "question": "What is the refund window?",
    "needs_tool": False,
    "tool_name": "",
    "tool_args": {},
    "observation": "",
    "answer": "",
    "tool_calls": 0,
})

print(result["answer"])

The loop is explicit: `agent_step` and `run_tool` alternate until the router ends the run.
A tool counter gives the workflow a deterministic escape hatch.
The answer is produced from observation already stored in state, which improves traceability.

Advanced Example: Pause Before a Write Tool

Combine tool execution with human review when the action changes external state.

from typing_extensions import TypedDict
from langgraph.types import interrupt

class EmailState(TypedDict):
    to: str
    subject: str
    body: str
    approved: bool

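def request_approval(state: EmailState) -> dict:
    # interrupt() pauses the run and surfaces this payload to the caller.
    # The graph must be compiled with a checkpointer; the run is resumed by
    # invoking it with Command(resume=...), and that value is returned here.
    decision = interrupt({
        "action": "send_email",
        "to": state["to"],
        "subject": state["subject"],
        "body": state["body"],
    })
    # Assumes the reviewer resumes with a dict such as {"approved": True};
    # a missing "approved" key counts as a rejection.
    return {"approved": decision.get("approved", False)}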

Interrupt payloads should stay JSON-serializable.
Approval logic is visible in the graph, not hidden in an external reviewer script.
This is the right shape for refund approvals, outbound emails, and permission changes.

Before you move on