Tool use is where many teams first feel the difference between a demo agent and an engineered system. Once an agent can search, query databases, send messages, or mutate records, the orchestration around those actions matters as much as the model itself.
LangGraph is well-suited to tool-calling because it keeps the loop visible. The graph can decide when to call tools, when to validate tool arguments, when to stop, and when to escalate to a human instead of blindly continuing.
This lesson treats tool use as an operational workflow, not as a magical extension of prompting. The main idea is simple: the model may suggest actions, but the application owns whether those actions are allowed, how they are executed, and when the run is complete.
A common shape is agent node -> tool node -> agent node. The agent examines state and decides whether it needs tools. The tool node executes only approved tools and writes observations back to state. The loop continues until the agent indicates it can finish.
The key design insight is that the graph runtime controls the loop, not the model alone. That is what makes the system inspectable and governable.
This pattern matters because many agent bugs are really loop bugs. The model keeps asking for tools with slightly different wording, repeats the same search, or tries to recover from one failure by making three worse requests. When the loop is encoded in the graph, you can add stop conditions, counters, and alternate paths without rewriting the whole application.
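One way to catch the repeated-search failure described above is to fingerprint each tool request and refuse to run one the graph has already seen. The sketch below is only an illustration: `request_fingerprint`, `is_repeat`, and the normalization rules (trim and lowercase string arguments) are hypothetical choices, not part of LangGraph.

```python
import json


def request_fingerprint(tool_name: str, tool_args: dict) -> str:
    """Build a stable key so near-identical requests compare equal (hypothetical helper)."""
    normalized = {
        key: value.strip().lower() if isinstance(value, str) else value
        for key, value in tool_args.items()
    }
    # sort_keys makes the key independent of argument order.
    return f"{tool_name}:{json.dumps(normalized, sort_keys=True)}"


def is_repeat(seen: set, tool_name: str, tool_args: dict) -> bool:
    """Return True if this request was already made; otherwise record it."""
    key = request_fingerprint(tool_name, tool_args)
    if key in seen:
        return True
    seen.add(key)
    return False


seen_requests: set = set()
first = is_repeat(seen_requests, "search_docs", {"query": "Refund policy"})
second = is_repeat(seen_requests, "search_docs", {"query": " refund policy "})
```

A router could send repeats to an alternate path instead of the tool node. This only catches requests that match after trimming and lowercasing; genuinely reworded queries would need a stronger similarity check.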
Do not let the same node both decide on a tool and perform irreversible side effects unless the workflow is extremely low risk. A safer design places validation between model output and tool execution.
That validation layer can enforce argument shape, business permissions, redaction, rate limits, and explicit human approval before sensitive actions.
In practice, this means you should think of tool use as three separate responsibilities. First, the model proposes an action. Second, deterministic code checks whether the proposal is well-formed and allowed. Third, a narrowly scoped execution node performs the action and captures the result. Splitting these roles makes failures easier to diagnose and risky behavior much easier to contain.
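The three responsibilities can be sketched in plain Python before any graph is involved. Everything here is an assumption made for illustration: `propose` stands in for a model call, `TOOLS` is a hypothetical registry, and `lookup_order` returns canned text.

```python
from typing import Callable, Dict

# Hypothetical registry: tool name -> function that performs the action.
TOOLS: Dict[str, Callable[[dict], str]] = {
    "lookup_order": lambda args: f"Order {args['order_id']} is shipped.",
}


def propose(question: str) -> dict:
    """Stand-in for the model: it only returns a proposed action."""
    return {"tool_name": "lookup_order", "tool_args": {"order_id": "A-100"}}


def validate(proposal: dict) -> bool:
    """Deterministic check: known tool, dict arguments, required key present."""
    return (
        proposal.get("tool_name") in TOOLS
        and isinstance(proposal.get("tool_args"), dict)
        and "order_id" in proposal["tool_args"]
    )


def execute(proposal: dict) -> str:
    """Narrowly scoped execution: runs one approved tool and returns its result."""
    return TOOLS[proposal["tool_name"]](proposal["tool_args"])


def run_once(question: str) -> dict:
    proposal = propose(question)
    if not validate(proposal):
        return {"status": "rejected", "observation": ""}
    return {"status": "ok", "observation": execute(proposal)}


result = run_once("Where is my order?")
```

Because each role is a separate function, a failure can be attributed to a bad proposal, a rejected check, or a tool error rather than to "the agent" as a whole.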
Tool-oriented graphs usually need fields such as `messages`, `tool_requests`, `tool_results`, `current_step`, `errors`, and `final_answer`. Those fields make it possible to replay exactly why the agent made a choice.
Keep raw tool outputs manageable. Very large payloads should be summarized into state and stored externally if needed for later retrieval.
A healthy state distinguishes between what the model needs for the next reasoning step and what operators need for traceability. Sometimes those overlap, but often they do not. For example, the model may only need a short observation like "3 matching orders found" while your logs may still record latency, status code, tool name, and a sanitized argument summary.
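A small sketch of that split, assuming a hypothetical helper named `summarize_tool_result` and a tool that returns a list of rows: the model gets a one-line observation, while the trace record keeps operational details and only the names of the arguments, not their values.

```python
import time


def summarize_tool_result(tool_name: str, args: dict, rows: list,
                          started: float, status_code: int = 200) -> tuple:
    """Return (observation for the model, trace record for operators)."""
    observation = f"{len(rows)} matching orders found"
    trace = {
        "tool_name": tool_name,
        "status_code": status_code,
        "latency_ms": round((time.monotonic() - started) * 1000, 1),
        # Sanitized argument summary: key names only, never the values.
        "arg_keys": sorted(args),
    }
    return observation, trace


started = time.monotonic()
rows = [{"id": 1}, {"id": 2}, {"id": 3}]
observation, trace = summarize_tool_result(
    "lookup_order", {"email": "user@example.com"}, rows, started
)
```

The observation would go into graph state for the next reasoning step; the trace record would go to whatever logging system the team already uses.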
A tool-calling agent is strongest when it operates inside a clear capability boundary. The model should be able to choose among a limited set of named tools with stable contracts, not invent arbitrary functions or pass uncontrolled blobs into your infrastructure.
This is one of the biggest differences between a tutorial toy and a production service. In production, every tool should have a clear purpose, a well-defined schema, and an explanation for why the model is allowed to access it at all.
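One way to make purpose, schema, and justification explicit is to keep them in a small registry that validation code can read. This is a minimal sketch: `ToolSpec`, `REGISTRY`, and `check_request` are hypothetical names, and the "schema" is reduced to a tuple of required argument names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ToolSpec:
    name: str
    purpose: str
    required_args: Tuple[str, ...]
    justification: str  # why the model is allowed to call this tool


# Hypothetical registry; the entries are examples, not a recommended tool set.
REGISTRY: Dict[str, ToolSpec] = {
    "search_docs": ToolSpec(
        "search_docs", "Search the help center", ("query",),
        "Read-only access to public documentation",
    ),
    "lookup_order": ToolSpec(
        "lookup_order", "Fetch one order by id", ("order_id",),
        "Needed to answer order status questions",
    ),
}


def check_request(tool_name: str, tool_args: dict) -> List[str]:
    """Return a list of problems; an empty list means the request fits the contract."""
    spec = REGISTRY.get(tool_name)
    if spec is None:
        return [f"unknown tool: {tool_name}"]
    problems = [f"missing argument: {a}" for a in spec.required_args if a not in tool_args]
    problems += [
        f"unexpected argument: {a}" for a in sorted(tool_args) if a not in spec.required_args
    ]
    return problems
```

Rejecting unexpected arguments is a deliberate choice here: it keeps uncontrolled blobs from reaching a tool even when the required fields are present.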
Read-only tools such as search or weather lookups are relatively low risk. Write tools, such as those that delete data, issue refunds, or send emails, should almost always go through stronger controls.
The graph is a great place to encode those controls because they become part of the visible workflow instead of hidden middleware that teammates forget exists.
A useful mental split is read, recommend, and write. Read tools gather information. Recommend tools produce candidate actions such as draft replies or proposed changes. Write tools alter an external system. The more you move from read toward write, the more explicit your checks should become.
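The read, recommend, and write split can be encoded as data so the required checks grow with the risk tier. The tool names, check names, and the rule that unknown tools are treated as write tools are all assumptions made for this sketch.

```python
from enum import Enum
from typing import Dict, List


class Risk(Enum):
    READ = "read"
    RECOMMEND = "recommend"
    WRITE = "write"


# Hypothetical policy: each tier adds checks on top of the previous one.
REQUIRED_CHECKS: Dict[Risk, List[str]] = {
    Risk.READ: ["schema"],
    Risk.RECOMMEND: ["schema", "permissions"],
    Risk.WRITE: ["schema", "permissions", "human_approval"],
}

# Hypothetical tool classification.
TOOL_RISK: Dict[str, Risk] = {
    "search_docs": Risk.READ,
    "draft_reply": Risk.RECOMMEND,
    "issue_refund": Risk.WRITE,
}


def checks_for(tool_name: str) -> List[str]:
    """Look up the checks a tool must pass; unclassified tools get the strictest tier."""
    risk = TOOL_RISK.get(tool_name, Risk.WRITE)
    return REQUIRED_CHECKS[risk]
```

Defaulting unclassified tools to the write tier means a newly added tool is over-checked until someone classifies it, rather than under-checked.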
A tool loop must know how it ends. Some runs end because the model says it has enough information. Others end because a validator rejects the request, a retry limit is reached, or a human review step takes over. If none of those endings are explicit, the graph becomes vulnerable to wasted tokens and confusing repeated actions.
A practical design is to store a step counter or tool call counter in state and route away from the loop when it exceeds a limit. That gives the system a controlled failure mode, such as returning a partial answer, requesting clarification, or escalating to support staff.
Start with a user question. The agent node decides whether it can answer directly. If not, it requests search. The tool node runs search and stores observations. The agent reads those observations and either requests another tool or drafts the final answer.
This flow is much easier to reason about than a hidden agent loop because each cycle leaves named state behind.
Notice what the graph makes visible: why search was chosen, what query was sent, what came back, whether the result was sufficient, and how the final answer used that evidence. When a user says "the agent hallucinated" or "the tool was never called," this visibility turns a vague complaint into a debuggable execution trace.
Prebuilt agent helpers are a good starting point when your main goal is to get a standard tool loop working quickly. They reduce boilerplate and help you learn the expected message flow.
Custom graphs become the better choice when you need stricter validation, custom routing, mixed deterministic and agentic steps, domain-specific approval policies, or richer operational state. If your workflow needs to be explained in an incident review, a custom graph often pays for itself quickly.
The following example shows the concept without needing a model provider: one node expresses a tool request and a separate node fulfills it.
from typing_extensions import TypedDict

class SearchState(TypedDict):
    question: str
    tool_request: str
    observation: str
    answer: str

def decide(state: SearchState) -> dict:
    return {"tool_request": "search_docs"}

def search_docs(state: SearchState) -> dict:
    return {"observation": "Found refund policy: requests allowed within 30 days."}

def finalize(state: SearchState) -> dict:
    return {"answer": f"Using search result: {state['observation']}"}
A validation node is a clean place to reject malformed or unauthorized tool requests before any side effect happens.
from typing_extensions import TypedDict, Literal

class ActionState(TypedDict):
    tool_name: str
    tool_args: dict
    allowed: bool

def validate_tool_request(state: ActionState) -> dict:
    # Allowlist of tools the model may call; anything else is rejected.
    allowed_tools = {"search_docs", "lookup_order"}
    # This only checks that the arguments are a dict; per-tool required keys
    # would need an additional check.
    args_are_dict = isinstance(state["tool_args"], dict)
    return {"allowed": state["tool_name"] in allowed_tools and args_are_dict}

def route_validation(state: ActionState) -> Literal["run_tool", "reject_request"]:
    # Returns the name of the next node based on the validation result.
    return "run_tool" if state["allowed"] else "reject_request"
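The router above names a `reject_request` destination that the snippet never defines. One possible shape, sketched under the assumption that the state also carries `errors` and `final_answer` fields (neither is in `ActionState` as written), is a node that records why the request was refused and ends the run with a safe message.

```python
from typing import List, TypedDict


class RejectableState(TypedDict):
    # Hypothetical extension of ActionState with two extra fields.
    tool_name: str
    tool_args: dict
    allowed: bool
    errors: List[str]
    final_answer: str


def reject_request(state: RejectableState) -> dict:
    """Record the rejection and return a safe answer; no tool is executed."""
    reason = (
        f"Tool request rejected: {state['tool_name']!r} is not permitted "
        "or its arguments are malformed."
    )
    return {
        # Build a new list instead of mutating the one already in state.
        "errors": state["errors"] + [reason],
        "final_answer": "I can't perform that action. Please rephrase or contact support.",
    }


state: RejectableState = {
    "tool_name": "delete_account",
    "tool_args": {},
    "allowed": False,
    "errors": [],
    "final_answer": "",
}
update = reject_request(state)
```

Keeping the reason in `errors` gives operators something concrete to inspect, while the user-facing message stays generic.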
This example shows a fuller pattern: the graph tracks whether a tool is needed, executes a read-only search tool, stores observations, and stops after a bounded number of iterations.
from typing_extensions import TypedDict, Literal
from langgraph.graph import StateGraph, START, END

class ResearchState(TypedDict):
    question: str
    needs_tool: bool
    tool_name: str
    tool_args: dict
    observation: str
    answer: str
    tool_calls: int

def agent_step(state: ResearchState) -> dict:
    # Stand-in for a model call: answer once an observation exists,
    # otherwise request a search.
    if state["observation"]:
        return {
            "needs_tool": False,
            "answer": f"Based on the search result: {state['observation']}",
        }
    return {
        "needs_tool": True,
        "tool_name": "search_docs",
        "tool_args": {"query": state["question"]},
    }

def route_after_agent(state: ResearchState) -> Literal["run_tool", "finish", "too_many_steps"]:
    # Check for completion first so an answer produced on the last allowed
    # tool call is not replaced by the limit message.
    if not state["needs_tool"]:
        return "finish"
    if state["tool_calls"] >= 3:
        return "too_many_steps"
    return "run_tool"

def run_tool(state: ResearchState) -> dict:
    # Read-only stub tool: returns a canned result and increments the counter.
    query = state["tool_args"].get("query", "")
    return {
        "observation": f"Search result for '{query}': refunds are allowed within 30 days.",
        "tool_calls": state["tool_calls"] + 1,
    }

def too_many_steps(state: ResearchState) -> dict:
    # Controlled failure mode when the tool budget is exhausted.
    return {
        "answer": "I could not complete this reliably within the tool limit. Please refine the question or escalate."
    }

builder = StateGraph(ResearchState)
builder.add_node("agent_step", agent_step)
builder.add_node("run_tool", run_tool)
builder.add_node("too_many_steps", too_many_steps)
builder.add_edge(START, "agent_step")
builder.add_conditional_edges(
    "agent_step",
    route_after_agent,
    {
        "run_tool": "run_tool",
        "finish": END,
        "too_many_steps": "too_many_steps",
    },
)
builder.add_edge("run_tool", "agent_step")
builder.add_edge("too_many_steps", END)
graph = builder.compile()

result = graph.invoke({
    "question": "What is the refund window?",
    "needs_tool": False,
    "tool_name": "",
    "tool_args": {},
    "observation": "",
    "answer": "",
    "tool_calls": 0,
})
print(result["answer"])
Combine tool execution with human review when the action changes external state.
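from typing_extensions import TypedDict
from langgraph.types import interrupt

class EmailState(TypedDict):
    to: str
    subject: str
    body: str
    approved: bool

def request_approval(state: EmailState) -> dict:
    # interrupt() pauses the run and surfaces this payload to a reviewer.
    # The graph must be compiled with a checkpointer so the run can resume.
    decision = interrupt({
        "action": "send_email",
        "to": state["to"],
        "subject": state["subject"],
        "body": state["body"],
    })
    # The resume value is expected to be a dict such as {"approved": True};
    # a missing key is treated as not approved.
    return {"approved": decision.get("approved", False)}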
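The approval flag only helps if something downstream honors it. A minimal sketch, assuming hypothetical node names `send_email` and `cancel_email` and an in-memory `OUTBOX` list standing in for a real mail client, might route on the reviewer's decision like this:

```python
from typing import Literal, TypedDict


class EmailState(TypedDict):
    to: str
    subject: str
    body: str
    approved: bool


OUTBOX: list = []  # hypothetical stand-in for a real email client


def route_after_approval(state: EmailState) -> Literal["send_email", "cancel_email"]:
    """Pick the next node from the reviewer's decision."""
    return "send_email" if state["approved"] else "cancel_email"


def send_email(state: EmailState) -> dict:
    # Re-check approval inside the write node so it stays safe even if the
    # graph is wired incorrectly.
    if not state["approved"]:
        raise PermissionError("send_email reached without approval")
    OUTBOX.append({"to": state["to"], "subject": state["subject"], "body": state["body"]})
    return {}


def cancel_email(state: EmailState) -> dict:
    # Nothing is sent; the run simply ends without a side effect.
    return {}
```

The duplicate check inside `send_email` is a defensive choice: the router is the main gate, and the node refuses to act if that gate is ever bypassed.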
You do not need a prebuilt agent for tool calling. Prebuilt agents are convenient, but LangGraph lets you express the loop explicitly when you need custom control.
Tool results usually belong in state as concise observations or normalized result fields that later nodes can reason about.
A common mistake is treating all tools like read-only helpers when some of them can change real systems or data.
The model can propose tool calls, but deterministic validation should still check schema, permissions, and business constraints before execution.
The right number of tools is usually smaller than teams expect. A smaller set of well-designed tools is easier for the model to choose from and easier for humans to govern.