A tool is a controlled capability the agent can request. It may search documents, call an API, query a database, send an email, run a calculation, create a ticket, or execute code in a sandbox.
Tools are the difference between an agent that only talks and an agent that can work. But tools also create risk. A badly designed tool can leak data, delete records, spam customers, or let prompt injection control business systems.
The safest way to design tools is to make each tool narrow, typed, auditable, and easy to explain. If a human cannot quickly understand what a tool does, the model probably should not be given that tool directly.
Read tools fetch information. Write tools change the world. This difference is huge. Searching a help center is usually low risk. Refunding money, deleting a file, changing permissions, or sending a legal email is high risk.
Production agents should treat write tools as privileged operations. A write tool may need user confirmation, policy checks, idempotency, rollback plans, and stronger logging.
The model chooses tools based on names, descriptions, and schemas. If the description is vague, the model may use the tool incorrectly. A good description says what the tool does, when to use it, and what it does not do.
For example, do not name a tool update_customer if it can also charge a card. Split those actions. The model should not need to guess hidden behavior.
The model is only one part of tool use. The runtime should guard the path before and after execution.
Tools fail in ordinary ways: network timeouts, missing records, invalid IDs, expired tokens, rate limits, schema mismatches, partial success, and inconsistent external systems. The agent should not pretend everything worked.
Return errors in a structured way. The model can reason better over error_code, retryable, and message fields than over a raw stack trace.
Before defining schemas, classify every tool by what it can do. Read tools retrieve information. Analysis tools transform or compute. Draft tools prepare content without publishing it. Write tools change external systems. Privileged tools affect money, access, deletion, code execution, production infrastructure, or public communication. This taxonomy determines validation, approval, logging, and retry behavior.
The model should never receive a vague tool like `do_task` or `run_command` unless the runtime is heavily sandboxed and the product is explicitly a coding or automation environment. Most business agents become safer and easier to evaluate when tools are narrow. A `create_refund_draft` tool is easier to approve than a general `manage_order` tool.
Tool descriptions should teach the model when not to call the tool. Include constraints, required evidence, and common alternatives. A tool that searches policy should say it is for approved policy documents, not customer records. A tool that sends a notification should say it requires prior approval or a draft step first.
A tool call is where model intention meets real systems. That boundary should include schema validation, policy checks, timeout, rate limit, retry rules, result filtering, and trace emission. The model may generate the arguments, but trusted code must decide whether execution is allowed.
Treat tool output as untrusted input to the next model turn. Backend content can include prompt injection, stale data, hidden fields, or misleading text. Return only necessary fields, preserve provenance, and keep sensitive values out of the model context whenever possible.
Before a tool reaches production, review it as both an API and a model-accessible capability. The API review asks whether inputs are typed, validated, authorized, idempotent, observable, and bounded. The model-access review asks whether the name, description, and output help the model use the tool only when appropriate.
A safe tool should have narrow responsibility. If one tool can search, update, delete, and notify depending on an action field, the model has too much room to choose a dangerous branch accidentally. Split tools when risk, authorization, approval, or retry behavior differs.
Tool output also needs review. The next model turn will read it, so output should be compact, factual, source-aware, and stripped of secrets or internal stack traces. If a backend returns a huge object, the tool should shape it into the smallest useful result rather than dumping everything into context.
Finally, test tools with adversarial inputs. Include prompt injection in retrieved content, invalid enum values, unauthorized targets, duplicate writes, and backend timeouts. Tool safety is proven through hostile cases, not only clean examples.
Take one broad tool and split it into safer capabilities. For example, replace `manage_order` with `lookup_order`, `draft_refund`, and `submit_refund_after_approval`. Notice how each smaller tool has clearer schema, authorization, approval, retry, and audit behavior.
Then test the model with tool descriptions only. If it cannot reliably choose the correct tool from the descriptions, the names or boundaries are still too vague. Good tool design teaches the model safe behavior through structure, not hope.
Repeat tool-choice tests after adding new tools, because one vague capability can confuse selection across the entire tool set.
For expert-level work, keep this page connected to an actual run trace. Concepts become much easier to understand when learners can see the input, state, model decision, tool behavior, safety check, and final outcome side by side.
This schema describes one safe read action. Notice that it does not expose SQL, internal table names, or broad database access.
{
"name": "get_order_status",
"description": "Look up the shipping and payment status for one order that belongs to the authenticated user.",
"input_schema": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "Public order ID shown to the customer, for example ORD-2026-1042."
}
},
"required": ["order_id"]
}
}
{
"ok": false,
"error_code": "ORDER_NOT_FOUND",
"retryable": false,
"safe_message": "No order was found for this account with that order ID."
}
They can, but keep orchestration clear. In most systems the agent runtime should coordinate tool calls so logs, limits, and policies stay visible.
Usually no. Return the smallest structured result the agent needs, with sensitive fields removed.
Explore 500+ free tutorials across 20+ languages and frameworks.