AI Agent Tools and Actions: How Agents Safely Use APIs, Search and Code

Safe Tool Design Flow

Name the business action precisely.
Define required and optional inputs.
Validate types and authorization.
Return structured results, not vague text.
Log who requested it, why, and what happened.
Require approval for sensitive writes.

Read Tools and Write Tools

A tool is a controlled capability the agent can request. It may search documents, call an API, query a database, send an email, run a calculation, create a ticket, or execute code in a sandbox.

Tools are the difference between an agent that only talks and an agent that can work. But tools also create risk. A badly designed tool can leak data, delete records, spam customers, or let prompt injection control business systems.

The safest way to design tools is to make each tool narrow, typed, auditable, and easy to explain. If a human cannot quickly understand what a tool does, the model probably should not be given that tool directly.

Read tools fetch information. Write tools change the world. That distinction determines how much protection each call needs. Searching a help center is usually low risk. Refunding money, deleting a file, changing permissions, or sending a legal email is high risk.

Production agents should treat write tools as privileged operations. A write tool may need user confirmation, policy checks, idempotency, rollback plans, and stronger logging.

Read tools: search_docs, get_order_status, list_recent_errors.
Write tools: issue_refund, send_email, update_subscription, deploy_service.
Sensitive write tools: transfer_money, delete_account, change_access_role.
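The three tiers above can be written down as data so the runtime, not the model, decides which controls apply. The sketch below is only an illustration: the `Risk` enum, the `TOOL_RISK` registry, and the `controls_for` policy are hypothetical names, and the rule that every write needs an idempotency key while only sensitive writes need human approval is an assumption, not a fixed standard.

```python
from enum import Enum


class Risk(Enum):
    READ = "read"
    WRITE = "write"
    SENSITIVE_WRITE = "sensitive_write"


# Hypothetical registry; the tool names are the ones listed above.
TOOL_RISK = {
    "search_docs": Risk.READ,
    "get_order_status": Risk.READ,
    "list_recent_errors": Risk.READ,
    "issue_refund": Risk.WRITE,
    "send_email": Risk.WRITE,
    "update_subscription": Risk.WRITE,
    "deploy_service": Risk.WRITE,
    "transfer_money": Risk.SENSITIVE_WRITE,
    "delete_account": Risk.SENSITIVE_WRITE,
    "change_access_role": Risk.SENSITIVE_WRITE,
}


def controls_for(tool_name: str) -> dict:
    """Return the controls a call must satisfy (assumed policy).

    Tools missing from the registry are treated as sensitive writes,
    so the default is the most restrictive tier.
    """
    risk = TOOL_RISK.get(tool_name, Risk.SENSITIVE_WRITE)
    return {
        "idempotency_key": risk is not Risk.READ,
        "human_approval": risk is Risk.SENSITIVE_WRITE,
    }
```

Defaulting unknown tools to the strictest tier means a forgotten registry entry fails closed rather than open.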

Tool Descriptions Must Be Honest

The model chooses tools based on names, descriptions, and schemas. If the description is vague, the model may use the tool incorrectly. A good description says what the tool does, when to use it, and what it does not do.

For example, do not name a tool update_customer if it can also charge a card. Split those actions. The model should not need to guess hidden behavior.

Use verbs that match the action: search, get, create, update, cancel.
Avoid broad words like manage, process, handle, execute.
Return machine-readable fields so the next step can reason reliably.
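The naming advice above is easy to check mechanically during tool review. As a rough sketch, the hypothetical helper below flags the broad verbs listed above when they appear in a snake_case tool name. It is a lint aid only and does not judge whether the description is honest.

```python
# Broad verbs taken from the list above.
BROAD_VERBS = {"manage", "process", "handle", "execute"}


def broad_verbs_in(tool_name: str) -> list[str]:
    """Return the broad verbs found in a snake_case tool name, in order."""
    parts = tool_name.lower().split("_")
    return [part for part in parts if part in BROAD_VERBS]
```

For example, `broad_verbs_in("manage_order")` returns `["manage"]`, while `broad_verbs_in("get_order_status")` returns an empty list.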

Text Diagram: Tool Call Lifecycle

The model is only one part of tool use. The runtime should guard the path before and after execution.

LLM proposes tool call
Schema validator checks shape
Auth policy checks user permission
Risk policy checks approval requirement
Executor calls API
Observation is sanitized
Agent decides next step
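The lifecycle above can be sketched as a single guarded function. Everything here is illustrative: `ToolCall`, `Tool`, and `execute` are hypothetical names, the shape check is a simple comparison of argument names rather than a full schema validator, and the approval flag is assumed to come from trusted runtime state rather than from the model.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]


@dataclass
class Tool:
    required: set[str]        # exact argument names the tool accepts
    allowed_roles: set[str]   # roles permitted to call it
    needs_approval: bool      # True for sensitive writes
    run: Callable[[dict[str, Any]], dict[str, Any]]
    output_fields: set[str]   # fields that are safe to show the model


def _deny(code: str) -> dict[str, Any]:
    return {"ok": False, "error_code": code, "retryable": False}


def execute(call: ToolCall, tools: dict[str, Tool],
            user_role: str, approved: bool) -> dict[str, Any]:
    tool = tools.get(call.name)
    if tool is None:
        return _deny("UNKNOWN_TOOL")
    # Schema validator checks shape (missing or extra arguments are rejected).
    if set(call.args) != tool.required:
        return _deny("INVALID_ARGUMENTS")
    # Auth policy checks user permission, using the role from the runtime.
    if user_role not in tool.allowed_roles:
        return _deny("NOT_AUTHORIZED")
    # Risk policy checks approval requirement.
    if tool.needs_approval and not approved:
        return _deny("APPROVAL_REQUIRED")
    # Executor calls the API.
    raw = tool.run(call.args)
    # Observation is sanitized: only allow-listed fields reach the model.
    safe = {key: value for key, value in raw.items() if key in tool.output_fields}
    return {"ok": True, "result": safe}
```

The model only ever supplies `call`. The role and the approval flag are passed in by the runtime, which keeps the permission decision outside the model's control.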

Common Tool Failures

Tools fail in ordinary ways: network timeouts, missing records, invalid IDs, expired tokens, rate limits, schema mismatches, partial success, and inconsistent external systems. The agent should not pretend everything worked.

Return errors in a structured way. The model can reason better over error_code, retryable, and message fields than over a raw stack trace.

Use retries only for safe and retryable operations.
Never retry payment or email actions without an idempotency key.
Sanitize tool output before putting it back into the model context.
Tell the user when human support or manual action is required.
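One way to encode the retry rules above is a small wrapper that retries only when the tool result says `retryable` and that refuses to repeat a write lacking an idempotency key. This is a minimal sketch under assumptions: `call_with_retry` and its parameters are hypothetical, and the wrapped callable is assumed to return a dict with `ok` and `retryable` fields.

```python
import time
from typing import Callable, Optional


def call_with_retry(
    call: Callable[[], dict],
    is_write: bool,
    idempotency_key: Optional[str] = None,
    max_attempts: int = 3,
    base_delay: float = 0.5,
) -> dict:
    # A write without an idempotency key gets exactly one attempt.
    attempts = 1 if (is_write and idempotency_key is None) else max_attempts
    result: dict = {}
    for attempt in range(attempts):
        result = call()
        # Stop on success or on an error the tool marked as not retryable.
        if result.get("ok") or not result.get("retryable"):
            return result
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff
    return result
```

When the attempts run out, the last structured error is returned unchanged, so the agent can tell the user that manual action is needed instead of pretending the call worked.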

Tool Design Starts with Action Taxonomy

Before defining schemas, classify every tool by what it can do. Read tools retrieve information. Analysis tools transform or compute. Draft tools prepare content without publishing it. Write tools change external systems. Privileged tools affect money, access, deletion, code execution, production infrastructure, or public communication. This taxonomy determines validation, approval, logging, and retry behavior.

The model should never receive a vague tool like `do_task` or `run_command` unless the runtime is heavily sandboxed and the product is explicitly a coding or automation environment. Most business agents become safer and easier to evaluate when tools are narrow. A `create_refund_draft` tool is easier to approve than a general `manage_order` tool.

Tool descriptions should teach the model when not to call the tool. Include constraints, required evidence, and common alternatives. A tool that searches policy should say it is for approved policy documents, not customer records. A tool that sends a notification should say it requires prior approval or a draft step first.

Use typed schemas with required fields and bounded enums.
Reject unknown fields instead of silently ignoring them.
Validate authorization from trusted runtime context.
Add idempotency keys for write operations.
Return structured results with status, evidence, and safe error messages.
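To make the first two points concrete, here is a hand-rolled validator that enforces required fields, types, and a bounded enum, and rejects unknown fields. It is a sketch only: the `SCHEMA` layout, the field names, and the enum values are invented for illustration, and a production system would more likely rely on an established schema library.

```python
from typing import Any

# Hypothetical schema for a refund-draft tool; names and values are illustrative.
SCHEMA = {
    "order_id": {"type": str, "required": True},
    "reason": {"type": str, "required": True,
               "enum": {"damaged", "late", "duplicate"}},
    "note": {"type": str, "required": False},
}


def validate(args: dict[str, Any], schema: dict[str, dict]) -> list[str]:
    """Return a list of validation errors; an empty list means the args pass."""
    errors = []
    # Reject unknown fields instead of silently ignoring them.
    for name in sorted(set(args) - set(schema)):
        errors.append(f"unknown field: {name}")
    for name, rule in schema.items():
        if name not in args:
            if rule.get("required"):
                errors.append(f"missing field: {name}")
            continue
        value = args[name]
        if not isinstance(value, rule["type"]):
            errors.append(f"wrong type: {name}")
        elif "enum" in rule and value not in rule["enum"]:
            errors.append(f"invalid value: {name}")
    return errors
```

Returning every error at once, rather than stopping at the first, gives the model enough information to correct its arguments in a single follow-up call.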

Execution Safety Around Tools

A tool call is where model intention meets real systems. That boundary should include schema validation, policy checks, timeout, rate limit, retry rules, result filtering, and trace emission. The model may generate the arguments, but trusted code must decide whether execution is allowed.

Treat tool output as untrusted input to the next model turn. Backend content can include prompt injection, stale data, hidden fields, or misleading text. Return only necessary fields, preserve provenance, and keep sensitive values out of the model context whenever possible.

Separate validation errors from authorization denials and backend failures.
Do not auto-retry non-idempotent write actions.
Mask secrets and internal errors in tool output.
Track tool latency, error rates, and unsafe attempt counts.
Write regression tests for tool-choice mistakes, not only schema parsing.
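Treating tool output as untrusted can start with an allow-list of fields plus basic secret masking before anything is returned to the model. The sketch below is an assumption-laden illustration: `shape_observation` is a hypothetical helper, and the regular expression covers only a few obvious key names, so it should not be read as a complete secret detector or as a defense against prompt injection on its own.

```python
import re

# Illustrative pattern only; real deployments need their own, broader rules.
SECRET_PATTERN = re.compile(
    r"(?i)\b(api[_-]?key|token|password)\b\s*[:=]\s*\S+"
)


def shape_observation(raw: dict, allowed_fields: set[str], source: str) -> dict:
    """Keep only allow-listed fields, mask obvious secrets, record provenance."""
    shaped = {}
    for key in allowed_fields:
        if key in raw:
            value = raw[key]
            if isinstance(value, str):
                value = SECRET_PATTERN.sub(r"\1=[REDACTED]", value)
            shaped[key] = value
    shaped["source"] = source  # provenance for the next model turn
    return shaped
```

Because the function works from an allow-list, a new backend field stays hidden from the model until someone deliberately exposes it.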

Tool Review Before Production

Before a tool reaches production, review it as both an API and a model-accessible capability. The API review asks whether inputs are typed, validated, authorized, idempotent, observable, and bounded. The model-access review asks whether the name, description, and output help the model use the tool only when appropriate.

A safe tool should have narrow responsibility. If one tool can search, update, delete, and notify depending on an action field, the model has too much room to choose a dangerous branch accidentally. Split tools when risk, authorization, approval, or retry behavior differs.

Tool output also needs review. The next model turn will read it, so output should be compact, factual, source-aware, and stripped of secrets or internal stack traces. If a backend returns a huge object, the tool should shape it into the smallest useful result rather than dumping everything into context.

Finally, test tools with adversarial inputs. Include prompt injection in retrieved content, invalid enum values, unauthorized targets, duplicate writes, and backend timeouts. Tool safety is proven through hostile cases, not only clean examples.

Split tools by side effect, authorization, and approval requirement.
Keep descriptions clear about when not to call the tool.
Shape outputs for the next model step and remove sensitive fields.
Test invalid, unauthorized, duplicate, and adversarial cases.

Split a Risky Tool

Take one broad tool and split it into safer capabilities. For example, replace `manage_order` with `lookup_order`, `draft_refund`, and `submit_refund_after_approval`. Notice how each smaller tool has clearer schema, authorization, approval, retry, and audit behavior.

Then test the model with tool descriptions only. If it cannot reliably choose the correct tool from the descriptions, the names or boundaries are still too vague. Good tool design teaches the model safe behavior through structure, not hope.

Split tools when side effects differ.
Keep descriptions decision-useful.
Test tool choice as part of evaluation.

Retest the Whole Tool Set

Repeat tool-choice tests after adding new tools, because one vague capability can confuse selection across the entire tool set.
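A tool-choice test can be as simple as a table of prompts with the expected tool name, rerun whenever the tool set changes. In this sketch the prompts in `CASES` are invented examples, and `choose` stands in for whatever function asks the model to pick a tool from the descriptions. No real model API is assumed.

```python
from typing import Callable

# Hypothetical evaluation cases using the split tools named above.
CASES = [
    ("Where is my order ORD-2026-1042?", "lookup_order"),
    ("I'd like my money back for this order.", "draft_refund"),
    ("The refund draft was approved, go ahead.", "submit_refund_after_approval"),
]


def evaluate_tool_choice(choose: Callable[[str], str],
                         cases: list[tuple[str, str]]) -> tuple[float, list[dict]]:
    """Run every case through `choose` and report accuracy plus the failures."""
    failures = []
    for prompt, expected in cases:
        chosen = choose(prompt)
        if chosen != expected:
            failures.append(
                {"prompt": prompt, "expected": expected, "chosen": chosen}
            )
    accuracy = 1 - len(failures) / len(cases)
    return accuracy, failures
```

Keeping the failure records, not just the score, shows which pair of tools the model confuses, which is usually where a name or description needs to change.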

Action Boundary

A tool schema is a request boundary, not proof of permission. Validate shape, normalize values, derive identity from trusted runtime context, authorize the exact object, enforce rate and size limits, execute the domain command, then return a bounded result. Never trust a model-supplied tenant ID, approval claim, or role.

Split read, draft, commit, destructive, and open-world operations when their risk differs. A `manage_account` tool hides too many policies; `read_account_summary`, `draft_address_change`, and `commit_address_change` can carry distinct scopes, confirmations, idempotency, and audit rules.

Errors should help the loop recover safely. Distinguish invalid input, denied access, missing data, conflict, transient dependency failure, and permanent failure without exposing stack traces or secrets. Give retryable actions an idempotency key and a bounded retry policy so a timeout cannot create duplicate orders or messages.

Authorize with trusted runtime identity at execution time.
Separate capabilities when side effects or permissions differ.
Return typed safe failures the planner can classify.
Make consequential retries idempotent and auditable.
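The points above can be illustrated with a commit-style tool that takes identity from the runtime and ignores any tenant the model supplies. This is a minimal sketch: `RuntimeContext`, the in-memory `ADDRESS_OWNERS` table, and the `_COMMITTED` cache are hypothetical stand-ins for a real session, datastore, and idempotency store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuntimeContext:
    """Set by the authenticated session, never by the model."""
    user_id: str
    tenant_id: str


# Hypothetical in-memory stores standing in for real services.
ADDRESS_OWNERS = {"addr-1": ("tenant-a", "user-1")}
_COMMITTED: dict[tuple, dict] = {}


def commit_address_change(ctx: RuntimeContext, args: dict,
                          idempotency_key: str) -> dict:
    # Scope the key to the caller so one user cannot replay another's result.
    cache_key = (ctx.tenant_id, ctx.user_id, idempotency_key)
    if cache_key in _COMMITTED:
        return _COMMITTED[cache_key]  # replay: no second write
    # Authorize the exact object against the trusted identity.
    # Any tenant_id or role inside args is deliberately ignored.
    owner = ADDRESS_OWNERS.get(args.get("address_id"))
    if owner != (ctx.tenant_id, ctx.user_id):
        return {"ok": False, "error_code": "ACCESS_DENIED", "retryable": False}
    result = {"ok": True, "address_id": args["address_id"], "status": "updated"}
    _COMMITTED[cache_key] = result
    return result
```

A missing address and a foreign address both return `ACCESS_DENIED` here, a design choice that avoids revealing which object IDs exist.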

Tool Boundary Examples

Narrow Tool Schema Example

This schema describes one safe read action. Notice that it does not expose SQL, internal table names, or broad database access.

{
  "name": "get_order_status",
  "description": "Look up the shipping and payment status for one order that belongs to the authenticated user.",
  "input_schema": {
    "type": "object",
    "properties": {
      "order_id": {
        "type": "string",
        "description": "Public order ID shown to the customer, for example ORD-2026-1042."
      }
    },
    "required": ["order_id"],
    "additionalProperties": false
  }
}

The tool does one thing.
The authenticated user is checked outside the model.
The schema limits the input shape.

Structured Tool Error

{
  "ok": false,
  "error_code": "ORDER_NOT_FOUND",
  "retryable": false,
  "safe_message": "No order was found for this account with that order ID."
}

The model sees a safe message, not a database stack trace.
The retryable field tells the agent whether another call could succeed.
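A tool wrapper can produce errors in this shape by translating backend exceptions at the boundary. The sketch below assumes a hypothetical `OrderNotFound` exception and invents the `BACKEND_TIMEOUT` and `INTERNAL_ERROR` codes for illustration. Only `ORDER_NOT_FOUND` and its message come from the example above.

```python
class OrderNotFound(Exception):
    """Hypothetical exception raised by the order backend."""


def to_tool_error(exc: Exception) -> dict:
    """Map a backend exception to a structured, model-safe error."""
    if isinstance(exc, OrderNotFound):
        return {
            "ok": False,
            "error_code": "ORDER_NOT_FOUND",
            "retryable": False,
            "safe_message": "No order was found for this account with that order ID.",
        }
    if isinstance(exc, TimeoutError):
        return {
            "ok": False,
            "error_code": "BACKEND_TIMEOUT",
            "retryable": True,
            "safe_message": "The order system did not respond in time.",
        }
    # Anything unexpected: log the details elsewhere, show the model nothing internal.
    return {
        "ok": False,
        "error_code": "INTERNAL_ERROR",
        "retryable": False,
        "safe_message": "The request could not be completed.",
    }
```

The exception text itself is never copied into the result, so stack traces and internal identifiers stay out of the model context.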

Before you move on