AI Agents Cheat Sheet: Architecture, Tools, Memory, Security, and Production Checks

When to Use an Agent

Use this cheat sheet after the full lessons or during system design. It condenses the questions that matter before an agent is allowed to act.

The central distinction is between probabilistic judgment and deterministic control. Use the model where language and ambiguity matter; use code for permissions, schemas, arithmetic, limits, and irreversible effects.

Use an agent when the goal is clear but the next step depends on interpreting language, selecting tools, or reacting to newly discovered information.

Chatbot: conversation, explanation, transformation.
Workflow: known steps and predictable branches.
Agent: dynamic next-action choice under controlled tools and state.
Normal code: stable rules, calculations, CRUD, and permissions.

Core Agent Loop

Goal -> assemble trusted context -> model proposes action -> validate policy and schema -> execute tool -> store observation -> verify progress -> continue, finish, clarify, or escalate.

Always define success criteria.
Always define maximum steps, time, retries, and cost.
Detect repeated actions and unproductive loops.
Return a useful partial result when a budget ends.

Tools and Permissions

Tools should be narrow, typed, observable, and classified by side effect. Authorization comes from authenticated application context.

Read: search, lookup, retrieve.
Draft: prepare content without publishing.
Write: change a system or send externally visible content.
Destructive or privileged: delete, pay, grant access, execute code.
Require validation, idempotency, timeout, rate limit, and approval as risk increases.

State, Memory, and Knowledge

State records the current run. Checkpoints make runs resumable. Memory stores reusable user or workflow facts. RAG retrieves external knowledge with provenance.

Keep current facts separate from append-only history.
Summarize or expire old state.
Preserve source IDs and enforce permissions before retrieval.
Treat memories and retrieved content as untrusted data.

Security Review

Assume model output and external content can be wrong or malicious. Limit capability so one mistake cannot become a major incident.

Threat model users, documents, webpages, tools, credentials, and tenants.
Never place secrets in model context when a trusted tool can hold them.
Use least privilege, sandboxes, quotas, and kill switches.
Require confirmation for financial, destructive, privileged, or public actions.

Evaluation and Operations

Evaluate realistic tasks before release and monitor outcomes after release. Inspect the complete trace, not just the final wording.

Quality: task success, factuality, citation correctness, tool selection.
Safety: policy violations, unsafe action rate, data leakage, escalation quality.
Operations: p95 latency, cost per success, retries, failures, timeouts.
Product: user corrections, acceptance, abandonment, and business outcome.

Release Checklist

Before production, confirm that the agent has versioned instructions, typed tools, tested permissions, durable state where needed, traces, evaluations, budgets, approval paths, and rollback controls.

Can the team explain every external action from a trace?
Can a user cancel a long-running task?
Can operators disable a tool or model quickly?
Do failures preserve partial progress and avoid repeated side effects?
Does monitoring detect quality drift as well as infrastructure errors?

Decision Order for Designing an Agent

Use the cheat sheet in a specific order. First define the user, job, success metric, and non-goals. Then decide whether the workflow really needs an agent or whether a normal workflow, retrieval feature, or chatbot is enough. Only after that should you choose tools, memory, model routing, and framework details. This order prevents the common mistake of starting with orchestration before the problem is measurable.

A practical expert review asks five questions. What can the model decide? What must trusted code decide? What evidence will the model see? What external action can happen? What proves the result is good enough? If any answer is vague, the agent is not ready for production even if the demo looks impressive.

The most useful agents are narrow and inspectable. They may feel less magical than broad autonomous assistants, but they can be evaluated, secured, and improved. A narrow support triage agent with strong traces, approval, and metrics teaches more engineering discipline than a universal assistant with no clear success condition.

Start with the user outcome and measurable success.
Declare non-goals before writing prompts.
Separate model judgment from deterministic control.
Design tools and permissions before connecting write actions.
Require evaluation data before production release.

Architecture Review Checklist

When reviewing an agent architecture, walk through one complete run from user request to final answer. Identify where context is assembled, where the model is called, where actions are validated, where tools execute, where observations are stored, and where the system decides to stop. If the team cannot explain each step from a trace, the architecture is too implicit.

Then review failure paths. The agent should behave well when retrieval finds no evidence, a tool times out, a user denies approval, a model chooses a repeated action, a budget is exhausted, or an external system returns partial data. Production quality is mostly visible in these non-happy paths.

Can every external action be reconstructed from logs or traces?
Can operators disable one tool without disabling the whole product?
Can the user correct bad memory or reject a proposed action?
Can the system return a useful partial result after failure?
Can tests catch a regression in tool choice or safety behavior?

How to Use This Cheat Sheet During Real Design

The best way to use a cheat sheet is as a design review sequence, not as a list to memorize. Start at the top of the workflow and ask whether the task truly needs an agent. If the steps are fixed and every branch is predictable, a normal workflow is safer and easier to maintain. If the next action depends on interpretation, tool results, or changing evidence, an agent may be justified.

Then review the control boundary. The model can propose a plan, draft content, choose a tool, or summarize evidence. The runtime must own permissions, tool schemas, budgets, retry rules, approvals, and final persistence. This separation keeps the system understandable when the model is wrong, ambiguous, or overconfident.

Finally, turn the cheat sheet into a release checklist. For each page of the architecture, ask what is logged, what is tested, what can be disabled, and what happens when the user cancels. A production agent is not defined by how smart one answer sounds; it is defined by whether the team can explain and control every important action.

Use the cheat sheet as a review order: need, control, tools, memory, safety, evaluation, operations.
Prefer deterministic code for policy, arithmetic, permissions, and irreversible effects.
Require a trace for every model decision that leads to external action.
Convert each checklist item into a test, metric, or operational runbook.

Run an Architecture Readiness Review

Pick one agent you have built or plan to build and score it against every major cheat-sheet area: task fit, tool risk, memory policy, retrieval evidence, guardrails, evaluation, observability, cost, and deployment. Give each area a status of ready, weak, or missing.

The value of this exercise is that it exposes hidden assumptions. A team may discover that tool schemas are strong but evaluation is weak, or that memory sounds useful but has no deletion policy. The cheat sheet becomes a living review process rather than a static reference.

Turn weak areas into backlog tasks.
Repeat the review before model, prompt, or tool changes.
Keep the completed checklist with release notes.

Keep the Review with the Release

Use the completed checklist as a release artifact. It should tell future maintainers what was reviewed, which risks remain, and which controls must be retested before the next change.

Agent Reference Examples

Release Readiness Gate

Turn the cheat sheet into an executable release check. A release is blocked when a required safety or operations control is missing.

Release Readiness Gate

controls = {
    "tool_allowlist": True,
    "human_approval_for_writes": True,
    "step_budget": True,
    "trace_redaction": False,
    "regression_suite": True,
}

required = set(controls)
missing = sorted(name for name in required if not controls[name])

if missing:
    print("BLOCK RELEASE")
    for control in missing:
        print("- Missing:", control)
else:
    print("READY FOR STAGED RELEASE")

The example intentionally fails because trace redaction is missing.
Production teams can generate this control map from CI evidence.
Risk-critical checks should block a release rather than lower an average score.

Agent Run Budget Calculator

This example calculates whether a run stays within explicit model-call, tool-call, and cost budgets.

Agent Run Budget Calculator

budget = {"model_calls": 4, "tool_calls": 6, "cost_usd": 0.08}
usage = {"model_calls": 3, "tool_calls": 7, "cost_usd": 0.06}

violations = []
for metric, limit in budget.items():
    if usage[metric] > limit:
        violations.append(f"{metric}: {usage[metric]} > {limit}")

print("STOP_AND_SUMMARIZE" if violations else "CONTINUE")
for violation in violations:
    print("-", violation)

The run stops because tool usage exceeded its limit.
Separate budgets expose the real source of runaway behavior.
A production runtime should return partial progress when a budget is exhausted.

Before you move on