Use this cheat sheet after the full lessons or during system design. It condenses the questions that matter before an agent is allowed to act.
The central distinction is between probabilistic judgment and deterministic control. Use the model where language and ambiguity matter; use code for permissions, schemas, arithmetic, limits, and irreversible effects.
Model proposes; runtime disposes. The application owns context, permissions, tools, state, budgets, validation, approval, and evidence of success.
Use an agent when the goal is clear but the next step depends on interpreting language, selecting tools, or reacting to newly discovered information.
Goal -> assemble trusted context -> model proposes action -> validate policy and schema -> execute tool -> store observation -> verify progress -> continue, finish, clarify, or escalate.
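The loop above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: `Action`, `RunState`, `validate`, `run_agent`, and the scripted stand-in for the model are all hypothetical names, context assembly is reduced to passing the run state, and progress verification is reduced to a step budget.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    kind: str                      # "tool", "finish", "clarify", or "escalate"
    tool: str = ""
    args: dict = field(default_factory=dict)
    text: str = ""


@dataclass
class RunState:
    goal: str
    observations: list = field(default_factory=list)
    steps: int = 0


TERMINAL = {"finish", "clarify", "escalate"}


def validate(action, tools):
    """Return an error string, or None if the proposed tool call is allowed."""
    if action.kind != "tool":
        return f"unknown action kind: {action.kind}"
    if action.tool not in tools:
        return f"unknown tool: {action.tool}"
    return None


def run_agent(goal, propose, tools, max_steps=5):
    state = RunState(goal=goal)
    while state.steps < max_steps:
        state.steps += 1
        action = propose(state)                      # model proposes
        if action.kind in TERMINAL:
            return action.kind, action.text, state
        error = validate(action, tools)              # runtime validates
        if error:
            # Store the rejection so the next proposal can react to it.
            state.observations.append({"rejected": error})
            continue
        result = tools[action.tool](**action.args)   # runtime executes
        state.observations.append({"tool": action.tool, "result": result})
    # Budget exhausted: hand the run to a human instead of looping forever.
    return "escalate", "step budget exhausted", state


# Hypothetical tool and a scripted stand-in for the model.
def lookup_order(order_id):
    return {"order_id": order_id, "status": "shipped"}


def scripted_model(state):
    if not state.observations:
        return Action("tool", tool="lookup_order", args={"order_id": "A1"})
    status = state.observations[-1]["result"]["status"]
    return Action("finish", text=f"Order A1 is {status}.")


outcome, text, final_state = run_agent(
    "Where is order A1?", scripted_model, {"lookup_order": lookup_order}
)
print(outcome, text, final_state.steps)
```

The model function only ever returns a proposal. Every branch that touches a tool or ends the run lives in `run_agent`, which is the code the team can test and audit.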
Tools should be narrow, typed, observable, and classified by side effect. Authorization comes from authenticated application context.
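One way to make these properties concrete is a small tool descriptor checked by the runtime before every call. The sketch below is an assumption-laden example: `SideEffect`, `Tool`, `Session`, `call_tool`, the role name, and the refund tool are hypothetical, and a real system would likely use a schema library instead of `isinstance` checks.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class SideEffect(Enum):
    READ = "read"
    WRITE = "write"
    IRREVERSIBLE = "irreversible"


@dataclass(frozen=True)
class Tool:
    name: str
    func: Callable[..., object]
    params: dict                 # argument name -> expected Python type
    side_effect: SideEffect
    required_role: str


@dataclass(frozen=True)
class Session:
    """Authenticated application context; never built from model output."""
    user_id: str
    roles: frozenset


def call_tool(tool, args, session, approved=False):
    # Authorization comes from the session, not from anything the model said.
    if tool.required_role not in session.roles:
        raise PermissionError(f"{session.user_id} may not call {tool.name}")
    # Typed arguments: exact names, expected types.
    if set(args) != set(tool.params):
        raise ValueError(f"{tool.name} expects arguments {sorted(tool.params)}")
    for name, expected in tool.params.items():
        if not isinstance(args[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    # Irreversible effects need an explicit approval recorded by the runtime.
    if tool.side_effect is SideEffect.IRREVERSIBLE and not approved:
        raise PermissionError(f"{tool.name} requires approval")
    return tool.func(**args)


# Hypothetical tool used only for illustration.
def issue_refund(order_id, amount_cents):
    return f"refunded {amount_cents} cents on {order_id}"


refund = Tool(
    name="issue_refund",
    func=issue_refund,
    params={"order_id": str, "amount_cents": int},
    side_effect=SideEffect.IRREVERSIBLE,
    required_role="support_lead",
)
lead = Session(user_id="u-1", roles=frozenset({"support_lead"}))
print(call_tool(refund, {"order_id": "A1", "amount_cents": 500}, lead, approved=True))
```

Because the side-effect class is data on the tool, policy such as "irreversible tools need approval" can be enforced in one place rather than inside each tool.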
State records the current run. Checkpoints make runs resumable. Memory stores reusable user or workflow facts. RAG retrieves external knowledge with provenance.
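The difference between state and a checkpoint is easiest to see in code: state is the in-memory record of one run, and a checkpoint is that record written somewhere durable so the run can resume. The sketch below assumes JSON files on local disk; `RunState`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, and memory and RAG would live in separate stores that are not shown.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class RunState:
    run_id: str
    goal: str
    step: int = 0
    observations: list = field(default_factory=list)


def save_checkpoint(state, directory):
    """Persist the current run state so the run can be resumed later."""
    path = Path(directory) / f"{state.run_id}.json"
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(asdict(state)))
    tmp.replace(path)   # swap the file in one step to avoid half-written checkpoints
    return path


def load_checkpoint(run_id, directory):
    """Return the saved state for run_id, or None if no checkpoint exists."""
    path = Path(directory) / f"{run_id}.json"
    if not path.exists():
        return None
    return RunState(**json.loads(path.read_text()))


with tempfile.TemporaryDirectory() as checkpoint_dir:
    state = RunState(run_id="run-1", goal="triage ticket 42")
    state.step = 2
    state.observations.append({"tool": "fetch_ticket", "result": "ok"})
    save_checkpoint(state, checkpoint_dir)

    # Simulate a restart: rebuild the run from disk and continue from step 2.
    resumed = load_checkpoint("run-1", checkpoint_dir)
    missing = load_checkpoint("run-999", checkpoint_dir)

print(resumed.step, resumed == state, missing)
```

Saving after each stored observation is one reasonable checkpoint cadence, since it means a crash loses at most one step.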
Assume model output and external content can be wrong or malicious. Limit capability so one mistake cannot become a major incident.
Evaluate realistic tasks before release and monitor outcomes after release. Inspect the complete trace, not just the final wording.
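Grading the trace instead of the final wording can be as simple as checking which tools were called. The following is a rough sketch under stated assumptions: `Trace`, `EvalCase`, and `grade` are hypothetical, and a trace is reduced to a list of tool names plus the final answer.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    final_answer: str
    tool_calls: list = field(default_factory=list)   # tool names in call order


@dataclass
class EvalCase:
    name: str
    must_call: set          # tools a correct run has to use
    must_not_call: set      # tools a correct run must never use
    answer_contains: str    # text the final answer should include


def grade(case, trace):
    """Return a list of failure messages; an empty list means the case passed."""
    called = set(trace.tool_calls)
    failures = []
    missing = case.must_call - called
    if missing:
        failures.append("missing tools: " + ", ".join(sorted(missing)))
    forbidden = case.must_not_call & called
    if forbidden:
        failures.append("forbidden tools: " + ", ".join(sorted(forbidden)))
    if case.answer_contains.lower() not in trace.final_answer.lower():
        failures.append("answer missing expected text")
    return failures


case = EvalCase(
    name="order status lookup",
    must_call={"lookup_order"},
    must_not_call={"issue_refund"},
    answer_contains="shipped",
)
# The final wording looks fine, but the trace shows an unwanted refund.
bad_run = Trace("Your order has shipped.", ["lookup_order", "issue_refund"])
good_run = Trace("Your order has shipped.", ["lookup_order"])
print(grade(case, bad_run))
print(grade(case, good_run))
```

Both runs produce the same answer text, so an evaluation that reads only the answer would pass the run that issued a refund.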
Before production, confirm that the agent has versioned instructions, typed tools, tested permissions, durable state where needed, traces, evaluations, budgets, approval paths, and rollback controls.
Use the cheat sheet in a specific order. First define the user, job, success metric, and non-goals. Then decide whether the workflow really needs an agent or whether a normal workflow, retrieval feature, or chatbot is enough. Only after that should you choose tools, memory, model routing, and framework details. This order prevents the common mistake of starting with orchestration before the problem is measurable.
A practical expert review asks five questions. What can the model decide? What must trusted code decide? What evidence will the model see? What external action can happen? What proves the result is good enough? If any answer is vague, the agent is not ready for production even if the demo looks impressive.
The most useful agents are narrow and inspectable. They may feel less magical than broad autonomous assistants, but they can be evaluated, secured, and improved. A narrow support triage agent with strong traces, approval, and metrics teaches more engineering discipline than a universal assistant with no clear success condition.
When reviewing an agent architecture, walk through one complete run from user request to final answer. Identify where context is assembled, where the model is called, where actions are validated, where tools execute, where observations are stored, and where the system decides to stop. If the team cannot explain each step from a trace, the architecture is too implicit.
Then review failure paths. The agent should behave well when retrieval finds no evidence, a tool times out, a user denies approval, a model chooses a repeated action, a budget is exhausted, or an external system returns partial data. Production quality is mostly visible in these non-happy paths.
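Several of these failure paths can be handled by small runtime guards that run before and around each tool call. This sketch covers only three of them (exhausted budget, repeated action, tool timeout); `action_key`, `stop_reason`, `safe_execute`, and the threshold of two repeats are illustrative assumptions.

```python
import json
from collections import Counter


def action_key(tool, args):
    """Stable fingerprint of a tool call, used to detect repeats."""
    return tool + ":" + json.dumps(args, sort_keys=True)


def stop_reason(history, tool, args, steps_left, max_repeats=2):
    """Return why the run should stop, or None if the call may proceed.

    history is the list of action keys already executed in this run.
    """
    if steps_left <= 0:
        return "budget_exhausted"
    if Counter(history)[action_key(tool, args)] >= max_repeats:
        return "repeated_action"
    return None


def safe_execute(func, args):
    """Run a tool and turn a timeout into an observation instead of a crash."""
    try:
        return {"ok": True, "result": func(**args)}
    except TimeoutError:
        return {"ok": False, "error": "timeout"}


# Hypothetical tool that always times out.
def slow_search(query):
    raise TimeoutError("search backend did not respond")


history = [action_key("search", {"q": "refund policy"})] * 2
print(stop_reason(history, "search", {"q": "refund policy"}, steps_left=3))
print(stop_reason(history, "search", {"q": "shipping times"}, steps_left=3))
print(safe_execute(slow_search, {"query": "refund policy"}))
```

Returning a named stop reason, instead of raising, lets the runtime log it in the trace and choose between clarifying with the user and escalating.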
The best way to use a cheat sheet is as a design review sequence, not as a list to memorize. Start at the top of the workflow and ask whether the task truly needs an agent. If the steps are fixed and every branch is predictable, a normal workflow is safer and easier to maintain. If the next action depends on interpretation, tool results, or changing evidence, an agent may be justified.
Then review the control boundary. The model can propose a plan, draft content, choose a tool, or summarize evidence. The runtime must own permissions, tool schemas, budgets, retry rules, approvals, and final persistence. This separation keeps the system understandable when the model is wrong, ambiguous, or overconfident.
Finally, turn the cheat sheet into a release checklist. For each part of the architecture, ask what is logged, what is tested, what can be disabled, and what happens when the user cancels. A production agent is not defined by how smart one answer sounds; it is defined by whether the team can explain and control every important action.
Pick one agent you have built or plan to build and score it against every major cheat-sheet area: task fit, tool risk, memory policy, retrieval evidence, guardrails, evaluation, observability, cost, and deployment. Give each area a status of ready, weak, or missing.
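The scoring exercise can be recorded in a small structure so the result is easy to diff between releases. In this sketch the area names and statuses come from the paragraph above; `Status`, `review`, treating unscored areas as missing, and the rule that every area must be ready before release are assumptions made for illustration.

```python
from enum import Enum


class Status(Enum):
    READY = "ready"
    WEAK = "weak"
    MISSING = "missing"


AREAS = (
    "task fit", "tool risk", "memory policy", "retrieval evidence",
    "guardrails", "evaluation", "observability", "cost", "deployment",
)


def review(scores):
    """Summarize a scorecard; areas that were not scored count as missing."""
    unknown = set(scores) - set(AREAS)
    if unknown:
        raise ValueError(f"unknown areas: {sorted(unknown)}")
    full = {area: scores.get(area, Status.MISSING) for area in AREAS}
    blockers = [area for area in AREAS if full[area] is not Status.READY]
    return {"scores": full, "blockers": blockers, "release_ready": not blockers}


# Example scorecard for a hypothetical support triage agent.
scorecard = {area: Status.READY for area in AREAS}
scorecard["evaluation"] = Status.WEAK
del scorecard["memory policy"]          # never reviewed, so it counts as missing

result = review(scorecard)
print(result["release_ready"], result["blockers"])
```

Defaulting unreviewed areas to missing keeps an incomplete review from looking like a passing one.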
The value of this exercise is that it exposes hidden assumptions. A team may discover that tool schemas are strong but evaluation is weak, or that memory sounds useful but has no deletion policy. The cheat sheet becomes a living review process rather than a static reference.
Use the completed checklist as a release artifact. It should tell future maintainers what was reviewed, which risks remain, and which controls must be retested before the next change.
For expert-level work, keep this page connected to an actual run trace. Concepts become much easier to understand when learners can see the input, state, model decision, tool behavior, safety check, and final outcome side by side.
An agent is a controlled software loop in which a model can choose among approved actions to make progress toward a goal.
The model may propose actions, but trusted code must validate permissions, arguments, budgets, and side effects.
Build one narrow, evaluated project, then study a workflow framework, tracing platform, retrieval system, and security model in depth.