Multiple agents can help when different parts of a task require distinct instructions, tools, permissions, context, or ownership. Specialization is the reason to split; novelty is not.
A handoff transfers responsibility from one agent to another. A good handoff contains the goal, relevant evidence, completed work, open questions, constraints, and the reason the next specialist was selected.
Many systems described as multi-agent are better implemented as one agent plus deterministic tools or workflow nodes. Start with one agent and split only after evaluation shows a clear bottleneck.
A multi-agent system is a distributed workflow with language-model workers. Every added agent creates a new interface, failure mode, cost center, and coordination problem.
Use specialists when roles need different permission boundaries, large domain-specific contexts, independent evaluation criteria, or organizational ownership. For example, a support agent may gather evidence while a finance agent proposes a refund under stricter permissions.
Do not create separate agents merely for persona names. If two roles use the same tools, context, policy, and success criteria, a single agent with explicit modes is usually easier to operate.
A router sends each request to one specialist. A supervisor delegates subtasks and combines results. A pipeline passes work through fixed roles. Peer handoffs let specialists transfer control directly, but require stronger loop prevention.
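The first three patterns differ mainly in who calls whom. The sketch below is a minimal illustration, assuming each specialist is a plain function from text to text; the names `billing`, `support`, `router`, `pipeline`, and `supervisor` are hypothetical and not taken from any framework.

```python
from typing import Callable

# Hypothetical stand-in for a specialist: takes a task, returns a result.
Agent = Callable[[str], str]

def billing(task: str) -> str:
    return f"billing handled: {task}"

def support(task: str) -> str:
    return f"support handled: {task}"

def router(task: str, category: str, specialists: dict[str, Agent]) -> str:
    """Router: exactly one specialist handles the whole request."""
    return specialists[category](task)

def pipeline(task: str, stages: list[Agent]) -> str:
    """Pipeline: fixed roles, each consuming the previous stage's output."""
    for stage in stages:
        task = stage(task)
    return task

def supervisor(subtasks: dict[str, str], specialists: dict[str, Agent]) -> list[str]:
    """Supervisor: delegate each subtask to a named specialist, then combine."""
    return [specialists[name](subtask) for name, subtask in subtasks.items()]

team = {"billing": billing, "support": support}
print(router("refund request", "billing", team))
print(pipeline("ticket", [support, billing]))
print(supervisor({"billing": "check invoice", "support": "reply to user"}, team))
```

Peer handoffs are left out on purpose: they need the loop prevention that the other three get from their fixed call structure.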
Do not forward the entire transcript by default. Transfer a compact structured package with the original goal, relevant state, evidence references, decisions already made, unresolved questions, and permissions available to the receiver.
The receiving agent should validate that the request fits its role. If not, it should reject or escalate rather than bounce the task indefinitely.
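One way to sketch that check on the receiving side, assuming a hypothetical billing specialist whose role is defined by a set of categories and a bounce limit chosen only for illustration:

```python
from dataclasses import dataclass

@dataclass
class Request:
    category: str
    bounce_count: int = 0  # how many times this request was already rejected

# Hypothetical role definition and limit; tune both for a real system.
BILLING_CATEGORIES = {"billing", "refund"}
MAX_BOUNCES = 2

def accept_or_redirect(request: Request) -> str:
    """Accept in-role work; otherwise reject once or twice, then escalate."""
    if request.category in BILLING_CATEGORIES:
        return "accept"
    if request.bounce_count >= MAX_BOUNCES:
        return "escalate"  # stop bouncing and hand the request to a coordinator or human
    return "reject"

print(accept_or_redirect(Request("refund")))
print(accept_or_redirect(Request("password_reset")))
print(accept_or_redirect(Request("password_reset", bounce_count=2)))
```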
Track the current owner, handoff count, visited agents, and reason for each transfer. Set a maximum handoff budget and route unresolved loops to a coordinator or human.
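A minimal sketch of such a record, assuming a hypothetical `HandoffLedger` class and a fallback owner named `"coordinator"`; neither name comes from a specific library.

```python
from dataclasses import dataclass, field

@dataclass
class HandoffLedger:
    owner: str                      # agent currently responsible for the task
    max_handoffs: int = 3           # assumed budget; pick a value per workflow
    transfers: list[tuple[str, str, str]] = field(default_factory=list)

    def visited(self) -> set[str]:
        """Every agent that has owned the task so far, including the current one."""
        seen = {self.owner}
        for sender, receiver, _reason in self.transfers:
            seen.update((sender, receiver))
        return seen

    def transfer(self, target: str, reason: str) -> str:
        """Record a transfer; divert to the coordinator on a revisit or spent budget."""
        if len(self.transfers) >= self.max_handoffs or target in self.visited():
            target, reason = "coordinator", f"escalated ({reason})"
        self.transfers.append((self.owner, target, reason))
        self.owner = target
        return target

ledger = HandoffLedger(owner="router")
ledger.transfer("billing_agent", "needs finance tools")
ledger.transfer("router", "category unclear")  # revisit, so it is diverted
print(ledger.owner, ledger.transfers)
```

Because every transfer stores its sender, receiver, and reason, the ledger also serves as the audit trail for a later trace review.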
Shared state needs ownership rules. Define which agent may update each field and how conflicting updates are resolved.
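As an illustration, the sketch below gives each field a single owner and rejects writes based on a stale read. The field names, agent names, and the version-counter rule are assumptions chosen for the example; other conflict policies (last writer wins, merge functions) are equally possible.

```python
# Hypothetical ownership table: only the listed agent may write each field.
FIELD_OWNERS = {
    "evidence_ids": "support_agent",
    "refund_amount": "finance_agent",
}

def apply_update(state: dict, agent: str, field_name: str, value, read_version: int) -> str:
    """Apply a write only if the agent owns the field and read the latest state."""
    if FIELD_OWNERS.get(field_name) != agent:
        return "rejected: not owner"
    if read_version != state["version"]:
        return "rejected: stale read"
    state[field_name] = value
    state["version"] += 1  # every accepted write invalidates older reads
    return "applied"

state = {"version": 0, "evidence_ids": [], "refund_amount": None}
print(apply_update(state, "support_agent", "refund_amount", 40, read_version=0))
print(apply_update(state, "finance_agent", "refund_amount", 40, read_version=0))
print(apply_update(state, "support_agent", "evidence_ids", ["ticket-8"], read_version=0))
```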
A specialist can perform well while the overall system fails because routing is wrong or handoff context is incomplete. Measure routing accuracy, handoff completeness, end-to-end success, duplicate work, total cost, latency, and escalation quality.
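Three of these measures can be computed directly from labeled traces. The sketch below assumes a hypothetical trace format (`expected`, `routed`, `packet_fields`, `resolved`) and an assumed list of required packet fields; the sample data is invented for illustration only.

```python
# Assumed minimum contents of a complete handoff packet.
REQUIRED_FIELDS = {"goal", "evidence_ids", "reason"}

# Invented sample traces, labeled by a reviewer with the expected specialist.
traces = [
    {"expected": "billing_agent", "routed": "billing_agent",
     "packet_fields": {"goal", "evidence_ids", "reason"}, "resolved": True},
    {"expected": "billing_agent", "routed": "general_support",
     "packet_fields": {"goal"}, "resolved": False},
    {"expected": "security_agent", "routed": "security_agent",
     "packet_fields": {"goal", "reason"}, "resolved": False},
    {"expected": "general_support", "routed": "general_support",
     "packet_fields": {"goal", "evidence_ids", "reason"}, "resolved": True},
]

def summarize(traces: list[dict]) -> dict[str, float]:
    """Return the fraction of traces that pass each check."""
    n = len(traces)
    return {
        "routing_accuracy": sum(t["routed"] == t["expected"] for t in traces) / n,
        "handoff_completeness": sum(REQUIRED_FIELDS <= t["packet_fields"] for t in traces) / n,
        "end_to_end_success": sum(t["resolved"] for t in traces) / n,
    }

print(summarize(traces))
```

In this sample the third trace was routed correctly yet still failed, which is the pattern to look for: an incomplete packet rather than a weak specialist.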
Multi-agent systems are useful when specialized agents have different instructions, tools, context windows, or success criteria. They are not useful when they merely add more model calls to a task one agent can solve. A handoff should transfer responsibility with a clear reason, a compact state summary, and a defined expectation for the receiving agent.
A good handoff contains the user goal, completed work, relevant evidence, open questions, risk flags, and requested output. It should not dump the entire conversation unless the receiving agent truly needs it. Clean handoff packets reduce confusion and make traces easier to review.
The orchestrator should remain accountable for the workflow. Specialist agents can recommend actions, but the system still needs global budgets, approval policy, tool permissions, and final response rules. Otherwise each agent can behave well locally while the overall system loops or contradicts itself.
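A minimal sketch of that division of labor, assuming specialists return recommendations as dictionaries and the orchestrator alone enforces a call budget and an approval threshold. The class name, limits, and recommendation shape are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Orchestrator:
    max_calls: int = 5                 # assumed global budget of specialist calls
    approval_threshold: float = 100.0  # assumed refund limit before a human signs off
    calls_used: int = 0

    def review(self, recommendation: dict) -> str:
        """Decide what happens to a specialist's recommendation."""
        self.calls_used += 1
        if self.calls_used > self.max_calls:
            return "stop: budget exhausted"
        if (recommendation["action"] == "refund"
                and recommendation.get("amount", 0) > self.approval_threshold):
            return "needs human approval"
        return "approved"

orchestrator = Orchestrator(max_calls=2)
print(orchestrator.review({"action": "refund", "amount": 50}))
print(orchestrator.review({"action": "refund", "amount": 500}))
print(orchestrator.review({"action": "reply"}))
```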
Multi-agent failures often look like social confusion: two agents both think the other owns a task, a specialist acts on stale state, the orchestrator ignores a risk flag, or agents repeat the same analysis. These are architecture problems, not personality problems. Fix them with contracts, state ownership, and routing rules.
Start with one agent and add specialization only after you can name the bottleneck. If the bottleneck is retrieval quality, add a better retriever before adding a research agent. If the bottleneck is policy complexity, add deterministic policy checks before adding a compliance agent.
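For the policy case, a deterministic check can be as small as the function below. The thresholds and return values are invented for illustration; a real policy would come from the business rules, not from this sketch.

```python
def refund_policy_check(amount: float, days_since_purchase: int) -> str:
    """Deterministic refund gate with hypothetical limits (30 days, 500 units)."""
    if days_since_purchase > 30:
        return "deny"
    if amount > 500:
        return "needs_review"
    return "allow"

print(refund_policy_check(amount=40, days_since_purchase=3))
print(refund_policy_check(amount=900, days_since_purchase=3))
print(refund_policy_check(amount=40, days_since_purchase=45))
```

A check like this is cheap to test exhaustively and behaves the same on every run, which a compliance agent cannot guarantee.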
Multi-agent design is justified when different parts of the workflow need meaningfully different instructions, tools, memory, policies, or evaluation criteria. A refund specialist, account-security specialist, and general support agent may deserve separation because they operate under different rules and risks.
If the agents have different names but share the same tools and goal, the split is probably unnecessary. Extra agents add cost, latency, coordination failures, and debugging complexity. Experts add agents to reduce complexity at boundaries, not to make a system feel more advanced.
A handoff should transfer responsibility clearly. The receiving agent needs the user goal, relevant evidence, completed work, risk flags, and expected output. It does not need every token from the previous conversation unless the full transcript is truly required.
Evaluate handoffs by trace quality. A reviewer should see why the handoff happened, what state moved, what the specialist returned, and how the orchestrator used it. Hidden agent conversations are hard to debug and hard to trust.
Design a handoff packet for one specialist agent. Include the user goal, completed steps, evidence, open questions, risk flags, and requested output. Keep it compact enough that the receiving agent can act without rereading the entire conversation.
Then test a failed handoff: missing evidence, wrong specialist, or conflicting recommendation. The supervisor should detect the problem and recover rather than letting agents bounce the task back and forth.
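Two of these failure cases can be exercised with a small supervisor-side check. The sketch assumes the packet is a dictionary and that a hypothetical `SPECIALIST_FOR` table says which specialist owns each category; detecting a conflicting recommendation needs the specialists' outputs and is not covered here.

```python
# Hypothetical mapping from ticket category to the specialist that owns it.
SPECIALIST_FOR = {"billing": "billing_agent", "security": "security_agent"}

def check_handoff(packet: dict) -> list[str]:
    """Return the problems a supervisor can detect before dispatching."""
    problems = []
    if not packet.get("evidence_ids"):
        problems.append("missing evidence")
    if SPECIALIST_FOR.get(packet.get("category")) != packet.get("target"):
        problems.append("wrong specialist")
    return problems

def recover(packet: dict) -> str:
    """Pick one recovery action instead of letting agents bounce the task."""
    problems = check_handoff(packet)
    if "wrong specialist" in problems:
        expected = SPECIALIST_FOR.get(packet.get("category"), "human")
        return f"reroute to {expected}"
    if "missing evidence" in problems:
        return "return to sender for evidence"
    return "dispatch"

good = {"category": "billing", "target": "billing_agent", "evidence_ids": ["invoice-22"]}
print(recover(good))
print(recover({**good, "evidence_ids": []}))
print(recover({**good, "target": "security_agent"}))
```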
A good handoff improves clarity and accountability; if it only adds more conversation, simplify back to one agent or one deterministic route.
The router creates a small handoff object instead of forwarding an uncontrolled transcript.
```python
from dataclasses import dataclass, field


@dataclass
class Handoff:
    """Compact package passed to the receiving agent."""
    target: str
    goal: str
    evidence_ids: list[str] = field(default_factory=list)
    completed: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    reason: str = ""


def route_ticket(ticket: dict) -> Handoff:
    # Billing tickets go to the specialist with a fully populated packet.
    if ticket["category"] == "billing":
        return Handoff(
            target="billing_agent",
            goal="Determine whether the invoice mismatch needs a credit.",
            evidence_ids=ticket["evidence_ids"],
            completed=["ticket classified"],
            open_questions=["Does the purchase order match the invoiced quantity?"],
            reason="The request requires billing policy and finance tools.",
        )
    # Everything else falls back to general support with only the summary as goal.
    return Handoff(target="general_support", goal=ticket["summary"])


print(route_ticket({
    "category": "billing",
    "summary": "Invoice mismatch",
    "evidence_ids": ["ticket-8", "invoice-22"],
}))
```
Visited-agent tracking prevents endless delegation.
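```python
MAX_HANDOFFS = 3  # handoff budget for a single task


def can_handoff(state: dict, target: str) -> bool:
    # Refuse once the budget is spent.
    if state["handoff_count"] >= MAX_HANDOFFS:
        return False
    # Refuse to send the task back to an agent that already held it.
    if target in state["visited_agents"]:
        return False
    return True


state = {
    "handoff_count": 1,
    "visited_agents": {"router", "billing_agent"},
}
target = "billing_agent"

# billing_agent was already visited, so this prints "escalate".
print("handoff" if can_handoff(state, target) else "escalate")
```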
Not automatically. They can improve specialization, but routing errors and context loss may reduce total accuracy.
Use structured messages for control fields and evidence references. Natural-language summaries can supplement the contract.
Use one when subtasks must be delegated, tracked, and combined under a single owner.