Multi-Agent Systems and Handoffs: Routing, Specialists, Shared State, and Coordination

Know When Multiple Agents Are Justified

Multiple agents can help when different parts of a task require distinct instructions, tools, permissions, context, or ownership. Specialization is the reason to split; novelty is not.

A handoff transfers responsibility from one agent to another. A good handoff contains the goal, relevant evidence, completed work, open questions, constraints, and the reason the next specialist was selected.

Many systems described as multi-agent are better implemented as one agent plus deterministic tools or workflow nodes. Start with one agent and split only after evaluation shows a clear bottleneck.

Use specialists when roles need different permission boundaries, large domain-specific contexts, independent evaluation criteria, or organizational ownership. For example, a support agent may gather evidence while a finance agent proposes a refund under stricter permissions.

Do not create separate agents merely for persona names. If two roles use the same tools, context, policy, and success criteria, a single agent with explicit modes is usually easier to operate.

Choose a Coordination Pattern

A router sends each request to one specialist. A supervisor delegates subtasks and combines results. A pipeline passes work through fixed roles. Peer handoffs let specialists transfer control directly, but require stronger loop prevention.

Router: best for mutually exclusive task categories.
Supervisor: useful when one task requires several specialists.
Pipeline: best when specialist order is predictable.
Peer handoff: flexible, but harder to govern and debug.
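
The difference between a router and a supervisor can be sketched in a few lines. This is only an illustration: the three specialist functions and the `SPECIALISTS` table are hypothetical stand-ins for real agent calls, not part of any framework.

```python
from typing import Callable

# Hypothetical specialists: plain functions standing in for real agent calls.
def billing_agent(request: str) -> str:
    return f"billing: reviewed '{request}'"

def security_agent(request: str) -> str:
    return f"security: reviewed '{request}'"

def general_support(request: str) -> str:
    return f"support: answered '{request}'"

SPECIALISTS: dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "security": security_agent,
}

def route(category: str, request: str) -> str:
    """Router: exactly one specialist handles the whole request."""
    handler = SPECIALISTS.get(category, general_support)
    return handler(request)

def supervise(categories: list[str], request: str) -> list[str]:
    """Supervisor: delegate to several specialists and collect their results."""
    return [SPECIALISTS[c](request) for c in categories if c in SPECIALISTS]

print(route("billing", "invoice mismatch"))
print(supervise(["billing", "security"], "refund to a new card"))
```

The router returns a single answer, while the supervisor still has to combine the collected results into one response, which is where its extra coordination cost comes from.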

Design a Handoff Contract

Do not forward the entire transcript by default. Transfer a compact structured package with the original goal, relevant state, evidence references, decisions already made, unresolved questions, and permissions available to the receiver.

The receiving agent should validate that the request fits its role. If not, it should reject or escalate rather than bounce the task indefinitely.
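
One possible shape for that receiver-side check is sketched below. The category set, the `MAX_HANDOFFS` limit, and the packet keys are assumptions chosen for illustration.

```python
from enum import Enum

class Decision(Enum):
    ACCEPT = "accept"
    REJECT = "reject"
    ESCALATE = "escalate"

# Hypothetical role definition for a billing specialist.
BILLING_CATEGORIES = {"invoice", "refund", "credit"}
MAX_HANDOFFS = 3  # assumed limit, tune per workflow

def validate_handoff(packet: dict) -> Decision:
    """Decide whether the billing specialist should take this packet."""
    if packet.get("handoff_count", 0) >= MAX_HANDOFFS:
        # The task has bounced too often: involve a coordinator or human.
        return Decision.ESCALATE
    if packet.get("category") not in BILLING_CATEGORIES:
        # Outside this role: return it with a reason instead of guessing.
        return Decision.REJECT
    if not packet.get("goal") or not packet.get("evidence_ids"):
        # In scope, but unusable without a goal and evidence.
        return Decision.REJECT
    return Decision.ACCEPT

packet = {"category": "refund", "goal": "Check duplicate charge",
          "evidence_ids": ["invoice-22"], "handoff_count": 1}
print(validate_handoff(packet).value)
```

Checking the handoff count first means an over-budget task escalates even when it is also out of scope, so it cannot be rejected back into the loop.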

Prevent Circular Work and Context Loss

Track the current owner, handoff count, visited agents, and reason for each transfer. Set a maximum handoff budget and route unresolved loops to a coordinator or human.

Shared state needs ownership rules. Define which agent may update each field and how conflicting updates are resolved.
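
A minimal sketch of a field ownership rule follows, assuming a single-writer policy per field. The agent names and fields in `FIELD_OWNERS` are hypothetical.

```python
# Hypothetical ownership map: each state field has exactly one writer.
FIELD_OWNERS = {
    "ticket_category": "router",
    "refund_amount": "finance_agent",
    "evidence_ids": "support_agent",
}

def apply_update(state: dict, agent: str, field: str, value) -> bool:
    """Apply the update only if `agent` owns `field`; report whether it was applied."""
    if FIELD_OWNERS.get(field) != agent:
        return False  # Not the owner, or an unknown field: leave state untouched.
    state[field] = value
    return True

state = {"ticket_category": "billing", "refund_amount": None, "evidence_ids": []}
print(apply_update(state, "support_agent", "refund_amount", 40.0))  # not the owner
print(apply_update(state, "finance_agent", "refund_amount", 40.0))  # owner
```

With one writer per field, conflicting updates to the same field cannot occur. Fields that genuinely need several writers would need an explicit resolution rule instead.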

Evaluate the Whole Team

A specialist can perform well while the overall system fails because routing is wrong or handoff context is incomplete. Measure routing accuracy, handoff completeness, end-to-end success, duplicate work, total cost, latency, and escalation quality.
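
Several of these measures can be computed from per-run records, as in the rough sketch below. The `RunRecord` fields are assumptions about what an evaluation harness would log, and the aggregation is deliberately simple.

```python
from dataclasses import dataclass

@dataclass
class RunRecord:
    expected_agent: str   # label from an evaluation set
    routed_agent: str     # what the router actually chose
    handoff_count: int
    succeeded: bool       # end-to-end outcome, judged on the final answer
    cost_usd: float

def team_metrics(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate system-level metrics across evaluation runs."""
    n = len(runs)
    if n == 0:
        raise ValueError("need at least one run")
    return {
        "routing_accuracy": sum(r.expected_agent == r.routed_agent for r in runs) / n,
        "end_to_end_success": sum(r.succeeded for r in runs) / n,
        "mean_handoffs": sum(r.handoff_count for r in runs) / n,
        "mean_cost_usd": sum(r.cost_usd for r in runs) / n,
    }

runs = [
    RunRecord("billing_agent", "billing_agent", 1, True, 0.5),
    RunRecord("security_agent", "billing_agent", 3, False, 1.5),
]
print(team_metrics(runs))
```

Reporting routing accuracy next to end-to-end success helps separate routing failures from specialist failures.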

Use Handoffs to Transfer Responsibility Clearly

Multi-agent systems are useful when specialized agents have different instructions, tools, context windows, or success criteria. They are not useful when they merely add more model calls to a task one agent can solve. A handoff should transfer responsibility with a clear reason, a compact state summary, and a defined expectation for the receiving agent.

A good handoff contains the user goal, completed work, relevant evidence, open questions, risk flags, and requested output. It should not dump the entire conversation unless the receiving agent truly needs it. Clean handoff packets reduce confusion and make traces easier to review.

The orchestrator should remain accountable for the workflow. Specialist agents can recommend actions, but the system still needs global budgets, approval policy, tool permissions, and final response rules. Otherwise multiple agents can each behave locally well while the overall system loops or contradicts itself.

Introduce a specialist only when it has distinct tools, policy, or expertise.
Give every handoff a reason and expected result.
Pass summarized state with citations instead of raw conversation dumps.
Limit handoff depth and detect ping-pong loops.
Keep final user communication consistent even when specialists contributed.
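
A global budget held by the orchestrator might look like the following sketch. The `Budget` class and its limits are hypothetical. The point is only that the limit lives outside any single specialist.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    max_model_calls: int
    max_cost_usd: float
    calls: int = 0
    cost_usd: float = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Record one specialist call; return False if either limit would be exceeded."""
        over_calls = self.calls + 1 > self.max_model_calls
        over_cost = self.cost_usd + cost_usd > self.max_cost_usd
        if over_calls or over_cost:
            return False  # Refused: the orchestrator should stop or escalate.
        self.calls += 1
        self.cost_usd += cost_usd
        return True

budget = Budget(max_model_calls=2, max_cost_usd=1.0)
for estimated_cost in (0.4, 0.4, 0.1):
    print(budget.charge(estimated_cost))
```

Because every specialist call is charged against the same object, agents that each behave reasonably on their own still cannot exceed the workflow's total allowance.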

Coordination Failure Modes

Multi-agent failures often look like social confusion: two agents both think the other owns a task, a specialist acts on stale state, the orchestrator ignores a risk flag, or agents repeat the same analysis. These are architecture problems, not personality problems. Fix them with contracts, state ownership, and routing rules.

Start with one agent and add specialization only after you can name the bottleneck. If the bottleneck is retrieval quality, add a better retriever before adding a research agent. If the bottleneck is policy complexity, add deterministic policy checks before adding a compliance agent.

Define which agent owns each state field.
Use structured specialist outputs.
Require risk flags to propagate back to the orchestrator.
Track handoff count and specialist latency.
Evaluate the complete workflow, not each agent in isolation.
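
Structured specialist outputs make risk-flag propagation straightforward to enforce. The sketch below assumes a hypothetical `SpecialistResult` shape and a conservative rule: any flag forces review.

```python
from dataclasses import dataclass, field

@dataclass
class SpecialistResult:
    agent: str
    recommendation: str
    risk_flags: list[str] = field(default_factory=list)

def collect(results: list[SpecialistResult]) -> dict:
    """Orchestrator-side merge: keep every risk flag and require review if any exist."""
    flags = sorted({flag for r in results for flag in r.risk_flags})
    return {
        "recommendations": {r.agent: r.recommendation for r in results},
        "risk_flags": flags,
        "needs_review": bool(flags),
    }

summary = collect([
    SpecialistResult("billing_agent", "issue credit"),
    SpecialistResult("security_agent", "hold", ["account_recently_changed"]),
])
print(summary["needs_review"], summary["risk_flags"])
```

Because flags are a typed field rather than free text inside a reply, the orchestrator cannot silently drop one while summarizing.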

When Multi-Agent Design Is Actually Worth It

Multi-agent design is justified when different parts of the workflow need meaningfully different instructions, tools, memory, policies, or evaluation criteria. A refund specialist, account-security specialist, and general support agent may deserve separation because they operate under different rules and risks.

If the agents only have different names but share the same tools and goal, the architecture is probably unnecessary. Extra agents add cost, latency, coordination failures, and debugging complexity. Add agents to reduce complexity at real boundaries, not to make a system feel more advanced.

A handoff should transfer responsibility clearly. The receiving agent needs the user goal, relevant evidence, completed work, risk flags, and expected output. It does not need every token from the previous conversation unless the full transcript is truly required.

Evaluate handoffs by trace quality. A reviewer should see why the handoff happened, what state moved, what the specialist returned, and how the orchestrator used it. Hidden agent conversations are hard to debug and hard to trust.
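
One way to make those four questions answerable is to log one record per handoff. The `HandoffTrace` fields below are an assumed schema, not a standard format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class HandoffTrace:
    source: str
    target: str
    reason: str               # why the handoff happened
    state_keys: list[str]     # what state moved
    specialist_output: str    # what the specialist returned
    orchestrator_action: str  # how the orchestrator used it

def to_log_line(trace: HandoffTrace) -> str:
    """Serialize one handoff as a single JSON line for later review."""
    return json.dumps(asdict(trace), sort_keys=True)

print(to_log_line(HandoffTrace(
    source="router",
    target="billing_agent",
    reason="invoice mismatch needs billing policy",
    state_keys=["goal", "evidence_ids"],
    specialist_output="recommend credit of one unit",
    orchestrator_action="requested approval",
)))
```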

Use agents for real differences in tools, policy, or expertise.
Avoid multi-agent design as decoration.
Pass compact handoff packets with evidence and risk flags.
Limit handoff loops and preserve supervisor accountability.

Design a Handoff Packet

Design a handoff packet for one specialist agent. Include the user goal, completed steps, evidence, open questions, risk flags, and requested output. Keep it compact enough that the receiving agent can act without rereading the entire conversation.

Then test a failed handoff: missing evidence, wrong specialist, or conflicting recommendation. The supervisor should detect the problem and recover rather than letting agents bounce the task back and forth.

Keep handoffs structured and compact.
Test wrong-specialist and disagreement cases.
Measure handoff count and final outcome quality.
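
The three failure cases can be exercised with a small supervisor check like the one sketched here. The action names, the `status` and `recommendation` keys, and the order of the checks are all assumptions for illustration.

```python
from typing import Optional

def review_result(packet: dict, result: dict, prior: Optional[str] = None) -> str:
    """Supervisor check after a specialist returns; yields the next action."""
    if result.get("status") == "rejected":
        return "reroute"          # wrong specialist: pick another, do not bounce back
    if not packet.get("evidence_ids"):
        return "gather_evidence"  # missing evidence: fill the gap before retrying
    if prior is not None and result.get("recommendation") != prior:
        return "escalate"         # conflicting recommendation: send to human review
    return "accept"

good_packet = {"goal": "Check invoice", "evidence_ids": ["invoice-22"]}
print(review_result(good_packet, {"status": "rejected"}))
print(review_result({"goal": "Check invoice", "evidence_ids": []}, {"status": "ok"}))
print(review_result(good_packet, {"status": "ok", "recommendation": "deny"}, prior="credit"))
```

Each branch ends in a supervisor decision rather than another automatic transfer, which is what keeps a failed handoff from turning into a loop.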

Prove the Handoff Earns Its Cost

A good handoff improves clarity and accountability; if it only adds more conversation, simplify back to one agent or one deterministic route.

Handoff Packet

Use multiple agents only when specialization, permissions, context isolation, or team ownership justifies the coordination cost. A manager pattern keeps one agent responsible for the final result and calls specialists as tools. A handoff transfers conversational control to a specialist. Choose the pattern from ownership, not from the number of personas in a diagram.

A handoff packet should contain the user goal, completed work, selected evidence, open questions, constraints, risk flags, requested output, and trace references. Filter irrelevant or sensitive history rather than forwarding the entire transcript. The receiving agent must validate the packet and may reject an unsupported or wrongly routed task.

Prevent routing loops with a hop limit, visited-agent set, explicit completion owner, and escalation state. Evaluate wrong-specialist routing, disagreement, missing evidence, duplicated work, and partial failure. More agents are useful only if end-to-end quality improves enough to justify the extra latency, cost, and additional security boundaries to maintain.

Keep shared workflow state authoritative outside individual agent conversations. Agents may read a scoped view and propose updates, but the orchestrator validates versions and merges changes. This prevents two specialists from overwriting each other or treating an old handoff summary as the current task state.
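
Version validation can be sketched with a simple optimistic check, shown below. `WorkflowState` and `StaleUpdate` are hypothetical names, and a real orchestrator might merge non-conflicting changes rather than refuse every stale proposal.

```python
class StaleUpdate(Exception):
    """Raised when a proposal was based on an older state version."""

class WorkflowState:
    def __init__(self, data: dict):
        self.version = 0
        self.data = dict(data)

    def view(self, keys: list[str]) -> dict:
        """Scoped read: the agent sees only the listed keys plus the version."""
        return {"version": self.version, **{k: self.data[k] for k in keys}}

    def propose(self, base_version: int, changes: dict) -> int:
        """Merge changes only if they were computed against the current version."""
        if base_version != self.version:
            raise StaleUpdate(f"based on v{base_version}, current is v{self.version}")
        self.data.update(changes)
        self.version += 1
        return self.version

state = WorkflowState({"status": "open", "refund_amount": None})
billing_view = state.view(["refund_amount"])  # both specialists read version 0
support_view = state.view(["status"])

state.propose(billing_view["version"], {"refund_amount": 40.0})
try:
    state.propose(support_view["version"], {"status": "closed"})
except StaleUpdate as err:
    print("rejected:", err)
```

The second specialist has to re-read the state before proposing again, so it cannot act on a summary that is no longer current.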

Choose manager or handoff by final-answer ownership.
Send structured minimal context with source references.
Cap hops and detect repeated delegation.
Measure coordination overhead against single-agent quality.

Handoff Design Examples

Structured Specialist Handoff

The router creates a small handoff object instead of forwarding an uncontrolled transcript.

from dataclasses import dataclass, field

@dataclass
class Handoff:
    """Compact package passed from the router to a specialist."""
    target: str                                               # receiving agent
    goal: str                                                 # what the receiver must achieve
    evidence_ids: list[str] = field(default_factory=list)     # references, not raw content
    completed: list[str] = field(default_factory=list)        # work already done
    open_questions: list[str] = field(default_factory=list)   # what remains unresolved
    reason: str = ""                                          # why this target was chosen

def route_ticket(ticket: dict) -> Handoff:
    """Build a handoff for the specialist that matches the ticket category."""
    if ticket["category"] == "billing":
        return Handoff(
            target="billing_agent",
            goal="Determine whether the invoice mismatch needs a credit.",
            # Tolerate tickets that arrive without any evidence attached.
            evidence_ids=ticket.get("evidence_ids", []),
            completed=["ticket classified"],
            open_questions=["Does the purchase order match the invoiced quantity?"],
            reason="The request requires billing policy and finance tools.",
        )

    # Fallback: anything unclassified goes to general support.
    return Handoff(target="general_support", goal=ticket["summary"])

print(route_ticket({
    "category": "billing",
    "summary": "Invoice mismatch",
    "evidence_ids": ["ticket-8", "invoice-22"],
}))

The receiver gets the goal and evidence references.
Completed work is not repeated.
The routing reason is available for tracing and evaluation.

Handoff Loop Guard

A handoff budget combined with visited-agent tracking prevents endless delegation.

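MAX_HANDOFFS = 3  # hard budget on transfers per task

def can_handoff(state: dict, target: str) -> bool:
    """Allow a transfer only if budget remains and the target has not been visited."""
    if state["handoff_count"] >= MAX_HANDOFFS:
        return False
    if target in state["visited_agents"]:
        return False
    return True

state = {
    "handoff_count": 1,
    "visited_agents": {"router", "billing_agent"},
}

# billing_agent was already visited, so this transfer is refused.
target = "billing_agent"
print("handoff" if can_handoff(state, target) else "escalate")  # prints "escalate"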

Handoff count creates a hard budget.
Visited roles prevent circular transfers.
The failure path should preserve partial progress for a human reviewer.

Before you move on