A LangGraph tutorial becomes a product only when you can run it reliably for real users. Deployment is where graph design meets infrastructure, persistence, permissions, and latency expectations.
The central deployment question is simple: where should graph execution live? Tiny workflows may fit inside a synchronous API request. Longer or more failure-prone runs usually belong in workers backed by durable checkpointing and status APIs.
This page covers deployment as architecture rather than just packaging commands.
Short graphs with small tool chains can run inside request-response APIs. Long-running research, coding, or human-review workflows should usually run asynchronously in workers so the user-facing API can return quickly while the client polls for status or streams progress.
That separation becomes essential once you introduce retries, persistence, or manual approval because the run lifetime no longer matches the HTTP request lifetime.
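The split between request lifetime and run lifetime can be sketched on the API side as two small handlers: one that records a run and returns at once, and one that reports its status. This is a minimal sketch, not a definitive implementation. The names `RUNS`, `create_run`, and `get_run` are hypothetical, and the in-memory dict stands in for a durable run table.

```python
import uuid

# In-memory stand-in for a durable run table (assumption for illustration).
RUNS: dict = {}


def create_run(question: str) -> dict:
    """Record a new run and return immediately instead of executing the graph."""
    run_id = str(uuid.uuid4())
    RUNS[run_id] = {"status": "queued", "input": {"question": question}, "result": None}
    # A real service would also enqueue the run for a worker at this point.
    return {"run_id": run_id, "status": "queued"}


def get_run(run_id: str) -> dict:
    """Status lookup that a client can poll while a worker executes the graph."""
    run = RUNS.get(run_id)
    if run is None:
        return {"error": "not_found"}
    return {"run_id": run_id, "status": run["status"], "result": run["result"]}


created = create_run("Summarize the policy changes")
polled = get_run(created["run_id"])
```

The handler never calls the graph itself, so the HTTP response time stays independent of how long the run takes.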
If you deploy checkpointed graphs, your persistence backend is part of the runtime contract. Capacity, latency, schema evolution, and backup strategy all matter because the graph depends on stored thread state to resume correctly.
Treat persistence outages like workflow outages. If the checkpointer is unhealthy, your pause-resume guarantees are unhealthy too.
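One way to act on this is to fold checkpointer health into the service's readiness check. The sketch below assumes a hypothetical `ping_checkpointer` callable that raises on failure (for example, a function that runs a trivial query against the checkpoint database). It is not part of any LangGraph API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Readiness:
    ready: bool
    reason: str


def check_readiness(ping_checkpointer: Callable[[], None]) -> Readiness:
    """Report the service as not ready when the checkpoint store is unreachable."""
    try:
        ping_checkpointer()
    except Exception as exc:  # any failure means pause-resume cannot be trusted
        return Readiness(ready=False, reason=f"checkpointer unavailable: {exc}")
    return Readiness(ready=True, reason="ok")


def healthy_ping() -> None:
    """Hypothetical ping that succeeds."""
    return None


def failing_ping() -> None:
    """Hypothetical ping that simulates an outage."""
    raise ConnectionError("connection refused")


ok = check_readiness(healthy_ping)
down = check_readiness(failing_ping)
```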
A practical deployment often includes an API layer, a worker layer, persistence, a tracing stack, and a queue or scheduler for background runs. Some teams also add a review UI for interrupts and approval queues.
The graph itself is only one piece. Reliable LangGraph services depend on the surrounding infrastructure being designed for long-lived workflows.
Deployment turns the abstract safety ideas from earlier lessons into concrete controls. Tools should run with least privilege, secrets should live in managed configuration, and human review should protect high-risk actions.
Never deploy a graph that can mutate important systems without first deciding which nodes and tools are allowed to do so under which conditions.
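That decision can be written down as data before any tool is wired in. The following is a rough sketch of such a policy table. The tool names, node names, and the `authorize` helper are all hypothetical and only illustrate the shape of the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    mutates: bool             # does the tool change an external system?
    needs_approval: bool      # must a human approve before it runs?
    allowed_nodes: frozenset  # graph nodes permitted to call the tool


# Hypothetical tools and nodes, for illustration only.
POLICIES = {
    "search_docs": ToolPolicy(False, False, frozenset({"research", "draft"})),
    "update_ticket": ToolPolicy(True, True, frozenset({"apply_changes"})),
}


def authorize(node: str, tool: str, approved: bool = False) -> str:
    """Return 'allow', 'needs_approval', or 'deny' for a node calling a tool."""
    policy = POLICIES.get(tool)
    if policy is None or node not in policy.allowed_nodes:
        return "deny"  # unknown tools and unlisted nodes are denied by default
    if policy.needs_approval and not approved:
        return "needs_approval"
    return "allow"
```

Denying by default means a newly added tool cannot mutate anything until someone adds an explicit policy entry for it.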
A production-ready graph is not just one that returns good answers in staging. It is one that has limits, observability, fallback behavior, permissions, and a clear incident path when something fails.
This checklist thinking is what separates a clever workflow from a service a team can own confidently.
A LangGraph deployment needs more than an API server. Serious applications usually have an API layer for requests and status, workers for graph execution, checkpoint storage for durable state, a queue or scheduler for background work, tracing for observability, and a review surface for human interrupts. Each piece has a different scaling and reliability profile.
Synchronous API execution is acceptable for tiny graphs that finish quickly and have no long external waits. Once a graph can call slow or unreliable tools, pause for approval, retry after failures, or stream progress over time, worker-based execution becomes safer. Workers can be restarted, scaled, and monitored without tying user requests to long HTTP lifetimes.
Checkpoint storage is part of the deployment contract. If a run can resume, the checkpointer must be durable, backed up, access-controlled, and compatible with state schema changes. Treat it like application data, not a cache.
Deployment should also include status APIs. Users and operators need to know whether a run is queued, running, paused, waiting for approval, completed, failed, or cancelled. A graph that executes correctly but cannot report status feels broken in production.
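The statuses listed above can be modeled as an explicit state machine so that the API and workers agree on which changes are legal. This is a sketch under assumptions: the transition table is one plausible lifecycle, not a rule from LangGraph, and should be adjusted to the real workflow.

```python
from enum import Enum


class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    PAUSED = "paused"
    WAITING_FOR_APPROVAL = "waiting_for_approval"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed lifecycle; terminal states have no outgoing transitions.
ALLOWED_TRANSITIONS = {
    RunStatus.QUEUED: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.RUNNING: {
        RunStatus.PAUSED,
        RunStatus.WAITING_FOR_APPROVAL,
        RunStatus.COMPLETED,
        RunStatus.FAILED,
        RunStatus.CANCELLED,
    },
    RunStatus.PAUSED: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.WAITING_FOR_APPROVAL: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.FAILED: {RunStatus.QUEUED},  # a retry re-queues the run
    RunStatus.COMPLETED: set(),
    RunStatus.CANCELLED: set(),
}


def transition(current: RunStatus, new: RunStatus) -> RunStatus:
    """Return the new status, or raise if the change is not allowed."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {new.value}")
    return new
```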
Persisted graph threads create a compatibility problem. A run may start under one graph version and resume after new code is deployed. That means state schema, node names, route labels, and reducer behavior need migration discipline. Changing them casually can break old checkpoints.
Use version metadata in the state or run record. Store graph version, prompt version, model route, tool version, and state schema version. When a worker resumes a checkpoint, it can decide whether to migrate, continue, or fail safely with a clear operator message.
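A worker's resume decision might look roughly like the sketch below. The version fields mirror the list above. The constants `CURRENT_SCHEMA_VERSION` and `MIGRATABLE_FROM` and the `resume_decision` helper are hypothetical, and the decision is keyed only on the schema version to keep the example small.

```python
from dataclasses import dataclass

CURRENT_SCHEMA_VERSION = 3  # schema version of the deployed code (assumed)
MIGRATABLE_FROM = {2}       # older versions with a known migration path (assumed)


@dataclass(frozen=True)
class RunVersions:
    graph_version: str
    prompt_version: str
    model_route: str
    tool_version: str
    state_schema_version: int


def resume_decision(versions: RunVersions) -> tuple:
    """Return (action, operator_message) for a checkpoint about to be resumed."""
    found = versions.state_schema_version
    if found == CURRENT_SCHEMA_VERSION:
        return ("continue", "schema matches current deployment")
    if found in MIGRATABLE_FROM:
        return ("migrate", f"migrate state schema {found} -> {CURRENT_SCHEMA_VERSION}")
    return ("fail", f"no migration path from state schema {found}; operator action required")
```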
Rollouts should be staged. Send a small percentage of new runs to the new graph version while existing checkpointed runs continue safely. For high-risk changes, keep old workers available until older runs drain or migrate.
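A staged rollout of this kind can be approximated with deterministic bucketing: new threads are hashed into one of 100 buckets, and threads that already have a checkpoint stay pinned to the version that created them. The function names and the pinning convention here are assumptions for illustration.

```python
import hashlib
from typing import Optional


def rollout_bucket(thread_id: str) -> int:
    """Map a thread id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(thread_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_graph_version(
    thread_id: str,
    pinned_version: Optional[str],
    new_version: str,
    old_version: str,
    new_percent: int,
) -> str:
    """Keep existing runs on their pinned version; split new runs by percentage."""
    if pinned_version is not None:
        return pinned_version  # checkpointed runs resume under the version they started on
    if rollout_bucket(thread_id) < new_percent:
        return new_version
    return old_version
```

Because the bucket comes from a hash rather than a random draw, the same thread always lands on the same side of the split while the percentage is unchanged.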
Deployment tests should include resumed runs, not only fresh invocations. Create checkpoints from an older version and verify the new deployment handles them. This catches the class of bugs that only appears after users pause, approve, and resume.
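A resumed-run test can start from a stored fixture of old state rather than a fresh invocation. The sketch below assumes a hypothetical schema change in which version 1 stored the user text under `query` and version 2 renames it to `question` and adds `answer`. In a real suite the fixture would be a checkpoint captured from the previous release.

```python
def migrate_v1_to_v2(state: dict) -> dict:
    """Hypothetical migration: rename `query` to `question` and add `answer`."""
    migrated = dict(state)  # copy so the stored checkpoint is not mutated
    migrated["question"] = migrated.pop("query")
    migrated.setdefault("answer", "")
    migrated["state_schema_version"] = 2
    return migrated


def test_resume_from_v1_checkpoint() -> None:
    # Fixture shaped like state written by the older graph version (assumed).
    old_checkpoint = {"state_schema_version": 1, "query": "Summarize the policy changes"}

    state = migrate_v1_to_v2(old_checkpoint)

    assert state["question"] == "Summarize the policy changes"
    assert state["answer"] == ""
    assert "query" not in state
    assert state["state_schema_version"] == 2
    # The original fixture must be left untouched for other tests.
    assert old_checkpoint["state_schema_version"] == 1
```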
Draw the production topology for one graph. Include API, workers, queue, checkpointer, trace store, review UI, secrets, external tools, and monitoring. If the graph can pause or run longer than a request, the topology should show where that state lives.
Then run a compatibility review. What happens to an old checkpoint after a new graph version deploys? Which state fields are stable? Which nodes were renamed? Which migrations are required? These questions prevent resume-time surprises.
Finish by defining operational actions: cancel a run, retry a failed run, resume an approval, disable a node, roll back a graph version, and inspect a trace. Production deployment is ready when these actions are ordinary, not emergency improvisation.
Also document data ownership for checkpoints, traces, and review records. Stateful graph deployment includes privacy and retention decisions, not only compute and scaling decisions.
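Retention decisions are easier to enforce when they are expressed as configuration. The following sketch uses made-up retention periods and a hypothetical record shape (`kind`, `created_at`). The actual periods depend on the team's privacy and compliance requirements.

```python
from datetime import datetime, timedelta, timezone

# Illustrative retention periods only; real values are a policy decision.
RETENTION = {
    "checkpoint": timedelta(days=30),
    "trace": timedelta(days=14),
    "review_record": timedelta(days=365),
}


def expired_records(records: list, now: datetime) -> list:
    """Return the records that are older than the retention period for their kind."""
    return [r for r in records if now - r["created_at"] > RETENTION[r["kind"]]]


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
records = [
    {"id": "cp-1", "kind": "checkpoint", "created_at": now - timedelta(days=45)},
    {"id": "tr-1", "kind": "trace", "created_at": now - timedelta(days=3)},
    {"id": "rv-1", "kind": "review_record", "created_at": now - timedelta(days=45)},
]
to_delete = [r["id"] for r in expired_records(records, now)]
```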
A tiny graph can be invoked directly inside an API handler when it is fast and bounded.
def handle_request(question: str) -> dict:
    # `graph` is assumed to be a compiled LangGraph graph created at startup.
    result = graph.invoke({"question": question, "answer": ""})
    return {"answer": result["answer"]}
Longer workflows usually need an API to create a run and a worker to execute it asynchronously.
job = {
    "thread_id": "research-501",
    "graph_name": "research_assistant",
    "input": {"question": "Summarize the policy changes"},
    "status": "queued",
}
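The worker side of this pattern might look like the sketch below. It uses the standard-library `queue` module as a stand-in for a real job queue, and `run_graph` is a hypothetical placeholder. A real worker would invoke the compiled graph with the job's thread id and a durable checkpointer instead.

```python
import queue


def run_graph(graph_name: str, thread_id: str, payload: dict) -> dict:
    """Placeholder for real graph execution (assumption for illustration)."""
    return {"answer": f"processed: {payload['question']}"}


def process_next(jobs: queue.Queue, runs: dict) -> None:
    """Take one job off the queue, execute it, and record the outcome."""
    job = jobs.get()
    thread_id = job["thread_id"]
    runs[thread_id] = {**job, "status": "running"}
    try:
        result = run_graph(job["graph_name"], thread_id, job["input"])
    except Exception as exc:
        # Record the failure so the status API can report it.
        runs[thread_id] = {**job, "status": "failed", "error": str(exc)}
    else:
        runs[thread_id] = {**job, "status": "completed", "result": result}
    finally:
        jobs.task_done()


jobs = queue.Queue()
runs = {}  # in-memory stand-in for a durable run table
jobs.put({
    "thread_id": "research-501",
    "graph_name": "research_assistant",
    "input": {"question": "Summarize the policy changes"},
    "status": "queued",
})
process_next(jobs, runs)
```

Writing the status before and after execution is what lets a status API report progress without talking to the worker directly.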
This architecture keeps user APIs, graph execution, persistence, and review operations separated cleanly.
Client UI
-> API Gateway
-> App Service (create/resume thread)
-> Worker Queue
-> Graph Worker
-> Checkpoint Store
-> Tool Services / Databases
-> Tracing + Metrics
-> Review UI for human approvals
Running a graph directly inside a normal API request works for short, deterministic runs. Longer or interruptible workflows usually benefit from worker-based execution.
A common deployment mistake is treating a persisted graph like a stateless request handler and underestimating the role of checkpoint infrastructure.
A review UI is needed only if your workflow includes interrupts or approvals, but when it does, that UI becomes a key operational component.