LangGraph Deployment: Local Services, Workers, APIs, and Production Architecture

Choose the Right Execution Boundary

A LangGraph tutorial becomes a product only when you can run it reliably for real users. Deployment is where graph design meets infrastructure, persistence, permissions, and latency expectations.

The central deployment question is simple: where should graph execution live? Tiny workflows may fit inside a synchronous API request. Longer or more failure-prone runs usually belong in workers backed by durable checkpointing and status APIs.

This page covers deployment as architecture rather than just packaging commands.

Short graphs with small tool chains can run inside request-response APIs. Long-running research, coding, or human-review workflows should usually run asynchronously in workers so the user-facing API can return quickly and poll or stream progress.

That separation becomes essential once you introduce retries, persistence, or manual approval because the run lifetime no longer matches the HTTP request lifetime.

Inline execution for fast, bounded graphs
Worker execution for long, interruptible, or high-variance workflows
Status endpoints or streams for frontends to observe progress

Persistence Is Part of Deployment, Not an Optional Add-On

If you deploy checkpointed graphs, your persistence backend is part of the runtime contract. Capacity, latency, schema evolution, and backup strategy all matter because the graph depends on stored thread state to resume correctly.

Treat persistence outages like workflow outages. If the checkpointer is unhealthy, your pause-resume guarantees are unhealthy too.

Monitor checkpoint storage health.
Version important state changes.
Plan restore and migration procedures.
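
One way to make "persistence outages are workflow outages" concrete is a readiness probe that fails whenever the checkpoint store cannot complete a round-trip. The sketch below is a minimal illustration using SQLite from the standard library as a stand-in for the real checkpoint database. The function name `checkpoint_store_ready`, the probe query, and the latency budget are assumptions for illustration, not part of LangGraph.

```python
import sqlite3
import time


def checkpoint_store_ready(conn: sqlite3.Connection, max_latency_s: float = 0.5) -> dict:
    """Report readiness based on a trivial round-trip to the store."""
    started = time.monotonic()
    try:
        conn.execute("SELECT 1").fetchone()
    except sqlite3.Error as exc:
        # Any store error means pause/resume cannot be trusted right now.
        return {"ready": False, "reason": f"store error: {exc}"}
    elapsed = time.monotonic() - started
    if elapsed > max_latency_s:
        return {"ready": False, "reason": f"slow round-trip: {elapsed:.3f}s"}
    return {"ready": True, "reason": "ok"}


conn = sqlite3.connect(":memory:")  # stand-in for the real checkpoint database
healthy = checkpoint_store_ready(conn)
conn.close()
unhealthy = checkpoint_store_ready(conn)  # a closed connection simulates an outage
```

A worker or API process could refuse new runs while this probe reports not ready, so that no thread starts without a place to persist its state.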

Operational Topology for Real Systems

A practical deployment often includes an API layer, a worker layer, persistence, a tracing stack, and a queue or scheduler for background runs. Some teams also add a review UI for interrupts and approval queues.

The graph itself is only one piece. Reliable LangGraph services depend on the surrounding infrastructure being designed for long-lived workflows.

API service receives requests and starts or resumes threads.
Worker service executes graph steps.
Checkpoint storage persists run state.
Tracing and metrics observe cost, latency, and failure.

Security, Secrets, and Tool Permissions

Deployment turns the safety ideas from earlier lessons into enforced controls. Tools should run with least privilege, secrets should live in managed configuration, and human review should gate high-risk actions.

Never deploy a graph that can mutate important systems without first deciding which nodes and tools are allowed to do so under which conditions.

Use least-privilege credentials.
Separate read-only and write-capable tool identities.
Redact secrets from logs and traces.
Gate risky actions with review or policy checks.
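
The rules above can be sketched as a small policy check that runs before any tool call, plus a redaction helper for log lines. This is an illustrative sketch only: the tool names, the `ToolPolicy` fields, and the secret pattern are hypothetical, and a real deployment would likely load policies from managed configuration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    writes: bool        # does the tool mutate external systems?
    needs_review: bool  # must a human approve before it runs?


# Hypothetical tool registry; the names are examples only.
POLICIES = {
    "search_docs": ToolPolicy(writes=False, needs_review=False),
    "update_ticket": ToolPolicy(writes=True, needs_review=False),
    "issue_refund": ToolPolicy(writes=True, needs_review=True),
}


def authorize(tool: str, identity_can_write: bool, approved: bool = False) -> str:
    """Return 'allow', 'review', or 'deny' for a proposed tool call."""
    policy = POLICIES.get(tool)
    if policy is None:
        return "deny"    # unknown tools are denied by default
    if policy.writes and not identity_can_write:
        return "deny"    # a read-only identity cannot mutate anything
    if policy.needs_review and not approved:
        return "review"  # route to the human approval queue
    return "allow"


# Assumed pattern for obvious key=value secrets; extend for real formats.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[=:]\s*\S+")


def redact(text: str) -> str:
    """Mask obvious secrets before text reaches logs or traces."""
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)
```

Denying unknown tools by default keeps a newly added tool from becoming write-capable just because nobody registered a policy for it.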

Readiness Checklist Before Go-Live

A production-ready graph is not just one that returns good answers in staging. It is one that has limits, observability, fallback behavior, permissions, and a clear incident path when something fails.

Working through this checklist is what separates a clever workflow from a service a team can own confidently.

Loop and retry ceilings
Checkpoint durability
Structured tracing
Tool validation
Resume and interrupt handling
Load and failure testing
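
As an illustration of the first checklist item, here is a minimal retry ceiling in plain Python. It assumes each graph step can be wrapped as a zero-argument callable; `call_with_ceiling` and `RetryCeilingExceeded` are hypothetical names. LangGraph's own `recursion_limit` run configuration addresses the loop side, and this sketch only covers retries around a single step.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryCeilingExceeded(RuntimeError):
    """Raised when a step keeps failing after the allowed attempts."""


def call_with_ceiling(step: Callable[[], T], max_attempts: int = 3,
                      base_delay_s: float = 0.0) -> T:
    """Call `step`, retrying with exponential backoff up to a hard ceiling."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:
            last_error = exc
            if attempt < max_attempts:
                # Delay doubles each attempt: base, 2*base, 4*base, ...
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    raise RetryCeilingExceeded(f"gave up after {max_attempts} attempts") from last_error
```

Raising a dedicated exception once the ceiling is hit gives the worker a clear signal to mark the run as failed instead of retrying indefinitely.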

Deployment Topology for Stateful Graphs

A LangGraph deployment needs more than an API server. Serious applications usually have an API layer for requests and status, workers for graph execution, checkpoint storage for durable state, a queue or scheduler for background work, tracing for observability, and a review surface for human interrupts. Each piece has a different scaling and reliability profile.

Synchronous API execution is acceptable for tiny graphs that finish quickly and have no long external waits. Once a graph can call tools, pause for approval, retry after failures, or stream progress over time, worker-based execution becomes safer. Workers can be restarted, scaled, and monitored without tying user requests to long HTTP lifetimes.

Checkpoint storage is part of the deployment contract. If a run can resume, the checkpointer must be durable, backed up, access-controlled, and compatible with state schema changes. Treat it like application data, not a cache.

Deployment should also include status APIs. Users and operators need to know whether a run is queued, running, paused, waiting for approval, completed, failed, or cancelled. A graph that executes correctly but cannot report status feels broken in production.

Separate request handling from long-running graph execution.
Use durable checkpoint storage for resumable workflows.
Expose status, cancellation, and result endpoints.
Add tracing and structured logs from the first deployment.
Design the review UI for interrupts before launching approval workflows.
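
The run states named above can be modeled as an explicit state machine so that the status API never reports an impossible transition. The following is a sketch under stated assumptions: the status names come from the text, but the transition table (for example, allowing a failed run to be re-queued as a retry) is an assumption to adapt to the real workflow.

```python
from enum import Enum


class RunStatus(str, Enum):
    QUEUED = "queued"
    RUNNING = "running"
    PAUSED = "paused"
    WAITING_FOR_APPROVAL = "waiting_for_approval"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumed transition table; adjust it to the workflow's real lifecycle.
ALLOWED_TRANSITIONS = {
    RunStatus.QUEUED: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.RUNNING: {
        RunStatus.PAUSED,
        RunStatus.WAITING_FOR_APPROVAL,
        RunStatus.COMPLETED,
        RunStatus.FAILED,
        RunStatus.CANCELLED,
    },
    RunStatus.PAUSED: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.WAITING_FOR_APPROVAL: {RunStatus.RUNNING, RunStatus.CANCELLED},
    RunStatus.COMPLETED: set(),
    RunStatus.FAILED: {RunStatus.QUEUED},  # operator-triggered retry
    RunStatus.CANCELLED: set(),
}


def transition(current: RunStatus, target: RunStatus) -> RunStatus:
    """Return the new status, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move run from {current.value} to {target.value}")
    return target
```

Because `RunStatus` subclasses `str`, its values serialize directly into JSON status responses.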

Versioning and Backward Compatibility

Persisted graph threads create a compatibility problem. A run may start under one graph version and resume after new code is deployed. That means state schema, node names, route labels, and reducer behavior need migration discipline. Changing them casually can break old checkpoints.

Use version metadata in the state or run record. Store graph version, prompt version, model route, tool version, and state schema version. When a worker resumes a checkpoint, it can decide whether to migrate, continue, or fail safely with a clear operator message.

Rollouts should be staged. Send a small percentage of new runs to the new graph version while existing checkpointed runs continue safely. For high-risk changes, keep old workers available until older runs drain or migrate.

Deployment tests should include resumed runs, not only fresh invocations. Create checkpoints from an older version and verify the new deployment handles them. This catches the class of bugs that only appears after users pause, approve, and resume.

Version graph code and state schema explicitly.
Test resuming old checkpoints after deploys.
Avoid renaming state fields or nodes without migration plans.
Canary new graph versions before full rollout.
Keep rollback plans compatible with in-flight runs.
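
The migrate, continue, or fail-safely decision described above can be sketched as a function that runs before a worker resumes a checkpoint. Everything here is hypothetical: the checkpoint layout (`schema_version` plus `state`), the version numbers, and the two example migrations are assumptions chosen only to show the shape of the logic.

```python
CURRENT_SCHEMA_VERSION = 3


# Hypothetical migrations: each upgrades state by exactly one schema version.
def _v1_to_v2(state: dict) -> dict:
    state = dict(state)                       # never mutate the stored copy
    state["question"] = state.pop("query")    # example field rename
    return state


def _v2_to_v3(state: dict) -> dict:
    state = dict(state)
    state.setdefault("citations", [])         # example new field with a default
    return state


MIGRATIONS = {1: _v1_to_v2, 2: _v2_to_v3}


def prepare_for_resume(checkpoint: dict) -> dict:
    """Migrate an old checkpoint forward, or fail with a clear message."""
    version = checkpoint["schema_version"]
    state = checkpoint["state"]
    if version > CURRENT_SCHEMA_VERSION:
        raise RuntimeError(
            f"checkpoint schema v{version} is newer than this worker "
            f"(v{CURRENT_SCHEMA_VERSION}); route it to a newer worker"
        )
    while version < CURRENT_SCHEMA_VERSION:
        migrate = MIGRATIONS.get(version)
        if migrate is None:
            raise RuntimeError(f"no migration from schema v{version}")
        state = migrate(state)
        version += 1
    return {"schema_version": version, "state": state}
```

A resume-compatibility test can then feed checkpoints captured from an older release through `prepare_for_resume` and assert on the migrated state.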

Deployment Readiness Exercise

Draw the production topology for one graph. Include API, workers, queue, checkpointer, trace store, review UI, secrets, external tools, and monitoring. If the graph can pause or run longer than a request, the topology should show where that state lives.

Then run a compatibility review. What happens to an old checkpoint after a new graph version deploys? Which state fields are stable? Which nodes were renamed? Which migrations are required? These questions prevent resume-time surprises.

Finish by defining operational actions: cancel a run, retry a failed run, resume an approval, disable a node, roll back a graph version, and inspect a trace. A deployment is production-ready when these actions are routine procedures rather than emergency improvisation.

Also document data ownership for checkpoints, traces, and review records. Stateful graph deployment includes privacy and retention decisions, not only compute and scaling decisions.

Map every runtime component and dependency.
Test old checkpoints against new code.
Define cancellation, retry, and rollback operations.
Monitor queue depth, run duration, errors, and approval waits.
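
Cancellation is easiest to make routine when the worker checks for it at step boundaries. The sketch below shows cooperative cancellation in plain Python, assuming the workflow can be expressed as an ordered list of step functions; `run_steps` and the returned record shape are hypothetical and do not reflect how LangGraph schedules nodes internally.

```python
import threading
from typing import Callable

Step = Callable[[dict], dict]


def run_steps(steps: list[Step], state: dict, cancel: threading.Event) -> dict:
    """Run steps in order, checking for cancellation before each one."""
    for index, step in enumerate(steps):
        if cancel.is_set():
            # Stop at a step boundary so the saved state stays consistent.
            return {"status": "cancelled", "next_step": index, "state": state}
        state = step(state)
    return {"status": "completed", "next_step": len(steps), "state": state}
```

Recording `next_step` alongside the state lets an operator see exactly where a cancelled run stopped.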

Runtime Topology

Deploy the graph separately from request ingress when runs can outlive an HTTP request. An API authenticates and creates or resumes a thread, workers execute graph steps, persistence stores checkpoints, a queue schedules work, and an event channel streams status. Scale workers only after side effects, thread locking, and checkpoint writes behave correctly under concurrency.

Use a production checkpointer and Store with tenant isolation, backups, encryption, retention, and monitored connection pools. Validate environment configuration at startup, expose health separately from dependency readiness, and keep model or tool credentials out of state. Local in-memory persistence is for tests and experiments, not multi-worker recovery.

Release progressively with a production build, migration checks, representative evaluations, and paused-thread compatibility tests. Monitor queue age, node latency, retry rate, checkpoint failures, interrupt backlog, tool errors, model usage, and task outcomes. Roll back the release, or disable the affected graph capability, when a monitored threshold is breached.

Separate short request handling from durable graph execution.
Protect thread concurrency and checkpoint consistency.
Use production persistence with isolation and recovery controls.
Deploy with evaluation and resume-compatibility evidence.
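
To illustrate what "protect thread concurrency" can mean, here is a minimal per-thread lock that stops two executions of the same thread from overlapping. It is an in-process sketch only: `thread_lock` is a hypothetical helper, and a multi-worker deployment would need the same guarantee from a shared system such as a database row lock or a distributed lock.

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

_registry_guard = threading.Lock()
_thread_locks = defaultdict(threading.Lock)  # thread_id -> lock


@contextmanager
def thread_lock(thread_id: str):
    """Hold exclusive execution rights for one graph thread."""
    with _registry_guard:
        lock = _thread_locks[thread_id]  # create the lock on first use
    if not lock.acquire(blocking=False):
        # Fail fast instead of queueing a second writer behind the first.
        raise RuntimeError(f"thread {thread_id} is already being executed")
    try:
        yield
    finally:
        lock.release()
```

A worker would wrap each resume or step execution in `with thread_lock(thread_id):` so that checkpoint writes for one thread are never interleaved.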

Deployment Topology Examples

Beginner Example: Inline API-Friendly Graph

A tiny graph can be invoked directly inside an API handler when it is fast and bounded.

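# Assumes `graph` is a compiled LangGraph graph created elsewhere in the app.
def handle_request(question: str) -> dict[str, str]:
    # Run the whole graph synchronously inside the request.
    result = graph.invoke({"question": question, "answer": ""})
    return {"answer": result["answer"]}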

This is fine only when the graph is predictably fast and does not need long-lived persistence or interrupts.
It keeps architecture simple for small features.
Move beyond this pattern once runtime variability grows.

Intermediate Example: Start a Worker-Backed Thread

Longer workflows usually need an API to create a run and a worker to execute it asynchronously.

job = {
    "thread_id": "research-501",
    "graph_name": "research_assistant",
    "input": {"question": "Summarize the policy changes"},
    "status": "queued",
}

A queued run decouples the user request from graph lifetime.
The frontend can poll or subscribe to status changes.
This is a natural fit for persisted, interruptible workflows.
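
For the worker side of this pattern, the following sketch consumes a job record of the same shape and records its outcome where a status endpoint could read it. It is a single-process illustration: the `GRAPHS` registry, the stub graph, and the in-memory `store` are hypothetical stand-ins for compiled graphs and a durable run table.

```python
import queue

# Hypothetical registry mapping graph names to callables; a real worker
# would look up a compiled graph and call its invoke method instead.
GRAPHS = {
    "research_assistant": lambda payload: {"summary": f"stub for: {payload['question']}"},
}


def process_next(jobs: queue.Queue, store: dict) -> bool:
    """Run one queued job and record its outcome. Returns False when idle."""
    try:
        job = jobs.get_nowait()
    except queue.Empty:
        return False
    job["status"] = "running"
    store[job["thread_id"]] = job  # make progress visible to the status API
    try:
        job["result"] = GRAPHS[job["graph_name"]](job["input"])
        job["status"] = "completed"
    except Exception as exc:       # keep the failure visible, do not crash the worker
        job["error"] = str(exc)
        job["status"] = "failed"
    return True


jobs = queue.Queue()
store = {}
jobs.put({
    "thread_id": "research-501",
    "graph_name": "research_assistant",
    "input": {"question": "Summarize the policy changes"},
    "status": "queued",
})
while process_next(jobs, store):
    pass
```

Because the job record carries its own status, the API can answer polling requests from `store` without knowing anything about how the worker runs the graph.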

Advanced Example: Production Service Topology

This architecture keeps user APIs, graph execution, persistence, and review operations separated cleanly.

Client UI
  -> API Gateway
      -> App Service (create/resume thread)
      -> Worker Queue
          -> Graph Worker
              -> Checkpoint Store
              -> Tool Services / Databases
              -> Tracing + Metrics
      -> Review UI for human approvals

The graph worker can scale independently from the public API.
Checkpoint and tracing infrastructure become first-class runtime dependencies.
This topology is suitable for long-running or enterprise workflows.

Before you move on