Tutorials Logic, IN info@tutorialslogic.com

LangGraph Memory and Checkpoints: Threads, Persistence, and Durable Runs

LangGraph Memory and Checkpoints

Memory in LangGraph is not one thing. There is the live state flowing through a single run, the persisted checkpoints that let that run pause and resume, and any longer-lived memory you store across many runs or threads.

Understanding those layers is essential because they solve different problems. Short-term state helps the graph keep context. Checkpoints help the runtime survive time, failure, and human pauses. Long-term memory helps the application remember beyond one thread.

This page will help you separate those concerns so your graphs remain durable without becoming bloated.

Short-Term State Versus Persistent Checkpoints

Graph state exists while a run is active. A checkpoint is a saved snapshot of that state, taken at each super-step boundary, so the runtime can resume later. The official docs organize these snapshots under threads, which is why `thread_id` matters so much: it tells the checkpointer which history to read from and append to.

If you do not persist checkpoints, you can still run graphs. You just lose durable pause-resume behavior, time travel, and many human-in-the-loop patterns.

  • State is the current in-flight working memory.
  • Checkpoints are saved snapshots of that state.
  • Threads identify which conversation or workflow instance the runtime should load.
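To make the three layers concrete, here is a minimal pure-Python sketch. It is not LangGraph's implementation: the hypothetical `ToyCheckpointer` class and `run_steps` helper only illustrate live state being updated step by step, a snapshot being saved after each step, and snapshots being grouped by thread ID.

```python
import copy
from collections import defaultdict


class ToyCheckpointer:
    """Hypothetical stand-in for a checkpointer: snapshots grouped by thread ID."""

    def __init__(self):
        self._threads = defaultdict(list)  # thread_id -> list of saved snapshots

    def save(self, thread_id, state):
        # Deep-copy so later changes to the live state cannot rewrite history.
        self._threads[thread_id].append(copy.deepcopy(state))

    def latest(self, thread_id):
        history = self._threads.get(thread_id)
        return copy.deepcopy(history[-1]) if history else None

    def history(self, thread_id):
        return [copy.deepcopy(s) for s in self._threads.get(thread_id, [])]


def run_steps(steps, state, thread_id, saver):
    # `state` is the in-flight working memory; each step returns a partial update.
    for step in steps:
        state = {**state, **step(state)}
        saver.save(thread_id, state)  # checkpoint at each step boundary
    return state


saver = ToyCheckpointer()
steps = [lambda s: {"count": s["count"] + 1}, lambda s: {"count": s["count"] * 10}]
run_steps(steps, {"count": 1}, "thread-a", saver)
run_steps(steps, {"count": 5}, "thread-b", saver)
print(saver.latest("thread-a"))  # {'count': 20}
print(saver.history("thread-b"))  # [{'count': 6}, {'count': 60}]
```

Because each thread has its own history, the two runs never see each other's state even though they share one saver.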

What Persistence Enables

Persistence turns graphs from request-bound pipelines into long-lived systems. A user can leave and return. A human reviewer can approve later. A worker can restart without forgetting the workflow position.

It also unlocks debugging capabilities such as inspecting state history or replaying from earlier checkpoints.

  • Human review and interrupts
  • Crash recovery
  • Conversation continuity
  • Time-travel style debugging
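Crash recovery is the easiest of these to see in miniature. The sketch below assumes a hypothetical runner that checkpoints the index of the next step alongside the state; the `checkpoints` dict stands in for durable storage. It is not LangGraph's API, but it shows why a saved position lets a restarted worker skip steps that already completed.

```python
import copy

checkpoints = {}  # thread_id -> {"next_step": int, "state": dict}; stand-in for a durable store
calls = []        # records which steps actually executed
crash_once = {"armed": True}


def fetch(state):
    calls.append("fetch")
    return {"data": "order-17"}


def summarize(state):
    calls.append("summarize")
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("worker crashed")  # simulated failure on the first attempt
    return {"summary": f"summary of {state['data']}"}


STEPS = [fetch, summarize]


def run(thread_id, initial_state):
    saved = checkpoints.get(thread_id, {"next_step": 0, "state": initial_state})
    state, start = copy.deepcopy(saved["state"]), saved["next_step"]
    for index in range(start, len(STEPS)):
        state = {**state, **STEPS[index](state)}
        # Record the new state and where to continue once this step has completed.
        checkpoints[thread_id] = {"next_step": index + 1, "state": copy.deepcopy(state)}
    return state


try:
    run("job-1", {})
except RuntimeError:
    pass  # the process "restarts" here
result = run("job-1", {})
print(result)  # {'data': 'order-17', 'summary': 'summary of order-17'}
print(calls)   # ['fetch', 'summarize', 'summarize']
```

`fetch` runs only once: the second call to `run` starts from the saved position instead of the beginning.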

Choosing Between In-Memory and Durable Stores

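In-memory persistence (`InMemorySaver`) is excellent for local development and tests because it needs no infrastructure, but it does not survive process restarts. Production services should use a persistent backend, such as the SQLite- or Postgres-backed checkpointers (`SqliteSaver`, `PostgresSaver`), so checkpoints survive service restarts and remain available to every worker in a distributed deployment.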

The practical rule is simple: if losing the process means losing business context, you need durable checkpoint storage.

  • In-memory saver for tutorials and tests
  • Database-backed checkpointing for real services
  • Separate local convenience from production guarantees
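The difference between the two options can be demonstrated with the standard library alone. Both classes below are hypothetical illustrations, not LangGraph checkpointers: one keeps checkpoints in a dict, the other in a SQLite file via Python's built-in `sqlite3` module. A "restart" is simulated by creating a fresh instance.

```python
import json
import os
import sqlite3
import tempfile


class DictStore:
    """In-memory: contents vanish with the object (and with the process)."""

    def __init__(self):
        self._data = {}

    def put(self, thread_id, state):
        self._data[thread_id] = state

    def get(self, thread_id):
        return self._data.get(thread_id)


class SqliteStore:
    """Durable: contents live in a file that outlives any one process."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints (thread_id TEXT PRIMARY KEY, state TEXT)"
        )

    def put(self, thread_id, state):
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
                (thread_id, json.dumps(state)),
            )

    def get(self, thread_id):
        row = self._conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
        ).fetchone()
        return json.loads(row[0]) if row else None

    def close(self):
        self._conn.close()


path = os.path.join(tempfile.mkdtemp(), "checkpoints.db")

memory = DictStore()
memory.put("t1", {"step": 3})
memory = DictStore()  # simulate a restart: a fresh, empty instance
print(memory.get("t1"))  # None

durable = SqliteStore(path)
durable.put("t1", {"step": 3})
durable.close()
durable = SqliteStore(path)  # simulate a restart: reconnect to the same file
print(durable.get("t1"))  # {'step': 3}
```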

Long-Term Memory Is Broader Than Checkpoints

Checkpoint persistence is thread-scoped execution memory. Long-term memory is application memory that can outlive a thread and be reused later, such as customer preferences, prior decisions, semantic summaries, or retrieved knowledge.

Store those durable facts outside the graph state, then read them into state when a new run needs them. That keeps the graph state focused while still letting the application remember.

  • Use state for current run context.
  • Use long-term stores for reusable facts across runs.
  • Load only the memory needed for the current decision.
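A minimal sketch of that separation, assuming a hypothetical `LONG_TERM` dict keyed by `(user_id, namespace)` in place of a real database or memory service. A fact learned in one thread is stored outside any thread's state, and a later thread for the same user starts fresh but injects only the keys it needs.

```python
# Hypothetical long-term store: (user_id, namespace) -> {key: value}.
# In production this would be a database or memory service, not a dict.
LONG_TERM = {}


def remember(user_id, namespace, key, value):
    LONG_TERM.setdefault((user_id, namespace), {})[key] = value


def recall(user_id, namespace, keys):
    # Return only the requested keys so run state stays small.
    stored = LONG_TERM.get((user_id, namespace), {})
    return {k: stored[k] for k in keys if k in stored}


def start_run(thread_id, user_id, request):
    # Fresh run state per thread; durable facts are injected, not carried over.
    return {
        "thread_id": thread_id,
        "request": request,
        "preferences": recall(user_id, "preferences", ["language"]),
    }


# Thread A learns facts and stores them outside its own state.
remember("user-7", "preferences", "language", "de")
remember("user-7", "preferences", "shipping_address", "Hauptstr. 1")

# Thread B, a different conversation, starts with only what it needs.
run_b = start_run("thread-b", "user-7", "Where is my order?")
print(run_b["preferences"])  # {'language': 'de'}
```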

State Evolution and Backward Compatibility

Persisted threads create a schema evolution problem. If you change the state shape after deployment, older checkpoints may resume into newer graph code. That means your graph changes must be treated like a compatibility-sensitive API change.

Production teams should version important state changes, add migration logic when needed, and test resume behavior rather than assuming new code will fit old checkpoints cleanly.

  • Avoid casual renames of critical state fields.
  • Test resumed runs after schema changes.
  • Document which fields are safe to add, deprecate, or transform.
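One way to make resume behavior testable is to stamp each checkpoint with a schema version and run it through migration functions before the graph sees it. This is a sketch under assumptions: the `version` field, the rename from `customer` to `customer_id`, and the default for `priority` are hypothetical examples, not LangGraph features.

```python
CURRENT_VERSION = 3


def v1_to_v2(state):
    # v2 renamed "customer" to "customer_id".
    state = dict(state)  # copy so the stored checkpoint is not mutated
    state["customer_id"] = state.pop("customer")
    return state


def v2_to_v3(state):
    # v3 added "priority"; old threads get an explicit default.
    return {**state, "priority": "normal"}


MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}


def migrate(checkpoint):
    version, state = checkpoint["version"], checkpoint["state"]
    if version > CURRENT_VERSION:
        raise ValueError(f"checkpoint version {version} is newer than code version {CURRENT_VERSION}")
    # Apply each migration in order until the state matches the current schema.
    while version < CURRENT_VERSION:
        state = MIGRATIONS[version](state)
        version += 1
    return {"version": version, "state": state}


old = {"version": 1, "state": {"customer": "c-9", "request": "refund"}}
print(migrate(old))
# {'version': 3, 'state': {'request': 'refund', 'customer_id': 'c-9', 'priority': 'normal'}}
```

Each migration step is small and testable on its own, which makes it practical to replay real old checkpoints through `migrate` in CI before deploying a schema change.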

Checkpoints Are Execution History

A LangGraph checkpoint is more than saved memory. It is a durable snapshot of execution state that makes resume, replay, time travel, and debugging possible. Treat checkpoints as part of the application data model whenever users depend on long-running or interruptible workflows.

Thread identifiers should be stable, tenant-safe, and meaningful enough for operations. If the same user has multiple workflows, each should have a distinct thread or namespace strategy. Accidentally reusing thread IDs can mix state between tasks, which is both confusing and dangerous.
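A sketch of one possible thread ID scheme. The colon-separated layout and both helper functions are hypothetical; the point is that tenant, user, and workflow are explicit and each workflow instance gets its own ID.

```python
import uuid


def make_thread_id(tenant_id, user_id, workflow, run_key=None):
    """Hypothetical scheme: tenant and workflow are explicit, each instance is unique."""
    for part in (tenant_id, user_id, workflow):
        if not part or ":" in part:
            raise ValueError(f"invalid thread id component: {part!r}")
    # A caller-supplied run_key makes the ID reproducible; otherwise use a random UUID.
    instance = run_key or uuid.uuid4().hex
    return f"{tenant_id}:{user_id}:{workflow}:{instance}"


def owns_thread(thread_id, tenant_id):
    # Check tenancy before loading a thread on behalf of a caller.
    return thread_id.split(":", 1)[0] == tenant_id


refund = make_thread_id("acme", "u-42", "refund")
onboarding = make_thread_id("acme", "u-42", "onboarding")
print(refund != onboarding)           # True: same user, distinct threads
print(owns_thread(refund, "globex"))  # False
```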

Checkpoint data needs retention rules. Some workflows need short-lived state; others need audit history. Sensitive state should be minimized, encrypted where appropriate, and deleted according to policy. Durable execution does not mean keeping everything forever.
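A retention rule can be expressed as a small policy table plus a sweep. This sketch assumes each stored checkpoint record carries a workflow type and a creation timestamp; the `RETENTION` table and record layout are hypothetical, and a real backend would use its own deletion mechanism instead of filtering a list.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: how long checkpoints for each workflow type may be kept.
RETENTION = {"chat": timedelta(days=7), "refund_audit": timedelta(days=365)}


def expired(record, now):
    limit = RETENTION.get(record["workflow"])
    if limit is None:
        return True  # design choice: no policy means nothing justifies keeping it
    return now - record["created_at"] > limit


def sweep(records, now):
    keep, purged = [], []
    for record in records:
        if expired(record, now):
            purged.append(record["thread_id"])
        else:
            keep.append(record)
    return keep, purged


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
records = [
    {"thread_id": "t1", "workflow": "chat", "created_at": now - timedelta(days=10)},
    {"thread_id": "t2", "workflow": "chat", "created_at": now - timedelta(days=2)},
    {"thread_id": "t3", "workflow": "refund_audit", "created_at": now - timedelta(days=100)},
    {"thread_id": "t4", "workflow": "scratch", "created_at": now},
]
keep, purged = sweep(records, now)
print(purged)  # ['t1', 't4']
```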

Use time travel and replay for debugging, but understand side effects. Replaying a node that sends email or updates a ticket can be unsafe unless side effects are isolated, idempotent, or mocked during replay.
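One common way to make a side-effecting node replay-safe is an idempotency key derived from the thread and step. In this sketch `send_email`, `outbox`, and `sent_log` are hypothetical; in practice the log must be durable, and a crash between sending and recording the key can still cause a duplicate unless the provider itself accepts an idempotency key.

```python
outbox = []    # emails actually "sent" by the hypothetical provider
sent_log = {}  # idempotency key -> message id; in production, a durable table


def send_email(to, body):
    # Hypothetical external side effect.
    outbox.append((to, body))
    return f"msg-{len(outbox)}"


def notify_node(state):
    # The key depends only on the thread and the step, so replaying this
    # step produces the same key and skips the second send.
    key = f"{state['thread_id']}:notify"
    if key not in sent_log:
        sent_log[key] = send_email(state["email"], "Your refund was approved.")
    return {"notification_id": sent_log[key]}


state = {"thread_id": "acme:u-42:refund:1", "email": "a@example.com"}
first = notify_node(state)
replayed = notify_node(state)  # e.g. replaying from an earlier checkpoint
print(first, replayed, len(outbox))  # {'notification_id': 'msg-1'} {'notification_id': 'msg-1'} 1
```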

  • Treat checkpoints as durable workflow state.
  • Use safe thread ID strategies.
  • Define retention and deletion policies.
  • Protect sensitive state in checkpoint storage.
  • Isolate side effects from replay-sensitive nodes.

Memory Versus Checkpoint Persistence

Checkpoint persistence and user memory solve different problems. Checkpoints let a run continue. Memory helps future runs use relevant facts or preferences. Do not confuse a checkpoint with a long-term memory store. A checkpoint may contain temporary observations that should not influence unrelated future tasks.

When you add long-term memory to LangGraph, make memory retrieval an explicit node or task. That node should apply relevance, permission, recency, and confidence filters. It should also record which memories entered state so later debugging can explain the answer.
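A sketch of such a retrieval node, written as a plain function that returns a partial state update. The `MEMORIES` records and their fields are hypothetical, and relevance is reduced to a crude topic-tag match where a real system would more likely use semantic search.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical memory records; in practice these come from a memory store.
MEMORIES = [
    {"id": "m1", "owner": "u-42", "text": "Prefers email over phone", "topic": "contact",
     "confidence": 0.9, "updated": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    {"id": "m2", "owner": "u-42", "text": "Asked about refunds in 2021", "topic": "refund",
     "confidence": 0.8, "updated": datetime(2021, 3, 1, tzinfo=timezone.utc)},
    {"id": "m3", "owner": "u-99", "text": "VIP customer", "topic": "contact",
     "confidence": 0.95, "updated": datetime(2024, 5, 30, tzinfo=timezone.utc)},
    {"id": "m4", "owner": "u-42", "text": "Maybe prefers SMS", "topic": "contact",
     "confidence": 0.4, "updated": datetime(2024, 5, 25, tzinfo=timezone.utc)},
]


def retrieve_memories(state, now, max_age=timedelta(days=365), min_confidence=0.7):
    selected = [
        m for m in MEMORIES
        if m["owner"] == state["user_id"]      # permission: only this user's memories
        and m["topic"] in state["topics"]      # relevance: crude topic match
        and now - m["updated"] <= max_age      # recency
        and m["confidence"] >= min_confidence  # confidence
    ]
    # Record which memories entered state so answers can be traced later.
    return {
        "memories": [m["text"] for m in selected],
        "memory_ids": [m["id"] for m in selected],
    }


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
update = retrieve_memories({"user_id": "u-42", "topics": {"contact", "refund"}}, now)
print(update["memory_ids"])  # ['m1']
```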

Memory updates should happen at controlled points, often after a successful run or explicit user confirmation. Storing memory mid-run can preserve false assumptions from incomplete work.
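A sketch of conservative memory writes: candidate facts are staged in run state and only committed to the long-term store when the run succeeded and the user confirmed. `MEMORY_STORE`, `propose_memory`, and `commit_memories` are hypothetical names used for illustration.

```python
MEMORY_STORE = {}  # hypothetical durable store: user_id -> list of facts


def propose_memory(state, fact):
    # Stage candidates in run state instead of writing them mid-run.
    return {**state, "pending_memories": state.get("pending_memories", []) + [fact]}


def commit_memories(state, succeeded, user_confirmed):
    # Only a successful, confirmed run may write long-term memory.
    if succeeded and user_confirmed:
        MEMORY_STORE.setdefault(state["user_id"], []).extend(state.get("pending_memories", []))
    # Either way, pending candidates do not leak into later runs.
    return {**state, "pending_memories": []}


run = {"user_id": "u-42"}
run = propose_memory(run, "Prefers refunds to store credit")
commit_memories(run, succeeded=False, user_confirmed=True)
print(MEMORY_STORE)  # {}
commit_memories(run, succeeded=True, user_confirmed=True)
print(MEMORY_STORE)  # {'u-42': ['Prefers refunds to store credit']}
```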

Schema evolution affects both checkpoints and memory. If state or memory structures change, write migration logic or compatibility adapters. Otherwise old runs and old memories will break under new code.

  • Use checkpoints for current run continuity.
  • Use memory stores for cross-run reusable facts.
  • Retrieve memory through explicit, traceable nodes.
  • Update memory conservatively after confirmation or success.
  • Plan migrations for state and memory schema changes.

Checkpoint Safety Exercise

Create a run, pause it, inspect the checkpoint, resume it, then replay from an earlier state. This hands-on exercise makes durable execution concrete. It also reveals whether state fields are understandable outside the code path that produced them.

Next, simulate a process crash after a node completes and before the user sees the result. The graph should resume or report failure without losing important state. If it repeats a side effect, the node boundary needs redesign.

Finally, review retention. Checkpoints may contain sensitive user input, retrieved context, and tool outputs. Decide what must be retained, what can expire, and what should be redacted.

  • Practice pause, resume, and replay.
  • Crash-test side-effect boundaries.
  • Inspect checkpoint readability.
  • Define retention and redaction rules.
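For the redaction part of the exercise, a small recursive function can scrub a state snapshot before it is stored. The `SENSITIVE_KEYS` set and the email pattern are hypothetical policy choices, and where you hook this in depends on your checkpoint backend; the sketch only shows the transformation itself.

```python
import re

SENSITIVE_KEYS = {"password", "api_key", "card_number"}  # hypothetical policy
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def redact(value):
    # Walk nested state and return a scrubbed copy; the input is left unchanged.
    if isinstance(value, dict):
        return {
            k: "[REDACTED]" if k in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, str):
        return EMAIL.sub("[EMAIL]", value)
    return value


state = {
    "request": "Contact me at jane@example.com",
    "tool_output": {"card_number": "4111 1111 1111 1111", "status": "ok"},
    "attempts": 2,
}
print(redact(state))
# {'request': 'Contact me at [EMAIL]', 'tool_output': {'card_number': '[REDACTED]', 'status': 'ok'}, 'attempts': 2}
```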

Beginner Example: Persist a Simple Thread with InMemorySaver

This is the smallest persistence example worth learning: compile with a checkpointer and invoke using a `thread_id`.

from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import InMemorySaver

class CounterState(TypedDict):
    count: int

def increment(state: CounterState) -> dict:
    return {"count": state["count"] + 1}

builder = StateGraph(CounterState)
builder.add_node("increment", increment)
builder.add_edge(START, "increment")
builder.add_edge("increment", END)

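# The checkpointer saves a snapshot after each step of this thread.
graph = builder.compile(checkpointer=InMemorySaver())
# thread_id selects which saved history this call reads from and writes to.
config = {"configurable": {"thread_id": "counter-1"}}
print(graph.invoke({"count": 0}, config=config))  # {'count': 1}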
  • Persistence is attached at compile time through the checkpointer.
  • The `thread_id` identifies which execution history to use.
  • This pattern is the foundation for conversation memory and interrupts.

Intermediate Example: Resume an Interrupted Review

Interrupts rely on persisted checkpoints so the graph can pause and later continue from the same thread.

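from typing_extensions import TypedDict
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.types import interrupt, Command

class ApprovalState(TypedDict):
    draft: str
    approved: bool

def human_gate(state: ApprovalState) -> dict:
    # Pauses the run; the value later passed via Command(resume=...) is returned here.
    approved = interrupt("Approve this draft?")
    return {"approved": bool(approved)}

builder = StateGraph(ApprovalState)
builder.add_node("human_gate", human_gate)
builder.add_edge(START, "human_gate")
builder.add_edge("human_gate", END)
graph = builder.compile(checkpointer=InMemorySaver())

config = {"configurable": {"thread_id": "review-1"}}
graph.invoke({"draft": "Refund policy update", "approved": False}, config=config)  # pauses at interrupt
print(graph.invoke(Command(resume=True), config=config))  # resumes the same thread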
  • Without a checkpointer, the graph cannot reliably pause and resume.
  • Resuming means invoking the graph with `Command(resume=...)` and the same `thread_id` config.
  • The resumed value becomes the return value of `interrupt()` inside the node.

Advanced Example: Separate Long-Term Memory From Run State

Keep persistent application facts outside the graph state and inject only what the current run needs.

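from typing_extensions import TypedDict

# Hypothetical stand-in for a durable customer store (database, CRM, memory service).
CUSTOMER_PROFILES = {"cust-42": {"language": "en", "refund_tier": "gold"}}

class SupportState(TypedDict):
    customer_id: str
    preferences: dict
    latest_request: str
    reply: str

def load_customer_profile(state: SupportState) -> dict:
    # Read durable facts for this customer and copy them into run state.
    profile = CUSTOMER_PROFILES.get(state["customer_id"], {})
    return {"preferences": profile}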
  • This is long-term memory usage, not checkpointing.
  • The graph pulls durable facts into state when needed.
  • That keeps thread state lean while still enabling personalization.
Key Takeaways
  • Differentiate run state, checkpoints, and long-term memory clearly.
  • Use `thread_id` consistently for persisted workflows.
  • Choose durable storage whenever the workflow must survive restarts.
  • Plan state-schema evolution before shipping checkpointed systems.
Common Mistakes to Avoid
  • Calling everything “memory” and losing track of what persists where.
  • Using in-memory persistence in workloads that require recovery after restarts.
  • Stuffing durable customer records directly into every checkpointed state snapshot.

Practice Tasks

  • Compile a graph with `InMemorySaver` and inspect how `thread_id` changes behavior.
  • Design a state shape that keeps current-run context separate from reusable customer data.
  • List two checkpoint schema changes that would need migration planning.

Frequently Asked Questions

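Do I need a checkpointer for every graph?
No. Add one when you need persistence, replay, interrupts, or thread continuity.

Is checkpointing the same as long-term memory?
No. Checkpointing persists execution state; long-term memory stores reusable facts beyond one run.

What is the most common mistake with persisted state?
Treating persisted state changes as harmless code refactors when they can break resumed threads.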
