AI Agent Memory and State: Short-Term Context, Long-Term Memory and Checkpoints

Types of Agent Memory

State is what the agent knows right now about the current task. Memory is what the system chooses to keep beyond the current step or session. Beginners often mix these ideas, but they solve different problems.

Imagine cooking with a recipe. The current pan temperature and chopped ingredients are state. Your long-term knowledge that a customer is allergic to peanuts is memory. Both matter, but they should be stored and protected differently.

Good memory makes agents more useful. Bad memory makes agents creepy, wrong, expensive, and unsafe. Production systems need retention rules, user consent, editing controls, and clear boundaries.

Short-term state includes the current goal, messages, tool results, plan, intermediate files, and step count. It helps the agent continue a task without losing track.

Long-term memory stores information across sessions, such as user preferences, approved project facts, team conventions, and past outcomes. This memory should be curated rather than blindly saving every conversation.

Conversation context: recent messages included in the prompt.
Task state: plan, observations, current step, and completion status.
Checkpoint: a saved snapshot that allows resume after failure.
Semantic memory: searchable facts or notes stored in a vector index or database.
Procedural memory: reusable instructions, habits, or policies.

Why Checkpoints Matter

Long-running agents can fail halfway through a task. A network request may timeout, a deployment may take minutes, or a human approval may arrive later. Checkpoints let the agent resume from a known state instead of starting over.

A checkpoint should include enough information to continue safely, but not unnecessary secrets. If a tool result contains sensitive data, store references or redacted summaries where possible.

Use checkpoints for multi-step tasks.
Save step count and last successful action.
Store idempotency keys for write operations.
Record pending approvals separately.

Text Diagram: Memory Layers

A mature system separates what changes every step from what survives across sessions.

Prompt Window: recent conversation and selected context
Runtime State: current plan, observations, step count
Checkpoint Store: resumable task snapshots
User Memory Store: preferences and approved facts
Knowledge Store: documents, policies, tickets, source code

Memory Safety

Memory should not become a hidden surveillance system. Users should know what is remembered, why it is remembered, and how to correct it. Enterprise systems also need access controls, data classification, deletion policies, and audit logs.

Another risk is memory poisoning. If an attacker convinces an agent to remember a malicious instruction, future sessions can be compromised. Treat stored memories as data, not trusted system instructions.

Do not store secrets as memory.
Distinguish facts from instructions.
Let users view, edit, and delete personal memory.
Expire low-value memories.
Validate retrieved memories before high-risk actions.

Memory Governance, Expiry, and Deletion

Memory creates product value and data responsibility at the same time. Every stored fact should have a purpose, scope, owner, retention period, and deletion path.

Do not let model-generated summaries become permanent truth automatically. Record provenance and confidence, let users correct important memories, and expire facts that become stale.

Classify memories as thread state, user preference, durable fact, or derived summary.
Store source and timestamp metadata.
Support correction, deletion, and tenant-wide retention policies.
Do not store secrets or sensitive attributes unless the product explicitly requires and protects them.

Separate Run State, User Memory, and Knowledge

State and memory are often confused. Run state is the temporary working memory of one task: current goal, messages, tool results, selected plan, retry count, pending approval, and final answer. User memory is information intended to help future interactions, such as preferences or durable facts. Knowledge is external source material retrieved with permission.

Keeping these categories separate prevents serious product mistakes. A tool observation from one run should not automatically become long-term memory. A user preference should not be treated as verified enterprise knowledge. A retrieved document should not become a permanent user fact unless a trusted workflow explicitly stores it.

Run state should be structured for execution and debugging. Store the fields needed to resume, inspect, and evaluate the task. Long conversation transcripts should be summarized or windowed, but important decisions, approvals, tool outputs, and citations should remain recoverable.

Long-term memory needs governance. Every memory should have scope, source, confidence, timestamp, owner, retention rule, and deletion path. Users should be able to correct important memories because model-generated summaries can be wrong, outdated, or overly broad.

Use run state for current execution.
Use memory for reusable preferences or facts.
Use RAG for authoritative external knowledge.
Never promote temporary observations into durable memory automatically.
Track provenance, confidence, and expiry for stored memories.

Memory Retrieval and Update Rules

Memory should be retrieved selectively. Loading every stored fact into every prompt increases cost, creates privacy risk, and can confuse the model with stale context. Retrieve memories based on task relevance, recency, confidence, and permission. The best memory systems are quiet until the memory genuinely helps.

Memory updates should be conservative. Store explicit user preferences, repeated stable behavior, or facts confirmed by trusted systems. Do not store guesses, sensitive attributes, secrets, temporary moods, or facts extracted from untrusted documents unless the product has a clear reason and consent model.

Conflict handling matters. If a new memory contradicts an old one, the system should merge, expire, ask clarification, or keep both with timestamps rather than blindly appending. Otherwise the agent will accumulate contradictory beliefs and appear unpredictable.

Evaluation should include memory behavior. Test whether the agent uses helpful memories, ignores irrelevant ones, avoids leaking memories across tenants, and supports correction. Memory is a product feature, not just a database table.

Retrieve memories by relevance and permission, not by volume.
Require explicit rules for what may be stored.
Support edit, delete, and expiry workflows.
Handle contradictory memories with timestamps and confidence.
Test privacy isolation and stale-memory behavior.

Memory Review Exercise

Review memory by asking what should be remembered, why it should be remembered, who can see it, and how it can be corrected. If a team cannot answer those questions for a stored fact, that fact probably should not become durable memory yet.

Create examples for four categories: temporary run state, user preference, verified durable fact, and retrieved knowledge. Then decide where each category lives, how long it lasts, and whether it can enter future prompts. This prevents the common mistake of treating every useful sentence as memory.

Finally, test deletion and correction. A memory system that can store facts but cannot remove or update them will eventually degrade user trust. Correction paths are not optional polish; they are part of responsible product design.

For production review, also verify that memory behavior is visible in traces. A future answer should show which memories were retrieved and why they were relevant.

Classify facts before storing them.
Avoid storing secrets, guesses, or temporary observations.
Test retrieval relevance and tenant isolation.
Support correction, expiry, and deletion workflows.

Memory Write Policy

Separate working state, conversation history, durable workflow checkpoints, user preferences, and retrieved knowledge. They have different owners, retention periods, correction paths, and security rules. A checkpoint records how to resume a run; it should not silently become a permanent user profile.

Every long-term memory write needs a reason, source, confidence, scope, expiry, and deletion path. Prefer explicit user preferences or verified application events over model-inferred personal facts. Deduplicate conflicting memories and keep provenance so a correction can replace the exact source instead of adding another contradictory sentence.

Memory retrieval is another untrusted input path. Filter by user and tenant before similarity search, limit results, exclude secrets, and defend against stored prompt injection. Evaluate whether memory improves the task and whether deletion actually removes it from indexes, caches, summaries, and backups under the product policy.

Name each state store by purpose and lifecycle.
Require provenance and scope for durable memory writes.
Do not infer sensitive preferences without a clear product need.
Test correction, expiry, deletion, and poisoning resistance.

Memory Policy Examples

Simple State Object for a Research Agent

from typing import TypedDict

class ResearchState(TypedDict):
    question: str
    plan: list[str]
    sources: list[str]
    draft_answer: str
    step_count: int
    needs_human_review: bool

state: ResearchState = {
    "question": "Compare vector databases for a support chatbot.",
    "plan": ["collect requirements", "compare options", "write recommendation"],
    "sources": [],
    "draft_answer": "",
    "step_count": 0,
    "needs_human_review": False,
}

print(state["plan"][0])

Typed state makes the agent easier to debug.
step_count helps enforce loop limits.
needs_human_review gives the runtime a clear pause signal.

Memory Record with Lifecycle Metadata

A durable memory should carry enough metadata for review, expiry, and deletion.

Memory Record with Lifecycle Metadata

from datetime import datetime, timedelta, timezone

memory = {
    "namespace": ("tenant-7", "user-42", "preferences"),
    "key": "response_style",
    "value": "concise",
    "source": "explicit_user_setting",
    "created_at": datetime.now(timezone.utc).isoformat(),
    "expires_at": (datetime.now(timezone.utc) + timedelta(days=180)).isoformat(),
    "confidence": 1.0,
}

print(memory["source"], memory["expires_at"])

The namespace prevents cross-user mixing.
Provenance distinguishes explicit settings from inferred summaries.
Expiry creates a review point instead of permanent accumulation.

Before you move on