
Types of Memory in AI Agents

Understand the four memory types that let agents remember facts, past conversations, and learned skills — and how each one is stored and retrieved.

INTERMEDIATE · 10 MIN READ · UPDATED 2026-06-12

In plain English

A plain LLM has no memory at all — each API call starts from a blank slate. An AI agent changes that. Agents can remember things: the conversation so far, facts they looked up last week, how a particular workflow goes, even personal details a user shared months ago. But not all of that "memory" works the same way.

Researchers and engineers have settled on four types, borrowed loosely from cognitive science. Think of a very capable new employee in their first week: they have their working notes open in front of them (in-context / working memory), a diary of what happened in past meetings (episodic memory), a company wiki they can search any time (semantic memory), and the muscle memory of skills they learned on the job (procedural memory). Each type stores different information, decays differently, and is retrieved differently.

Why it matters

Memory is what separates a stateless chatbot from an agent that actually learns. Without long-term memory, every conversation starts cold — the agent can't remember that you prefer concise answers, that a particular API key lives in your vault, or that the last deployment failed for a specific reason. With the right memory type in the right slot, agents become genuinely useful over time.

What breaks without the right memory type

  • No episodic memory → agent asks for the same context every session, exhausts users, can't refer back to past conversations.
  • No semantic memory → agent hallucinates facts it should have looked up and stored, or re-fetches the same documents on every call.
  • No procedural memory → agent re-derives how to call a tool or format an output on every run instead of reusing a proven pattern.
  • Overloaded in-context memory → context window fills up, costs spike, and the agent starts overlooking details buried in the middle of a long prompt (the lost-in-the-middle problem).

Knowing which memory type fits each need is the core skill. Stuffing everything into the context window is the default mistake — it works for a demo, but fails at scale.

How the four types work

The four memory types differ along two axes: scope (session-only vs. persistent across sessions) and content (events vs. facts vs. skills). Here is how each one works mechanically.

In-context memory (working memory)

The context window is the agent's working memory: everything it can "see" right now, including the system prompt, the conversation so far, tool outputs, and any data you have explicitly pasted in. It is fast (no retrieval step) and faithful (the text sits in the prompt verbatim, with no lossy summary or retrieval miss in between), but strictly bounded by the model's context window size and expensive to keep large. When the session ends, it is gone.
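Because the window is bounded, agents usually trim the conversation before each call. The sketch below keeps the newest messages that fit a budget. The names `estimate_tokens` and `trim_history` are hypothetical, and the four-characters-per-token estimate is an assumption standing in for a real tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def trim_history(history: list[dict], budget: int) -> list[dict]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: list[dict] = []
    used = 0
    for message in reversed(history):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    # Chat APIs commonly expect the first message to come from the user,
    # so drop any leading assistant turns left over after trimming.
    while kept and kept[0]["role"] != "user":
        kept.pop(0)
    return kept
```

Dropping the oldest turns is the simplest policy; a summary of the dropped turns could be kept instead if early context matters.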

Episodic memory

Episodic memory is the agent's diary: a log of what happened, when, and with whom. It answers questions like "what did we discuss last Tuesday?" or "which approach did the agent try first on the code-review task?" Episodic memories are instance-specific — they capture a single occurrence in context, not a generalised fact.

Mechanically, an episodic store is usually a vector database or a key/value store with timestamps. At the end of a session the agent (or a background process) writes a compressed summary; at the start of the next session it retrieves the most relevant episodes via semantic search and injects them into the prompt. This is the same retrieval idea behind RAG, applied to the agent's own history rather than external documents.
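A minimal sketch of that write-then-retrieve loop, assuming an in-memory list in place of a vector database and word overlap in place of embedding similarity. `Episode`, `EpisodicStore`, and `build_prompt_context` are illustrative names, not part of any library.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Episode:
    summary: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def overlap_score(query: str, text: str) -> float:
    """Jaccard word overlap: a crude stand-in for embedding similarity."""
    q, t = set(query.lower().split()), set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


class EpisodicStore:
    def __init__(self) -> None:
        self.episodes: list[Episode] = []

    def write(self, summary: str) -> None:
        """Called at session end with a compressed summary."""
        self.episodes.append(Episode(summary))

    def retrieve(self, query: str, k: int = 3) -> list[Episode]:
        """Return the k episodes most similar to the query."""
        ranked = sorted(
            self.episodes,
            key=lambda e: overlap_score(query, e.summary),
            reverse=True,
        )
        return ranked[:k]


def build_prompt_context(store: EpisodicStore, query: str, k: int = 3) -> str:
    """Format retrieved episodes for injection into the next session's prompt."""
    lines = [f"- [{e.timestamp:%Y-%m-%d}] {e.summary}" for e in store.retrieve(query, k)]
    return "Relevant past sessions:\n" + "\n".join(lines) if lines else ""
```

Keeping the timestamp on each episode is what lets the agent answer "when" questions, which a plain fact store cannot.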

Semantic memory

Semantic memory stores facts and knowledge that are largely timeless: definitions, entity relationships, user preferences, domain rules, product specs. Unlike episodic memory it is atemporal — it does not care when something happened, only what is currently true. A customer-service agent stores the product catalogue here; a coding agent stores language conventions and API signatures.

The implementation overlaps heavily with RAG: chunk your knowledge base, embed the chunks, store them in a vector database, and retrieve the most relevant chunks for each query. The conceptual difference is framing: RAG retrieves external documents, while semantic memory retrieves the agent's own accumulated knowledge — but the machinery is identical.
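The chunk, embed, store, and retrieve steps can be sketched in a few lines. This version assumes a toy word-count "embedding" so it runs without any external service; a real system would call an embedding model and a vector database. `SemanticStore` is a hypothetical name.

```python
import math
from collections import Counter


def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Toy embedding: lower-cased word counts (an assumption for illustration)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticStore:
    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add_document(self, text: str, size: int = 40) -> None:
        """Chunk the document and index each chunk with its embedding."""
        for piece in chunk(text, size):
            self.entries.append((piece, embed(piece)))

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```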

Procedural memory

Procedural memory encodes how to do things: repeatable workflows, tool-calling patterns, output formats, and learned heuristics. In humans this is the "muscle memory" that lets you type without looking at the keyboard. In agents it lives in several places at once: the system prompt (explicit instructions), few-shot examples baked into the prompt, and fine-tuned weights that have been adjusted on successful task traces.

Procedural memory is the hardest to update at runtime. Changing the system prompt is cheap; fine-tuning requires a new training run. Frameworks like Mem0 and LangMem handle procedural memory by storing successful tool-call sequences as reusable "procedures" that get injected into the prompt when a matching task is detected — no fine-tuning needed.
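The prompt-injection approach can be illustrated with a small registry. This is a sketch under stated assumptions: it matches tasks by keyword overlap, and `Procedure`, `ProcedureRegistry`, and `with_procedure` are hypothetical names rather than the API of Mem0 or LangMem.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Procedure:
    name: str
    triggers: set[str]  # keywords suggesting the procedure applies
    steps: list[str]    # a previously successful sequence of actions


class ProcedureRegistry:
    def __init__(self) -> None:
        self.procedures: list[Procedure] = []

    def register(self, procedure: Procedure) -> None:
        self.procedures.append(procedure)

    def match(self, task: str) -> Optional[Procedure]:
        """Return the procedure sharing the most trigger words with the task."""
        words = set(task.lower().split())
        best = max(self.procedures, key=lambda p: len(p.triggers & words), default=None)
        if best is None or not (best.triggers & words):
            return None
        return best


def with_procedure(base_prompt: str, registry: ProcedureRegistry, task: str) -> str:
    """Append the matching procedure to the system prompt, if there is one."""
    procedure = registry.match(task)
    if procedure is None:
        return base_prompt
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(procedure.steps, 1))
    return f"{base_prompt}\n\nKnown procedure ({procedure.name}):\n{steps}"
```

Keyword matching is the weakest part of this sketch; similarity search over task descriptions would be a natural replacement.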

Implementing each memory type

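Here is how the four types map to concrete implementation choices in a production agent.

  • In-context memory → the prompt itself (system prompt, conversation history, tool outputs); session-only.
  • Episodic memory → timestamped session summaries in a vector database or key/value store; persistent.
  • Semantic memory → embedded chunks of facts and knowledge in a vector database; persistent.
  • Procedural memory → system prompt instructions, few-shot examples, or fine-tuned weights; persistent.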

Minimal Python sketch

The pattern below shows how a simple agent mixes in-context + episodic memory. It writes a compressed summary at the end of each session, and reads the last three summaries at the start of the next one.

simple_memory_agent.py (Python)
import json
from pathlib import Path
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from env
MODEL = "claude-sonnet-4-5"
MEMORY_FILE = Path("episodic_memory.json")


def load_episodes(n: int = 3) -> list[dict]:
    """Return the last n episode summaries from disk."""
    if not MEMORY_FILE.exists():
        return []
    episodes = json.loads(MEMORY_FILE.read_text())
    return episodes[-n:]


def save_episode(session: list[dict]) -> None:
    """Ask the model to compress the session into one episodic summary."""
    transcript = "\n".join(
        f"{m['role'].upper()}: {m['content']}" for m in session
    )
    resp = client.messages.create(
        model=MODEL,
        max_tokens=256,
        system="Summarise this conversation in 2-3 sentences, "
               "capturing key facts and decisions made.",
        messages=[{"role": "user", "content": transcript}],
    )
    summary = resp.content[0].text
    episodes = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else []
    episodes.append({"summary": summary})
    MEMORY_FILE.write_text(json.dumps(episodes, indent=2))


# -- Session start: load episodic context --------------------------------
past = load_episodes(n=3)
episodic_context = (
    "Relevant past sessions:\n"
    + "\n".join(f"- {e['summary']}" for e in past)
    if past
    else ""
)

system_prompt = f"""You are a helpful assistant.
{episodic_context}"""

# -- Active conversation (in-context memory) ----------------------------
history: list[dict] = []
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break
    history.append({"role": "user", "content": user_input})
    resp = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=system_prompt,
        messages=history,  # full session context passed each turn
    )
    reply = resp.content[0].text
    history.append({"role": "assistant", "content": reply})
    print(f"Agent: {reply}")

# -- Session end: write episodic summary --------------------------------
if history:
    save_episode(history)

Memory frameworks

Rather than building memory from scratch, most teams reach for a dedicated memory layer. Mem0 extracts and indexes all four memory types automatically from conversations — it decides what to store and where. Zep focuses on episodic and semantic memory for chat-based agents, adding entity graphs. LangMem (part of LangChain) provides procedural and semantic stores that integrate with LangGraph agents. All three expose a simple add / search API so your agent code stays clean.

Going deeper

Once the four types click, several harder questions open up — about when memory should update, how to handle contradictions, and where the research frontier is moving.

Memory write policies

Deciding when to write a memory is as important as knowing what type to write. Three common policies: eager (write after every turn — noisy, expensive), end-of-session (compress the whole session — misses mid-session insights), and triggered (write only when the agent detects something noteworthy — lowest noise, hardest to tune). Most production systems combine end-of-session summarisation for episodic memory with triggered writes for semantic facts.
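The three policies reduce to a single decision function. In this sketch the trigger phrases are hypothetical placeholders; a production system might ask the model itself or use a classifier to decide what counts as noteworthy.

```python
from enum import Enum


class WritePolicy(Enum):
    EAGER = "eager"
    END_OF_SESSION = "end_of_session"
    TRIGGERED = "triggered"


# Hypothetical trigger phrases, chosen only for illustration.
TRIGGER_PHRASES = ("remember", "i prefer", "from now on")


def is_noteworthy(message: str) -> bool:
    """Crude noteworthiness check based on substring matching."""
    lowered = message.lower()
    return any(phrase in lowered for phrase in TRIGGER_PHRASES)


def should_write(policy: WritePolicy, message: str, session_ended: bool) -> bool:
    """Decide whether to persist a memory at this point in the session."""
    if policy is WritePolicy.EAGER:
        return True  # write after every turn
    if policy is WritePolicy.END_OF_SESSION:
        return session_ended  # write once, when the session closes
    return is_noteworthy(message)  # TRIGGERED: write only on a signal
```

A combined setup would call `should_write` twice per turn: once with `END_OF_SESSION` for the episodic store and once with `TRIGGERED` for the semantic store.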

Memory conflicts and staleness

Semantic memory ages. A fact that was true in January ("the rate limit is 60 RPM") may be wrong in June. Agents need a staleness policy: either a TTL on semantic entries, or a periodic re-verification pass that fetches the original source and checks that the fact still holds. Episodic entries are historical records, so they do not go stale in the same way, but they can accumulate contradictions: the user says they prefer metric units in session 3, then imperial in session 7. A conflict-resolution pass (keep the most recent, flag the tension, or ask the user) prevents the agent from confidently asserting both.
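A TTL plus a keep-the-most-recent rule might look like the following sketch. `Fact` and `FactStore` are hypothetical names, and the store only flags conflicts rather than resolving them with the user.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Fact:
    key: str
    value: str
    recorded_at: datetime
    ttl: timedelta

    def is_stale(self, now: datetime) -> bool:
        """A fact is stale once its age exceeds its TTL."""
        return now - self.recorded_at > self.ttl


class FactStore:
    def __init__(self) -> None:
        self.facts: dict[str, Fact] = {}
        self.conflicts: list[tuple[Fact, Fact]] = []  # flagged for later review

    def upsert(self, fact: Fact) -> None:
        """Keep the most recent value per key and log any disagreement."""
        current = self.facts.get(fact.key)
        if current is not None and current.value != fact.value:
            self.conflicts.append((current, fact))
        if current is None or fact.recorded_at >= current.recorded_at:
            self.facts[fact.key] = fact

    def fresh(self, now: datetime) -> list[Fact]:
        """Return only facts that are still within their TTL."""
        return [f for f in self.facts.values() if not f.is_stale(now)]
```

Stale facts are filtered out at read time here; a re-verification pass would instead refetch the source and reset `recorded_at`.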

The KV cache as implicit memory

Modern inference servers use a KV cache to avoid recomputing attention over repeated prompt prefixes. If a large prefix of your prompt is identical across requests (e.g. a system prompt containing a product manual), caching it means you pay the full prefill cost once per cache lifetime rather than on every request. This is not "memory" in the agent sense, but it is a latency and cost lever that pairs naturally with a large, stable semantic memory injected into the system prompt.
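Hosted APIs expose this lever in different ways. As one illustration, Anthropic's Messages API lets a request mark a system-prompt block as cacheable through a `cache_control` field. The helper below, a hypothetical `build_cached_request`, only assembles the request arguments and makes no network call; cache lifetime and minimum cacheable length are provider details to confirm against current documentation.

```python
def build_cached_request(
    model: str,
    stable_manual: str,
    history: list[dict],
    max_tokens: int = 1024,
) -> dict:
    """Build Messages API arguments with the stable prefix marked cacheable."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        # The system prompt is passed as a list of blocks so that the large,
        # unchanging block can carry a cache marker.
        "system": [
            {
                "type": "text",
                "text": stable_manual,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": history,  # the per-request part, not cached by this marker
    }
```

The result can be passed to the client as `client.messages.create(**build_cached_request(...))`. Anything that varies per user should come after the cached block, since a prefix cache only helps when the leading content is byte-for-byte identical.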

Multi-agent memory sharing

In a multi-agent system the question becomes: which memories are private to one agent and which are shared across the team? Orchestrators typically share semantic and procedural memory (common facts, shared workflows) but keep episodic memory private to each worker (their individual conversation history). Shared semantic stores act like a team wiki; conflicts between agents' semantic beliefs need a merging or voting strategy.
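The private-versus-shared split and a simple voting strategy can be sketched as follows. `SharedSemanticStore` and `Worker` are hypothetical names, and majority voting is only one of several possible merge strategies.

```python
from collections import Counter, defaultdict
from typing import Optional


class SharedSemanticStore:
    """Team-wide beliefs: each agent proposes a value per key, majority wins."""

    def __init__(self) -> None:
        # key -> {agent_id: proposed value}
        self.proposals: dict[str, dict[str, str]] = defaultdict(dict)

    def propose(self, agent_id: str, key: str, value: str) -> None:
        self.proposals[key][agent_id] = value  # one vote per agent per key

    def resolve(self, key: str) -> Optional[str]:
        """Return the most common proposed value, or None if nobody proposed one."""
        votes = Counter(self.proposals.get(key, {}).values())
        if not votes:
            return None
        return votes.most_common(1)[0][0]


class Worker:
    def __init__(self, agent_id: str, shared: SharedSemanticStore) -> None:
        self.agent_id = agent_id
        self.shared = shared            # semantic memory, shared with the team
        self.episodes: list[str] = []   # episodic memory, private to this worker

    def learn(self, key: str, value: str) -> None:
        self.shared.propose(self.agent_id, key, value)

    def log(self, summary: str) -> None:
        self.episodes.append(summary)
```

Voting treats every agent as equally reliable; weighting votes by recency or by source quality is a common refinement.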

Research: single-shot learning and memory benchmarks

A paper on the Anatomy of Agentic Memory (arXiv 2602.19320) identified five properties an episodic memory system must have: persistence, explicit reasoning, single-shot learning (capturing info from one exposure), instance specificity, and contextual grounding. Most current systems satisfy the first and last but struggle with single-shot learning without fine-tuning. Benchmarks such as LoCoMo are emerging as standard evaluation harnesses for long-horizon agent memory; MemGPT, often mentioned alongside them, is a memory-management system rather than a benchmark.

FAQ

What is the difference between in-context memory and long-term memory in AI agents?

In-context memory is everything in the agent's active context window during the current session — it disappears when the session ends. Long-term memory (episodic, semantic, or procedural) is stored outside the model and retrieved on demand, so it persists across sessions. In-context memory is always visible to the model; long-term memory must be retrieved and injected into the context before the model can use it.

What is episodic memory in AI agents?

Episodic memory is the agent's record of past events and conversations — what happened, when, and with whom. It is stored as compressed summaries or embeddings in a vector database, then retrieved when a new session starts. It gives the agent continuity across conversations, so it doesn't have to ask for the same context again and again.

How is semantic memory different from RAG?

Conceptually, semantic memory is the agent's own accumulated knowledge base (facts, user preferences, domain rules), while RAG retrieves from external documents. In practice the machinery is nearly identical: both embed and index text, then retrieve by similarity. The distinction is framing and ownership — RAG fetches external knowledge on demand, semantic memory maintains a curated, agent-owned knowledge store that grows over time.

What is procedural memory in AI agents?

Procedural memory encodes repeatable skills: how to call a particular API, which output format to use, how to handle a known error pattern. It lives in the system prompt (explicit instructions), in few-shot examples, or in fine-tuned model weights. Unlike episodic and semantic memory it is rarely updated at runtime — changing the system prompt is cheap, but fine-tuning requires a training run.

Which memory type should I implement first?

Start with in-context memory management — make sure you are not wasting tokens and that the most important information is always near the top of context. Then add episodic memory if your agent has repeat users who expect continuity. Semantic memory pays off once you have a stable knowledge base the agent needs to query. Procedural memory via fine-tuning is the most expensive and usually the last to justify.

Can I use a vector database for all four memory types?

You can use a vector database for episodic and semantic memory — both rely on similarity search over embeddings. In-context memory does not need a database at all (it is just text in the prompt). Procedural memory is often stored in the system prompt or as fine-tuned weights, though some frameworks store successful tool-call sequences in a vector DB and retrieve them when a similar task appears.

Further reading