AI/TLDR

How to Give an Agent Long-Term Memory (Vectors, Files, Graphs)

Compare the three dominant approaches to persistent agent memory and learn which storage shape fits which kind of recall.

INTERMEDIATE · 10 MIN READ · UPDATED 2026-06-12

In Plain English

Every time you start a new conversation with an LLM-powered agent, the model's context window is empty. Previous chats, user preferences, facts the agent learned last week — all gone. Long-term memory is the set of techniques that let an agent reach outside its context window and retrieve facts that survived past the end of a session.

Think of a human assistant. When you call them a month later, they don't re-read the entire filing cabinet before answering — they remember key facts, look up a relevant document, or connect the dots in their mental model of you. Agent memory systems try to replicate each of those three behaviors: semantic recall from a vector store, exact recall from files, and relational recall from a knowledge graph.

Why It Matters

Without persistent memory, every agent interaction starts from scratch. A customer-support bot forgets the issue the user described yesterday. A coding assistant re-learns your project conventions each session. A personal assistant asks for your name every time. This is not just annoying — it breaks tasks that inherently span multiple sessions, like long-running research projects or multi-day workflows.

Persistent memory solves three distinct problems that builders encounter:

  • Personalization — remember user preferences, past decisions, and communication style so the agent behaves consistently over time.
  • Knowledge accumulation — let the agent build up domain knowledge from its own observations rather than relying entirely on training data.
  • Task continuity — resume long-running tasks across restarts without re-reading every prior message.

The choice of memory backend shapes every downstream property of your agent: retrieval speed, update cost, reasoning depth, and operational complexity. Getting this choice right early saves significant refactoring later.

How It Works

All three approaches share the same basic loop: write something to memory when it is worth keeping, read back relevant pieces before generating a response, and maintain the store over time (updating stale facts, pruning irrelevant ones). The approaches differ in what shape they impose on the stored information.
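
The write/read/maintain loop can be sketched with a toy in-process store. This is a minimal illustration, not any library's API: `ListMemory`, `agent_turn`, the word-overlap ranking, and the `max_facts` budget are all hypothetical stand-ins for whatever backend and LLM call a real agent would use.

```python
from dataclasses import dataclass, field


@dataclass
class ListMemory:
    """Toy in-process store; a real backend would be a file, vector DB, or graph."""
    facts: list[str] = field(default_factory=list)

    def remember(self, text: str) -> None:
        # Write: keep the fact only if it is not already stored.
        if text not in self.facts:
            self.facts.append(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Read: rank facts by how many query words they share (a crude
        # stand-in for the retrieval a real backend provides).
        words = set(query.lower().split())
        scored = [(len(words & set(f.lower().split())), f) for f in self.facts]
        ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [f for _, f in ranked[:k]]

    def maintain(self, max_facts: int = 100) -> None:
        # Maintain: drop the oldest facts once the store exceeds its budget.
        self.facts = self.facts[-max_facts:]


def agent_turn(memory: ListMemory, user_message: str) -> str:
    context = memory.recall(user_message)  # read before generating
    prompt = "\n".join(["Known facts:", *context, "User: " + user_message])
    memory.remember(user_message)          # write (here: everything; a real agent would filter)
    memory.maintain()                      # keep the store bounded
    return prompt                          # a real agent would send this prompt to the LLM
```

The three approaches below replace the body of `remember` and `recall` while leaving the shape of `agent_turn` unchanged.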

Vector Stores

A vector store converts text into a numerical embedding — a list of hundreds or thousands of floating-point numbers that encodes the meaning of the text. When the agent needs to recall something, it embeds the current query and finds stored memories whose embeddings are close by cosine similarity. This is semantic search: you retrieve items that mean something similar, even if they use different words.
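
To make the similarity step concrete, here is a small sketch that uses hand-made three-dimensional vectors in place of real embeddings. The vectors, the `store` dictionary, and the `nearest` helper are illustrative assumptions; a real system would get its vectors from an embedding model and its search from the database.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made "embeddings": real ones have hundreds or thousands of dimensions.
store = {
    "Alice prefers Python.": [0.9, 0.1, 0.0],
    "Deployment target is AWS.": [0.0, 0.2, 0.9],
}


def nearest(query_vec: list[float]) -> str:
    # Return the stored text whose vector points in the most similar direction.
    return max(store, key=lambda text: cosine_similarity(query_vec, store[text]))
```

A query vector such as `[0.8, 0.2, 0.1]` points in nearly the same direction as the first stored vector, so `nearest` returns the Python preference even though no words were compared.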

Popular vector databases include Pinecone (fully managed), Weaviate, and Chroma (both open source, with hosted offerings). All three reached production-grade status by 2025-2026. Pinecone's serverless tier delivers sub-33ms p99 latency; Chroma Cloud went GA in late 2025 with collection forking; Weaviate added autonomous AI Agent operations in its 1.35 release. Alternatively, pgvector turns a PostgreSQL database you already run into a vector store without adding a new service.

File-Based Memory

The simplest form of persistent memory is a plain file — a JSON document, a Markdown scratchpad, or a set of structured text files. The agent reads the file at the start of a task and appends or rewrites sections as it learns new information. No embeddings, no external service, no query language required.

This pattern emerged prominently from Manus, a long-running agent system that used a three-file layout: task_plan.md for goals and progress, notes.md for research findings, and a separate output file. The competitive edge was not the technology — it was the discipline of structured, human-readable memory that stayed debuggable at every step.

Knowledge Graphs

A knowledge graph stores memory as a network of entities (people, projects, concepts) connected by typed relationships (works-at, depends-on, contradicts). Instead of asking "find me something semantically similar to this query", the agent can ask "give me all facts about entity X and its first-degree neighbors". This is structural recall — the graph encodes who relates to whom rather than raw text meaning.
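
The "entity X and its first-degree neighbors" query can be illustrated with a tiny in-memory triple store. This is a sketch under stated assumptions: `TinyGraph` and its methods are hypothetical names, and a real deployment would use a graph database rather than Python dictionaries.

```python
from collections import defaultdict


class TinyGraph:
    """Stores (subject, relation, object) triples with lookups in both directions."""

    def __init__(self) -> None:
        self._out = defaultdict(list)  # subject -> [(relation, object)]
        self._in = defaultdict(list)   # object  -> [(relation, subject)]

    def add(self, subject: str, relation: str, obj: str) -> None:
        self._out[subject].append((relation, obj))
        self._in[obj].append((relation, subject))

    def facts_about(self, entity: str) -> list[tuple[str, str, str]]:
        # First-degree neighborhood: every edge that starts or ends at the entity.
        outgoing = [(entity, r, o) for r, o in self._out.get(entity, [])]
        incoming = [(s, r, entity) for r, s in self._in.get(entity, [])]
        return outgoing + incoming
```

Asking for facts about a project returns both what it depends on and who owns it, a lookup that follows typed edges rather than text similarity.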

Graphiti by Zep AI is an open-source, real-time knowledge graph engine built on Neo4j that gained significant traction in 2025. It uses a bi-temporal model — every graph edge stores both when the event occurred and when it was ingested — so you can query the state of the agent's knowledge at any past point in time. Graphiti's hybrid retrieval (semantic embeddings + BM25 keyword search + graph traversal) achieves P95 latency under 300ms without calling an LLM during retrieval.
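
The bi-temporal idea can be sketched independently of any graph engine. The example below is an assumption-laden illustration, not Graphiti's schema: the `Edge` fields and the `known_facts` function are hypothetical names. Each record carries two world-time bounds (when the fact held) and two system-time bounds (when the store believed this version of the record).

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Edge:
    fact: str
    valid_from: datetime            # world time: when the fact became true
    valid_to: Optional[datetime]    # world time: when it stopped being true (None = open)
    ingested_at: datetime           # system time: when this record was stored
    expired_at: Optional[datetime]  # system time: when a newer record superseded it


def known_facts(edges: list[Edge], world_time: datetime, knowledge_time: datetime) -> list[str]:
    """Facts true at world_time, according to what the store believed at knowledge_time."""
    return [
        e.fact
        for e in edges
        if e.ingested_at <= knowledge_time
        and (e.expired_at is None or knowledge_time < e.expired_at)
        and e.valid_from <= world_time
        and (e.valid_to is None or world_time < e.valid_to)
    ]
```

Because superseded records are expired rather than deleted, a query with an earlier `knowledge_time` reproduces what the agent would have answered back then, even if that answer later turned out to be outdated.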

Comparing the Three Approaches

No single storage shape is best for all workloads. The table below summarizes the key trade-offs:

| Approach | Best for | Strengths | Weaknesses |
| --- | --- | --- | --- |
| Vector store | Semantic fuzzy recall | Fast similarity search, scales to millions of memories, works across any topic | No explicit relationships, can hallucinate connections, metadata filtering needed for precision |
| Files | Simple structured facts, prototypes | Zero infrastructure, human-readable, easy to debug and version-control | No similarity search, concurrent writes can corrupt, hard to query at scale |
| Knowledge graph | Relational facts, entity tracking | Rich relational queries, temporal versioning, lower hallucination on entity facts | Cold-start cost, higher setup complexity, updates can be expensive |

Implementation Patterns

The following patterns show how to wire each storage type into an agent. All three examples follow the same contract: a remember(text) function writes to the store, and a recall(query) function retrieves relevant context before each LLM call.

Vector Store with Mem0

Mem0 is an open-source Python/Node library that wraps a vector store with an LLM-driven extraction layer. On each turn it asks a small LLM to decide what facts are worth storing (ADD, UPDATE, DELETE, or NOOP), then writes only the distilled facts — not the raw transcript — so the store stays compact and queryable.

Basic Mem0 memory add + search (Python)
from mem0 import Memory

# By default uses OpenAI for extraction and embeddings (needs OPENAI_API_KEY) + a local vector store
memory = Memory()
user_id = "user_alice"

# Write: agent learns something about the user
memory.add(
    "Alice prefers Python and dislikes verbose frameworks.",
    user_id=user_id
)

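# Read: retrieve relevant facts before generating a response
results = memory.search(query="What stack should I suggest?", user_id=user_id)
# Recent Mem0 releases return {"results": [...]}; older ones return a bare list
hits = results["results"] if isinstance(results, dict) else results
for r in hits:
    print(r["memory"])  # e.g. "Prefers Python" (Mem0 stores distilled facts, not the raw text)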

File-Based Scratchpad

A flat JSON file is sufficient for a single-user agent that needs to persist a few dozen facts. The agent reads the full file into context at the start of each session, appends new facts during the session, and rewrites the file on exit.

Simple JSON file memory (Python)
import json
from pathlib import Path

MEMORY_FILE = Path("agent_memory.json")

def load_memory() -> list[str]:
    if MEMORY_FILE.exists():
        return json.loads(MEMORY_FILE.read_text())
    return []

def save_memory(facts: list[str]) -> None:
    MEMORY_FILE.write_text(json.dumps(facts, indent=2))

def recall_all(facts: list[str]) -> str:
    """Inject all facts as a system block — works fine up to ~50 facts."""
    return "\n".join(f"- {f}" for f in facts)

facts = load_memory()
facts.append("Deployment target is AWS us-east-1.")
save_memory(facts)
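
One weakness of rewriting the file in place is that a crash halfway through the write can leave truncated JSON behind. A common mitigation, sketched here as an optional variant of `save_memory`, is to write to a temporary file in the same directory and then swap it in with `os.replace`. The function name `save_memory_atomic` is hypothetical. This guards against partial files only; it does not stop two concurrent writers from overwriting each other's updates.

```python
import json
import os
import tempfile
from pathlib import Path


def save_memory_atomic(facts: list[str], path: Path) -> None:
    # Create the temp file next to the target so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(facts, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_name, path)  # readers see the old file or the new one, never half of each
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # clean up the temp file on failure
        raise
```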

Knowledge Graph with Graphiti

Graphiti stores each memory as an episode (a raw text fragment) and automatically extracts entities and relationships, storing them in Neo4j. Retrieval is a hybrid of BM25, semantic, and graph-traversal search.

Graphiti add episode + search (Python)
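import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti

async def main():
    # Assumes a local Neo4j instance and an LLM API key (OpenAI by default) in the environment
    client = Graphiti("bolt://localhost:7687", "neo4j", "password")
    try:
        await client.build_indices_and_constraints()

        # Write: store a new memory episode
        await client.add_episode(
            name="onboarding-note",
            episode_body="Alice joined the platform on 2026-01-15 and owns project Orion.",
            source_description="HR system",
            # Required: when the episode happened, used by the temporal model
            reference_time=datetime.now(timezone.utc),
        )

        # Read: hybrid search returns relevant edges from the graph
        results = await client.search("What project does Alice own?")
        for edge in results:
            print(edge.fact)  # e.g. "Alice owns project Orion."
    finally:
        # Release the database connection even if a call above fails
        await client.close()

asyncio.run(main())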

Going Deeper

Real production systems rarely use just one memory backend. They combine approaches in a hybrid architecture: a file or small key-value store for high-frequency structured facts (user name, preferences), a vector store for episodic semantic recall, and a knowledge graph for relational reasoning when the domain has rich entity structure. The routing logic — deciding which store to query for which type of question — is itself an agent decision.
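
As a rough illustration of routing, the sketch below picks a backend from the wording of the question. Everything here is a hypothetical stand-in: the phrase lists and the backend names `profile`, `graph`, and `vector` are invented for the example, and a production agent would more likely let the LLM choose the store via a tool call than rely on keyword rules.

```python
PROFILE_HINTS = ("my name", "my timezone", "my preference")
RELATION_HINTS = ("who owns", "depends on", "reports to")


def route(question: str) -> str:
    """Return the name of the memory backend to query for this question."""
    q = question.lower()
    if any(hint in q for hint in PROFILE_HINTS):
        return "profile"  # small key-value or file store for stable user facts
    if any(hint in q for hint in RELATION_HINTS):
        return "graph"    # relational questions go to the knowledge graph
    return "vector"       # everything else falls back to semantic recall
```

Keeping the router as a separate function makes it easy to swap the rules for a model-driven decision later without touching the individual stores.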

Memory Maintenance

Memory without maintenance degrades. A vector store grows unboundedly, retrieval precision falls as noise accumulates, and stale facts mislead the agent. Production systems need a maintenance loop: periodic consolidation (merge redundant facts), forgetting (delete facts past a TTL or below a relevance threshold), and conflict resolution (decide which version of a fact is current). Mem0 runs this as an LLM-driven ADD/UPDATE/DELETE classification on every write. Graphiti's bi-temporal model handles conflict by tracking validity intervals rather than overwriting edges.
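
Two of these maintenance steps, forgetting and consolidation, can be sketched without any LLM. The example assumes each memory carries a creation timestamp; `MemoryItem`, `maintain`, and the whitespace/case normalization used to detect duplicates are illustrative choices, and conflict resolution between genuinely different facts is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MemoryItem:
    text: str
    created_at: datetime


def maintain(items: list[MemoryItem], now: datetime, ttl: timedelta) -> list[MemoryItem]:
    # Forgetting: drop anything older than the time-to-live.
    fresh = [m for m in items if now - m.created_at <= ttl]

    # Consolidation: treat texts that match after lowercasing and collapsing
    # whitespace as duplicates, and keep only the newest copy of each.
    newest: dict[str, MemoryItem] = {}
    for m in fresh:
        key = " ".join(m.text.lower().split())
        if key not in newest or m.created_at > newest[key].created_at:
            newest[key] = m
    return sorted(newest.values(), key=lambda m: m.created_at)
```

Running a pass like this on a schedule keeps the store bounded; semantic deduplication (merging paraphrases) would need embeddings or an LLM on top.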

Multi-User and Multi-Agent Isolation

File-based memory breaks under concurrency — two agents writing to the same file can silently corrupt each other's data. Vector stores and graph databases handle concurrent writes safely, but you need namespacing: partition memories by user_id, agent_id, and session_id so one agent cannot accidentally retrieve another user's facts. Most managed vector databases (Pinecone, Weaviate, Chroma) support metadata-based filtering that doubles as a namespace boundary.
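
The namespacing rule can be shown with a toy store that filters on scope before it matches anything. `ScopedStore` and its substring matching are hypothetical simplifications; in a real vector database the same idea is expressed as a metadata filter passed alongside the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedMemory:
    user_id: str
    agent_id: str
    text: str


class ScopedStore:
    def __init__(self) -> None:
        self._items: list[ScopedMemory] = []

    def remember(self, user_id: str, agent_id: str, text: str) -> None:
        self._items.append(ScopedMemory(user_id, agent_id, text))

    def recall(self, user_id: str, agent_id: str, query: str) -> list[str]:
        # Apply the namespace filter first so out-of-scope memories are never candidates.
        in_scope = [m for m in self._items
                    if m.user_id == user_id and m.agent_id == agent_id]
        return [m.text for m in in_scope if query.lower() in m.text.lower()]
```

Making the scope arguments mandatory on `recall` means a caller cannot forget the filter and leak another user's facts by accident.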

Temporal Memory

Many facts are only true for a window of time. A user's job title changes; a project's status flips. A naive vector store treats all memories as equally current, which causes the agent to confidently state outdated facts. Temporal models like Graphiti's bi-temporal graph, or explicit valid_until metadata fields in a vector store, let you query the current state of a fact rather than returning every version ever stored.

Benchmarks and Research Frontiers

Research published in 2025-2026 shows knowledge graph augmentation producing 54% higher accuracy than standalone LLMs on entity-heavy benchmarks, with hallucination rates reduced by over 40% on relational questions. Vector-only memory still leads on open-domain semantic recall. The frontier is graph + vector hybrid retrieval — systems like Graphiti that combine BM25 keyword scoring, semantic embeddings, and graph traversal in a single query pass. Microsoft's GraphRAG and Zep's Graphiti are the most widely deployed implementations as of mid-2026.
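
Hybrid retrieval needs a way to merge ranked lists that come from different retrievers. One widely used, score-free method is reciprocal rank fusion (RRF), sketched below. This is a general technique shown for illustration, not a description of how Graphiti or GraphRAG combine results internally; the constant `k=60` is the conventional default and the document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of IDs; items ranked high in many lists win."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every item it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["a", "b", "c"]   # e.g. from BM25
semantic_hits = ["b", "a", "d"]  # e.g. from embedding search
graph_hits = ["b", "c"]          # e.g. from graph traversal
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits, graph_hits])
```

Because RRF uses only ranks, it avoids having to normalize BM25 scores against cosine similarities, which live on different scales.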

FAQ

How do I add memory to an AI agent without a vector database?

Use a plain JSON or Markdown file. The agent reads the full file at the start of each session and appends new facts as it learns them. This works well for prototypes and single-user agents that store no more than a few dozen facts. When you need semantic search across many memories, migrate to a vector store like Chroma or pgvector.

What is the difference between in-context memory and long-term memory in agents?

In-context memory is the conversation history sitting inside the current prompt; it disappears when the session ends. Long-term memory is stored outside the model (in a file, vector database, or knowledge graph) and can be retrieved in future sessions. Long-term memory lets agents accumulate knowledge across restarts and keep separate memories for multiple users.

Which vector database is best for agent long-term memory?

For getting started quickly, Chroma is easy to run locally with zero infrastructure. For production at scale, Pinecone (managed, fast serverless tier) and Weaviate (open-source, hybrid search) are the most widely deployed choices as of 2026. If you already run PostgreSQL, the pgvector extension avoids adding a new service entirely.

When should I use a knowledge graph instead of a vector store for agent memory?

Use a knowledge graph when your agent needs to reason about relationships between entities — for example, "who owns which project" or "which version of a library is this service pinned to". Vector stores excel at fuzzy semantic recall but cannot natively express typed relationships. Graphiti (built on Neo4j) is the leading open-source choice for knowledge-graph agent memory.

How do I prevent an agent's long-term memory from growing stale or contradictory?

Add a maintenance step that runs on every write or on a schedule. Mem0 asks an LLM to classify each new fact as ADD, UPDATE, DELETE, or NOOP before storing it, automatically resolving conflicts. For knowledge graphs, the bi-temporal model in Graphiti tracks validity intervals so queries always return the current state of a fact.

Can multiple agents or users safely share the same long-term memory store?

Yes, but you need namespace isolation. Most vector databases support metadata filtering — store a user_id or agent_id field alongside every memory and filter on it at query time. Plain files are unsafe for concurrent access; switch to a database backend as soon as more than one writer is involved.

Further reading