In plain English
An AI agent starts every conversation with amnesia. Its only memory is the context window — the text you pass in on this call. Close the chat, open a new one tomorrow, and the agent has no idea who you are, what you told it, or what it learned. Each session starts from a blank slate.

Mem0 is a memory layer you bolt onto an agent so it stops forgetting. Instead of cramming an entire chat history into every prompt, Mem0 quietly watches the conversation, pulls out the durable facts worth keeping ("the user is vegetarian," "they work in Berlin," "they prefer short answers"), and saves them outside the model. Next time the user shows up — minutes or months later — Mem0 fetches the handful of relevant memories and hands them to the agent before it replies.
Think of a good barista at your regular café. The first week you spell out your order every time. After a month they see you walk in and start your oat-milk flat white before you reach the counter. They didn't memorize every word you ever said — they kept the few facts that matter and recall them at the right moment. Mem0 is that barista's notebook for an AI agent: a small, searchable store of what's worth remembering about you, separate from the conversation itself.
Why it matters
Without a memory layer, builders reach for two bad options, and both break down fast.
- Stuff the whole history into the prompt. Replay every past message on every call. This works for a short chat, but conversations grow without bound. Soon you're paying for thousands of tokens of old chatter on every single request, the call gets slow, and once the history outgrows the context window you have to start throwing messages away anyway. You're paying more to remember less reliably.
- Re-ask the user every session. Just let the agent forget and make the person repeat their preferences each time. That's the amnesiac-assistant experience nobody wants — it feels broken, and for anything personalized (a coaching app, a support bot, a personal assistant) it's a non-starter.
A dedicated memory layer fixes the root problem: it separates what the agent should remember from what's in the prompt right now. The full history lives in durable storage; only the few memories relevant to the current question get pulled into context. That keeps prompts small and cheap while the agent's effective memory grows without limit.
Who cares? Anyone building an agent a person comes back to. A support bot that should remember your account tier and past tickets. A tutor that tracks which concepts you've mastered. A personal assistant that learns you hate morning meetings. Mem0's own pitch is exactly this: persistent, personalized memory so the agent feels like it knows the user instead of meeting them for the first time, every time.
How it works
Mem0 runs an extract → store → retrieve loop. The key idea is that it does not save the conversation verbatim. It uses an LLM to distill messages into compact, standalone memories, stores them, and later searches them by meaning — so what comes back is a short list of relevant facts, not a transcript.
Writing a memory: extract, then reconcile
After a turn of conversation, Mem0 passes the recent messages to an LLM and asks it: what here is worth remembering? The model returns candidate memories as short statements — "User is allergic to peanuts," "User's project is named Apollo." Crucially, Mem0 then reconciles each candidate against what it already knows. A new fact gets added; a fact that refines an old one updates it; a fact that contradicts an old one can replace it; a duplicate is ignored. This is what stops the store from filling up with near-identical or stale entries.
Reading a memory: search by meaning
When a new message arrives, Mem0 turns it into an embedding and runs a semantic search over the stored memories, scoped to that user. It returns the top few most relevant memories, which your code injects into the prompt before calling the model. So the agent answers with the right background facts in hand — without ever seeing the whole history.
Under the hood Mem0 keeps memories in a vector store (for meaning-based search) and tracks metadata like the user it belongs to and when it was created. An optional graph layer can also record relationships between entities — "Alice works at Acme," "Acme is in Berlin" — so the agent can answer questions that hop across several connected facts, not just match a single one.
from mem0 import Memory
memory = Memory() # backed by a vector store + an LLM
USER = "alice"
# WRITE: hand Mem0 the conversation; it extracts + stores the durable facts.
memory.add(
[
{"role": "user", "content": "I'm vegetarian and I love spicy food."},
{"role": "assistant", "content": "Noted! I'll keep that in mind."},
],
user_id=USER,
)
# READ: later, in a brand-new session, fetch only the relevant memories.
hits = memory.search("suggest a place for dinner", user_id=USER)
context = "\n".join(m["memory"] for m in hits["results"])
prompt = f"What you know about the user:\n{context}\n\nThey asked: suggest a place for dinner"
# -> send `prompt` to your LLM; it now knows Alice is vegetarian + likes spicy food.That's the entire pattern: add() after a turn to remember, search() before a turn to recall. Everything else — chunking, embedding, reconciliation, storage — Mem0 handles for you.
Where it fits in the memory stack
Agents have several kinds of memory, and Mem0 is not all of them. It's easy to confuse the dedicated long-term layer with the model's built-in context. Here's how the pieces line up.
| Memory type | Lives where | Lasts | Mem0's role |
|---|---|---|---|
| Working memory | The context window of the current call | One call | Not Mem0 — see scratchpad / working memory |
| Short-term memory | Recent turns of this session | One session | Usually handled by your chat loop; Mem0 reads from it to extract |
| Long-term memory | External store (vectors, graph) | Across sessions, indefinitely | This is Mem0's job — see long-term memory |
If you want the bigger picture of how these layers relate, the types of agent memory and agent memory overview articles cover the taxonomy in depth. Mem0 is one concrete implementation of the long-term slot in that stack — and a closely related idea, context compaction, tackles the same overflow problem from the prompt side.
A memory layer vs just using a bigger context window
A fair question in 2026, when context windows hold hundreds of thousands of tokens: why not skip the memory layer and paste the whole history every time? Sometimes you can — for a single short-lived chat, that's the simplest thing that works. But it breaks down the same way naive long-context breaks down for retrieval.
- Stores facts outside the prompt
- Recalls only what's relevant now
- Prompt stays small + cheap
- Memory grows without limit
- Survives across sessions
- Replays the whole history each call
- Pays for every old token, every time
- Slows down as history grows
- Caps out when history > window
- Gone when the session ends
There's a quality angle too, not just cost. Models recall facts buried in a huge prompt less reliably than facts placed front and center — the "lost in the middle" effect. By surfacing only the few memories that matter and putting them right where the model will see them, a memory layer can make the agent more accurate, not just cheaper. The big window and the memory layer aren't enemies, though: you still use the window for the live conversation — Mem0 just keeps it from having to hold everything you've ever said.
Common pitfalls
A memory layer is easy to add and easy to misuse. Most trouble comes from what gets remembered and how it's scoped — not from the search itself.
- Remembering the wrong things. The extraction step is LLM-driven, so it can save trivia ("user said hi") or, worse, treat a one-off as a standing preference. Garbage in the memory store quietly poisons future answers. Review what gets stored, especially early on.
- Stale or contradictory memories. People change. If the reconciliation step doesn't update or retire old facts, the agent can confidently act on something that's no longer true. This is why the add/update/delete logic matters more than raw storage.
- Mixing up users. Every memory must be scoped to an identity (a
user_id, and often asessionoragentscope too). Forget the scope and one person's memories leak into another's reply — a privacy and correctness bug in one. - Treating memories as trusted commands. A retrieved memory is data, not an instruction. If a user once typed something adversarial that got stored, replaying it blindly is a path to prompt injection. Fence injected memories clearly as background facts.
- Storing data you shouldn't keep. Long-term memory can quietly accumulate sensitive personal data. Decide up front what's safe to persist, give users a way to view and delete their memories, and respect deletion requests.
Going deeper
The extract-store-retrieve loop above is the core, and the interesting engineering is in tuning each stage for your app. A few directions worth knowing once the basics click.
Hosted vs self-hosted. Mem0 ships as an open-source library you run yourself (you bring your own vector store, LLM, and optional graph database) and also as a managed platform that hosts the memory store for you. The library gives you full control and keeps data in your infrastructure; the platform trades that for less plumbing. The mental model is identical either way — only who runs the storage changes.
Memory scopes. Real systems rarely have just one bucket per user. You'll often separate user-level memory (durable facts about the person), session-level memory (what happened in this conversation), and agent-level memory (things the agent learned about itself or a task). Scoping retrieval to the right level keeps recall precise and prevents cross-contamination.
Graph memory and multi-hop recall. Plain vector memory matches one fact at a time, which struggles with questions that chain across several facts ("who on my team also knows the client I met last week?"). The graph layer stores entities and the relationships between them, so the agent can traverse connections instead of hoping a single embedding match covers it — the same trade as GraphRAG, more ingestion work for better multi-step reasoning.
Where it sits in an agent. Memory pairs naturally with the rest of the agent loop: the agent plans using recalled context, acts, then reflects and self-corrects — and what it learns can be written back as new memories. Done well, the agent doesn't just remember facts about the user; it remembers what worked and what didn't, and gets better over time.
The honest open challenges are real and unglamorous. Deciding what's worth remembering is a judgment call no metric fully captures. Keeping memories fresh as the world and the user change is an ongoing reconciliation problem. And every memory you keep is personal data you're now responsible for. The durable lesson: a memory layer is only as good as the facts it chooses to keep and retire — so most of your effort belongs in extraction and reconciliation, not in the search.
FAQ
What is Mem0 used for?
Mem0 is a memory layer that gives an AI agent persistent, personalized long-term memory. It extracts durable facts from conversations, stores them outside the model, and recalls the relevant ones later — so the agent remembers a user's preferences, history, and key facts across sessions instead of starting fresh every time.
How does Mem0 work?
It runs an extract-store-retrieve loop. After a conversation turn, an LLM picks out the facts worth keeping and reconciles them with what's already stored (add, update, delete, or skip). On a new message, Mem0 runs a semantic search over that user's memories and returns the most relevant ones to inject into the prompt.
Is Mem0 the same as RAG?
No, though they share the embed-and-search mechanism. RAG retrieves facts from your documents to answer a question. Mem0 remembers facts about a specific user and relationship across sessions. Many systems use both: RAG for the knowledge base, Mem0 for personalization.
Why not just use a bigger context window instead of a memory layer?
A big window lets you replay history for a short chat, but it doesn't scale: you pay for every old token on every call, prompts slow down, history eventually outgrows the window, and models recall buried facts less reliably. A memory layer keeps the prompt small by injecting only the few memories that matter and persists them across sessions.
Is Mem0 open source?
Mem0 is available as an open-source library you self-host (bringing your own vector store and LLM) and also as a managed hosted platform. The extract-store-retrieve model is the same either way; the difference is who runs the storage infrastructure.
Does Mem0 store the full conversation?
No. Instead of saving transcripts verbatim, Mem0 uses an LLM to distill messages into compact, standalone memory statements, then reconciles them against existing memories. That keeps the store small, relevant, and free of stale duplicates.