In plain English
An AI coding tool — Cursor, GitHub Copilot, Claude Code, Gemini CLI — feels like a teammate who has read your whole project. But it hasn't. On every single request, the model only sees the text you (and the tool) hand it: some of your files, your question, maybe the last few messages. That bundle of text is the context window, and it has a fixed size. Nothing outside it exists, as far as the model is concerned.

Think of the context window as the agent's desk. It's a big desk, but still finite. To work on your bug, it spreads out the relevant files, your instructions, and its own notes. When the desk fills up, something has to come off — and the tool decides what, usually the oldest stuff. That is exactly why an agent "forgets" a file you opened twenty minutes ago, or starts contradicting a decision you made earlier: the paper carrying that information slid off the desk.
So when an agent drifts, repeats itself, or loses the plot in a long session, the cause is almost never "the model got dumber." The model is the same. What changed is what was on the desk at that moment. Manage the desk well and the same model behaves far better. That single idea is what this article is about.
Why it matters
If you build with AI coding tools, the context window is the single constraint behind most of your daily frustrations. Knowing it turns a string of mysterious annoyances into one understandable limit you can work with.
- "It forgot the file I told it about." The file was in context earlier, then got pushed out to make room. The agent isn't ignoring you — that text is simply no longer on its desk.
- "It keeps rewriting code we already fixed." The earlier fix scrolled out of the window, so the agent re-derives a solution from scratch — sometimes a worse one.
- "It contradicts itself across a long chat." Different turns saw different slices of context. With the earlier decision gone, the model has no record that it ever made one.
- "It got slower and more expensive the longer we worked." Every token in the window is re-read and paid for on every request. A bloated context is a slow, costly context.
There's a subtler reason too. Even when everything fits, models recall information unevenly across a long context — facts buried in the middle are recalled less reliably than facts near the start or end. This is often called the lost-in-the-middle effect. So a giant, stuffed context isn't just slow; it can be less accurate than a small, focused one. More context is not automatically better context.
The payoff of understanding this is direct: you stop blaming the tool (or yourself) and start steering. Most of the "the agent is bad at my codebase" feeling dissolves once you control what lands in the window.
How it works
Under the hood, a coding agent is a loop. Each turn, the tool assembles a context window, sends it to the model, gets back a response (often a tool call like read this file or edit that one), runs it, and folds the result back into the context for the next turn. The window is rebuilt every turn — and that is where it can quietly fill up.
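To make the loop concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `fake_model` stands in for a real model call, `read_file` is the only tool, and the context is a plain list of strings. Real tools are far more elaborate, but the shape is the same: rebuild the window, call the model, run the tool, fold the result back in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    """One model response: either a tool call or a final answer."""
    tool: Optional[str] = None   # e.g. "read_file"; None means the agent is done
    arg: str = ""
    answer: str = ""


def fake_model(context: list[str]) -> Step:
    """Hypothetical stand-in for a model call. It sees only `context`."""
    if not any(line.startswith("tool_result:") for line in context):
        return Step(tool="read_file", arg="auth.py")
    return Step(answer="The bug is in auth.py")


def run_tool(step: Step, files: dict[str, str]) -> str:
    """Execute the requested tool; only read_file exists in this sketch."""
    if step.tool == "read_file":
        return files.get(step.arg, "<missing>")
    return "<unknown tool>"


def agent_loop(task: str, files: dict[str, str], max_turns: int = 5) -> tuple[str, list[str]]:
    history: list[str] = []
    context: list[str] = []
    for _ in range(max_turns):
        # The window is rebuilt from scratch on every turn.
        context = ["system: you are a coding agent", f"user: {task}", *history]
        step = fake_model(context)
        if step.tool is None:
            return step.answer, context
        # Tool output is folded back in, so the next window is larger.
        history.append(f"tool_call: {step.tool}({step.arg})")
        history.append(f"tool_result: {run_tool(step, files)}")
    return "gave up", context
```

Notice that `history` only ever grows. Each tool call adds two entries, and nothing in this loop removes any, which is how a window fills up without you typing a word.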
What goes into the window
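The context an agent sends is a stack of distinct pieces, all competing for the same fixed budget: the tool's own instructions, any rules or memory file, the code it has pulled in, the conversation so far (including tool results), and your current question. When people say "manage context," they mean managing this stack.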
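One way to picture the shared budget is a small assembler that adds up the pieces and, when they don't fit, drops the oldest conversation first. This is a sketch under loud assumptions: `count_tokens` is a crude word count rather than a real tokenizer, `build_window` is a hypothetical name, and real tools use their own eviction policies.

```python
def count_tokens(text: str) -> int:
    """Crude estimate (assumption): one token per whitespace-separated word."""
    return len(text.split())


def build_window(system: str, rules: str, files: list[str],
                 history: list[str], question: str, budget: int) -> list[str]:
    """Assemble the window; drop the oldest history first when over budget."""
    fixed = [system, rules, *files, question]
    remaining = budget - sum(count_tokens(piece) for piece in fixed)
    if remaining < 0:
        raise ValueError("fixed pieces alone exceed the budget")
    kept: list[str] = []
    # Walk history newest-first so the most recent turns survive.
    for message in reversed(history):
        cost = count_tokens(message)
        if cost > remaining:
            break
        kept.append(message)
        remaining -= cost
    kept.reverse()
    return [system, rules, *files, *kept, question]
```

With a budget of 20 "tokens" and three history messages, the earliest one (say, the decision to use bcrypt) is the one that falls off. The model then answers with no record that the decision was ever made.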
How the tool finds the right code
Your repo is far too big to fit in the window, so the tool can't just send everything. Two techniques pick what to include:
- Indexing + retrieval: the tool pre-scans your codebase and builds a searchable index (often using embeddings, as in retrieval-augmented generation, or RAG), then pulls in the few files most relevant to your request.
- Agentic search: the agent uses tools like grep and file-read to go find the relevant code itself, the way you would.
Most modern tools blend both.
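The two approaches can be sketched side by side. Note the substitution: `retrieve` ranks files by plain word overlap, which is only a stand-in for embedding similarity, and `grep` here is a toy in-memory version of the command-line tool. Both function names and the dictionary-as-repo are assumptions made for the example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase words; snake_case identifiers split on underscores."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, repo: dict[str, str], top_k: int = 2) -> list[str]:
    """Rank files by word overlap with the query (stand-in for embeddings)."""
    q = tokenize(query)
    scored = [(len(q & tokenize(path + " " + body)), path)
              for path, body in repo.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [path for _, path in ranked[:top_k]]


def grep(pattern: str, repo: dict[str, str]) -> list[tuple[str, int, str]]:
    """Agentic-search style tool: (path, line number, line) for each match."""
    rx = re.compile(pattern)
    return [(path, n, line)
            for path, body in sorted(repo.items())
            for n, line in enumerate(body.splitlines(), start=1)
            if rx.search(line)]
```

Retrieval makes one guess up front about which files matter; search lets the agent follow a trail (find a call, then find its definition) at the cost of extra turns, each of which adds tool output to the window.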
What happens when it fills up
When a session runs long and the window approaches its limit, the tool has to make room. The common move is summarization (sometimes called compaction): the tool replaces a big chunk of older conversation with a short summary of it, freeing space while trying to keep the gist. This is genuinely useful — it's how an agent runs for hours — but it's lossy. The exact line you cared about may survive only as "refactored the auth module," and the specific detail is gone. That lossy hand-off is the moment drift most often creeps in.
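Here is a rough sketch of that trade-off. The `summarize` function is a deliberately dumb placeholder (it keeps the first four words of each message) standing in for a model-written summary, and `compact`, its word-count limit, and `keep_recent` are hypothetical names and thresholds, not any tool's actual behavior.

```python
def summarize(messages: list[str], words_each: int = 4) -> str:
    """Placeholder summarizer (assumption): keep the first few words of each message."""
    heads = [" ".join(m.split()[:words_each]) for m in messages]
    return "summary: " + "; ".join(heads)


def compact(history: list[str], limit: int, keep_recent: int = 2) -> list[str]:
    """If history exceeds `limit` words, replace all but the newest messages with a summary."""
    total = sum(len(m.split()) for m in history)
    if total <= limit or len(history) <= keep_recent:
        return history
    older, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(older), *recent]


history = [
    "refactored the auth module so tokens expire after 15 minutes",
    "fixed off-by-one in pagination at line 42",
    "added tests for login",
    "now working on signup",
]
compacted = compact(history, limit=12)
```

After compaction the first entry reads "summary: refactored the auth module; fixed off-by-one in pagination". The gist survives, but the 15-minute expiry and the line number are gone, and those are exactly the kind of details an agent later gets wrong.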
Spotting and acting on the drift signal
The most useful skill is recognizing drift early — the moment an agent's grip on the task starts slipping because the context is degrading. Once you can name the signal, you know exactly when to intervene.
What drift looks like
- It asks for a file or fact you already gave it earlier in the session.
- It re-introduces a bug you fixed together, or reverts a deliberate choice.
- Its answers get vaguer and more generic — less tied to your actual code.
- It starts repeating suggestions or going in circles.
- After a "summarizing…" notice, the specifics evaporate and it speaks in broad strokes.
When you see two or more of these, the context is very likely the problem. Pushing harder with more prompts and more frustration only burns more of the window. The fix is almost always to reset or refocus the context, not to argue with the model.
Practical moves that keep an agent on track
None of these require understanding the model's internals. They all do one thing: keep the context window small, fresh, and pointed at what matters.
Scope tasks small
One focused task per session beats one sprawling marathon. "Add validation to the signup form" keeps a tight, relevant context. "Refactor the whole app" forces the agent to juggle dozens of files, overflow the window, summarize, and drift. Break big work into chunks an agent can finish before its desk fills up.
Start fresh sessions often
A new chat is an empty desk. When you switch to an unrelated task — or when drift sets in — start a new session instead of continuing a long one. You lose the cruft but keep your code; the agent re-reads what it needs. A clean, short context almost always outperforms a long, polluted one.
Point at specific files
Don't make the agent guess what's relevant — tell it. Most tools let you @-mention a file or folder to pin it into context. Naming the two or three files that matter is more reliable than hoping retrieval surfaces them, and it spends your token budget on signal instead of noise.
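Under the hood, pinning can be as simple as scanning your prompt for mentions and loading those files verbatim. The sketch below assumes a made-up `@path` syntax, a hypothetical `pin_mentions` helper, and a dictionary standing in for the repo; each real tool has its own mention syntax and resolution rules.

```python
import re

# Assumed syntax: "@" followed by a path made of word characters, dots, slashes, hyphens.
MENTION = re.compile(r"@([\w./-]+)")


def pin_mentions(prompt: str, repo: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (pinned context entries, mentioned paths not found in the repo)."""
    # Strip a trailing period picked up when a mention ends a sentence.
    paths = [m.rstrip(".") for m in MENTION.findall(prompt)]
    pinned: list[str] = []
    missing: list[str] = []
    for path in dict.fromkeys(paths):  # de-duplicate while keeping order
        if path in repo:
            pinned.append(f"file {path}:\n{repo[path]}")
        else:
            missing.append(path)
    return pinned, missing
```

The point of the design is that a pinned file bypasses retrieval entirely: there is no ranking step that could miss it, so the file you named is the file the model sees.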
Give it durable instructions
Decisions you make in chat can scroll out of the window; rules in a project file don't. Most tools read a dedicated rules/memory file that is loaded at the start of every session (for example, CLAUDE.md in Claude Code, GEMINI.md in Gemini CLI, or .github/copilot-instructions.md for GitHub Copilot). Put your stable conventions there ("use TypeScript," "tests live next to source") so they survive every reset and every summarization.
| Symptom | Likely cause | Fastest fix |
|---|---|---|
| Forgot a file you mentioned | Pushed out of the window | Re-@-mention the file |
| Re-fixes solved bugs | Earlier fix scrolled out | New session; restate the decision |
| Vague after "summarizing…" | Lossy compaction | Start fresh, re-point at key files |
| Slow and pricey late in a chat | Window is bloated | Smaller tasks, shorter sessions |
| Ignores a convention | It was only said in chat | Move it to a rules/memory file |
Going deeper
Once the basics click, a few finer points explain the rest of the behavior you'll see — and where the field is heading.
Bigger windows help, but don't make the problem vanish. Modern models offer very large context windows (some up to a million tokens). That's a real gift for code — more of your project fits at once. But the same forces apply at a larger scale: the lost-in-the-middle effect still degrades recall in a stuffed window, and every token still costs latency and money on every turn. A huge window raises the ceiling; it doesn't repeal the rule that focused context beats bloated context.
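A little arithmetic shows why the per-turn cost adds up. The sketch below assumes every turn resends the whole window, that the window grows by a fixed amount per turn, and that there is no caching or compaction. The numbers are invented for illustration, not measured from any tool.

```python
def total_tokens_processed(base: int, per_turn: int, turns: int) -> int:
    """Sum the window size over all turns when history is resent each time.

    Turn t (1-based) sends base + t * per_turn tokens.
    """
    return sum(base + t * per_turn for t in range(1, turns + 1))


# Assumed numbers: 2,000 fixed tokens, 1,000 new tokens of history per turn.
one_long_session = total_tokens_processed(base=2000, per_turn=1000, turns=40)
four_short_sessions = 4 * total_tokens_processed(base=2000, per_turn=1000, turns=10)
```

Under these assumptions, one 40-turn session processes 900,000 tokens, while four 10-turn sessions covering the same number of turns process 300,000 in total. The total grows roughly with the square of session length, which is why splitting work into shorter sessions pays off even when nothing ever overflows.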
Summarization quality varies. How well a tool compacts a long session — what it keeps versus drops — is a real differentiator between tools, and it's improving fast. Some now do this server-side and automatically as a conversation grows. You still can't fully control it, which is why an explicit fresh start, where you decide what carries over, often beats trusting an automatic summary.
Reasoning effort interacts with context. Coding agents increasingly let the model think before acting, and higher reasoning effort means the agent gathers more context and deliberates more per step. That usually improves results, but it also consumes more of the window with the model's own reasoning. It's another reason a well-scoped task pays off: the agent spends its budget on your problem, not on re-exploring a sprawling one.
This is the heart of "context engineering." Deliberately curating what goes into the window — and what stays out — is becoming its own discipline, the agent-era successor to prompt engineering. For coding specifically, it's also why your workflow matters as much as your model: the developer who scopes tightly, points precisely, and resets often will get better output from any tool than the one who dumps everything into one endless chat. The model is fixed; the desk is yours to manage.
From here, two directions are worth exploring: how tools actually decide what to retrieve (the retrieval and RAG machinery), and how different tools handle context in practice — comparisons like Claude Code vs Cursor often come down to exactly these indexing and summarization choices.
FAQ
Why does my AI coding tool forget files I already showed it?
Because every request only sees a fixed-size context window, and your file was pushed out of it to make room for newer content. The tool isn't ignoring you — that text simply isn't on the model's "desk" anymore. Re-mention the file (most tools support @-mentions) to pull it back in.
What is a context window in an AI coding assistant?
It's the bundle of text the model actually sees on a given request: the tool's instructions, some of your code, the conversation so far, and your current question. It's measured in tokens and has a fixed maximum. Anything outside it doesn't exist as far as the model is concerned, which is why managing what goes in is so important.
Does a bigger context window fix the forgetting problem?
It helps a lot — more of your project fits at once — but it doesn't fully fix it. Models still recall facts buried in a long context less reliably (the lost-in-the-middle effect), and every token still adds cost and latency on each turn. A focused context usually beats a stuffed one even when everything fits.
Why does my coding agent get slower and more expensive in long sessions?
Because the entire context window is re-read and re-billed on every single request (prompt caching, where a tool uses it, can discount repeated tokens but doesn't make them free). As a session grows, the window fills with conversation history, so each turn processes more tokens and gets slower and pricier. Shorter, well-scoped sessions keep the window small and snappy.
How do I stop an AI coding agent from drifting off task?
Watch for the drift signal — forgetting, repeating, or getting vague — and when you see it, stop adding prompts and reset the context instead. Scope tasks small, start fresh sessions when you switch focus, @-mention the specific files that matter, and put stable conventions in a rules/memory file so they survive every reset.