In plain English
Agent planning is the part of an AI agent that takes a big, fuzzy goal and turns it into a concrete, ordered list of smaller steps it can actually do. Instead of charging at "plan a three-day trip to Lisbon" in one blind leap, the agent first writes itself a checklist: find flights, book a hotel near the centre, list things to see, build a day-by-day schedule. Then it works through that list.
Think about how you tackle a daunting task — say, moving apartments. You don't just start lifting boxes. You make a plan: book a van for Saturday, pack the kitchen first, cancel the old internet, redirect the mail. Big goal, broken into a sequence of doable chunks, roughly in the right order. Planning gives an agent that same habit. The technical name for the splitting-up part is task decomposition.
A plain LLM and even a simple AI agent can muddle through short tasks by just reacting one step at a time. Planning is what you add when the task is long enough that winging it stops working — when forgetting step 4 means redoing steps 1 through 3.
Why it matters
The problem planning solves is getting lost on long tasks. An agent without a plan reacts to whatever just happened. That's fine for "what's the weather and should I bring an umbrella?" It falls apart on "migrate this codebase to the new framework and keep all the tests passing," where the agent has to juggle a dozen interdependent steps and not lose the thread halfway through.
Errors also compound. If each step is 95% reliable and failures are independent, a 15-step task succeeds slightly less than half the time (0.95^15 ≈ 0.46). A reactive agent that wanders takes more steps and has more places to go wrong. A plan keeps the step count tight and the agent pointed at the goal, which is one of the biggest levers you have for reliability.
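The compounding arithmetic is easy to check yourself. This is a minimal sketch, assuming every step succeeds independently with the same probability (a simplification, since real steps are often correlated). `task_success_rate` is a hypothetical helper name, not part of any library.

```python
def task_success_rate(step_reliability: float, num_steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps."""
    return step_reliability ** num_steps


for n in (5, 15, 30):
    print(f"{n:>2} steps at 95% each: {task_success_rate(0.95, n):.1%}")
# prints:
#  5 steps at 95% each: 77.4%
# 15 steps at 95% each: 46.3%
# 30 steps at 95% each: 21.5%
```

Doubling the step count from 15 to 30 roughly halves the success rate again, which is why trimming unnecessary steps pays off so quickly.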
Who should care
- Developers building anything multi-step — research assistants, coding agents, data pipelines, automations that touch several systems.
- Product teams deciding how much autonomy to give an agent. A visible plan is also the easiest place to put a human approval step before the agent acts.
- Anyone using AI tools — the "deep research" button and your coding assistant both plan under the hood. When they go off the rails, it's usually a planning failure.
What did planning replace? Mostly you. Before agents planned, a human broke the work into steps and fed them to the model one at a time. Planning moves that decomposition inside the agent — though, as we'll see, knowing when not to plan is just as valuable.
How it works
There are two broad styles. The first is plan-then-act (sometimes called plan-and-execute): the agent writes the whole plan up front, then executes the steps in order. The second is interleaved planning, where the agent plans and acts in the same breath, deciding the next step only after seeing the result of the last one. This is the ReAct pattern: reason, act, observe the result, and repeat.
Mechanically, the plan is just text the model generates: usually a numbered list, sometimes structured JSON so your code can read it. In a typical LLM agent there's no separate planning engine; the same LLM that does the work writes the plan, prompted with something like "break this goal into a numbered list of steps, then we'll do them one at a time." This is closely tied to chain-of-thought prompting: planning is essentially structured, goal-directed reasoning.
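Because the plan arrives as text, your code has to parse it defensively. Here is a minimal sketch, assuming the model was asked for a JSON array of strings but may wrap it in extra chatter. `parse_plan` is a hypothetical helper, and the sample reply is made up for illustration.

```python
import json


def parse_plan(raw: str) -> list[str]:
    """Pull a JSON array of step strings out of a model reply."""
    # Take everything from the first "[" to the last "]" so that
    # surrounding prose does not break parsing.
    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in model reply")
    steps = json.loads(raw[start : end + 1])
    if not isinstance(steps, list) or not all(
        isinstance(s, str) and s.strip() for s in steps
    ):
        raise ValueError("plan must be a JSON array of non-empty strings")
    return [s.strip() for s in steps]


reply = 'Here is the plan:\n["find flights", "book a hotel"]\nLet me know!'
print(parse_plan(reply))  # ['find flights', 'book a hotel']
```

Raising on a malformed plan, rather than guessing, gives the calling code a clear point at which to retry the planning prompt.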
Replanning: what happens when reality disagrees
Plans go stale. Step 2 fails, a search returns nothing, the file isn't where the agent expected. A good planning agent doesn't blindly march on — it replans: it looks at what actually happened, updates the remaining steps, and continues. This feedback loop is what separates a brittle script from a real agent.
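The replanning loop can be sketched without any model calls. This is an illustration under stated assumptions: `execute` and `replan` are hypothetical callbacks (in a real agent both would call the model or a tool), and this version only replans when a step fails, whereas a stricter agent might check the plan after every step.

```python
from typing import Callable


def run_with_replanning(
    steps: list[str],
    execute: Callable[[str], tuple[bool, str]],
    replan: Callable[[str, str, list[str]], list[str]],
    max_steps: int = 20,
) -> list[str]:
    """Run steps in order; after a failure, ask `replan` for new remaining steps."""
    remaining = list(steps)
    log: list[str] = []
    while remaining and len(log) < max_steps:
        step = remaining.pop(0)
        ok, observation = execute(step)
        log.append(f"{'ok' if ok else 'FAILED'}: {step} -> {observation}")
        if not ok:
            # Reality disagreed with the plan: revise what is left.
            remaining = replan(step, observation, remaining)
    return log


# Stand-in callbacks so the sketch runs on its own.
def fake_execute(step: str) -> tuple[bool, str]:
    if step == "search the docs site":
        return False, "site unreachable"
    return True, "done"


def fake_replan(failed: str, observation: str, remaining: list[str]) -> list[str]:
    return ["search the cached docs"] + remaining


for line in run_with_replanning(
    ["search the docs site", "summarize findings"], fake_execute, fake_replan
):
    print(line)
```

The `max_steps` cap matters even in a toy: without it, a `replan` callback that keeps proposing failing steps would loop forever.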
The steps themselves get done with tool use — searching the web, running code, reading a file — wired up via function calling. Planning decides which tools to use in what order; tool use is how each planned step actually touches the world. The plan and the running transcript both live in the model's context window, which is why keeping the plan short and re-surfacing it matters on long runs.
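To make the split between planning and tool use concrete, here is a minimal sketch in plain Python. It assumes each planned step has already been turned into a tool name plus arguments; the `TOOLS` registry, the two toy tools, and the step format are all hypothetical stand-ins, not any provider's function-calling format.

```python
# Toy tools standing in for real ones (web search, code execution, file reads).
def word_count(text: str) -> int:
    return len(text.split())


def to_upper(text: str) -> str:
    return text.upper()


TOOLS = {"word_count": word_count, "to_upper": to_upper}

# The plan decides WHICH tools run and in WHAT order.
plan = [
    {"tool": "to_upper", "args": {"text": "plan the trip"}},
    {"tool": "word_count", "args": {"text": "plan the trip"}},
]


def run_step(step: dict):
    """Look up the tool a planned step names and call it with its arguments."""
    tool = TOOLS.get(step["tool"])
    if tool is None:
        raise KeyError(f"unknown tool: {step['tool']}")
    return tool(**step["args"])


outputs = [run_step(step) for step in plan]
print(outputs)  # ['PLAN THE TRIP', 3]
```

Rejecting unknown tool names up front is a cheap guard against a plan that references a tool the agent does not actually have.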
Plan-then-act vs. ReAct: which to use
These are the two planning styles you'll meet everywhere, and the trade-off between them is foresight versus flexibility.
Plan-then-act:
- Whole plan written up front
- Fewer, cheaper model calls
- Easy to show a human for approval
- Can go stale if reality shifts
- Good for predictable, structured tasks
ReAct (interleaved):
- Plans one step at a time
- Adapts after every result
- More model calls, higher cost
- Harder to preview or audit
- Good for messy, exploratory tasks
In practice most production agents are a hybrid: draft a high-level plan up front so there's a roadmap and a place for human sign-off, then execute each step ReAct-style so the agent can adapt to surprises. Frameworks like LangGraph and the orchestrator-worker pattern in multi-agent systems are built around exactly this shape: a planner up top, executors below.
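The hybrid shape can be sketched as two nested loops: an outer walk over the high-level plan, and an inner reason-act-observe loop for each step. This is a simplified illustration, not how LangGraph or any specific framework implements it. `think` and `act` are hypothetical callbacks that a real agent would back with model calls and tools.

```python
def react_execute(step, think, act, max_turns=5):
    """Inner loop: think, act, observe, until `think` says the step is done."""
    observations = []
    for _ in range(max_turns):
        action = think(step, observations)
        if action is None:  # the reasoner judges the step complete
            break
        observations.append(act(action))
    return observations


def run_hybrid(plan, think, act):
    """Outer loop: follow the up-front plan, executing each step ReAct-style."""
    return {step: react_execute(step, think, act) for step in plan}


# Stand-in callbacks: take one action per step, then stop.
def fake_think(step, observations):
    return None if observations else f"lookup: {step}"


def fake_act(action):
    return f"result of {action}"


trace = run_hybrid(["find flights", "book hotel"], fake_think, fake_act)
print(trace)
```

The outer plan is what you would show a human for sign-off; the inner loop is where the agent absorbs surprises without rewriting the roadmap.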
Build a tiny planner
Here's the plan-then-act idea in about 30 lines of Python with the Anthropic SDK. Step 1: ask the model for a plan as a JSON list. Step 2: walk the steps. The point is that the plan is just text the model wrote — your code parses it and loops.
```python
import json
import anthropic

client = anthropic.Anthropic(api_key="sk-...")  # your key here
MODEL = "claude-sonnet-4-5"
goal = "Write a short blog post comparing two note-taking apps."

# 1. PLAN — ask the model to decompose the goal into ordered steps.
plan_prompt = (
    f"Goal: {goal}\n"
    "Break this into 3-5 ordered steps. "
    'Reply with ONLY a JSON array of strings, e.g. ["step one", "step two"].'
)
resp = client.messages.create(
    model=MODEL, max_tokens=512,
    messages=[{"role": "user", "content": plan_prompt}],
)
steps = json.loads(resp.content[0].text)
print("PLAN:", steps)

# 2. EXECUTE — do each step in order, passing results forward as context.
results = []
for i, step in enumerate(steps, 1):
    context = "\n".join(f"- {r}" for r in results) or "(nothing yet)"
    do_prompt = (
        f"Overall goal: {goal}\n"
        f"Work so far:\n{context}\n\n"
        f"Now do step {i}: {step}"
    )
    out = client.messages.create(
        model=MODEL, max_tokens=1024,
        messages=[{"role": "user", "content": do_prompt}],
    )
    results.append(out.content[0].text)

print("\nFINAL:\n", results[-1])
```
That's the skeleton. A real planner adds tool use inside each step, and a replanning check after each one: feed the result back and ask "is the plan still right, or should we revise the remaining steps?" Real agent frameworks bundle this loop, plus retries and memory, so you don't hand-roll it.
Common pitfalls
Planning is powerful, but it fails in predictable ways. The big ones:
| Pitfall | What goes wrong | Fix |
|---|---|---|
| Over-planning | Agent writes a 12-step plan for a 2-step task | Plan only when the task is genuinely long |
| Stale plans | Agent follows the original plan after reality changed | Replan after each step; check the goal is still reachable |
| Lost plan | Long transcript pushes the plan out of context | Re-surface the plan each turn; summarize old steps |
| No stopping rule | Agent replans forever, never finishing | Cap total steps; detect when it's looping |
| Vague steps | "Research the topic" isn't executable | Force concrete, tool-shaped steps |
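The "no stopping rule" fix from the table can be sketched as a small guard the agent consults before each step. This is one possible approach, not a standard component: `StopGuard`, its default limits, and the whitespace-and-case normalization used to spot repeats are all assumptions chosen for illustration.

```python
from collections import Counter
from typing import Optional


class StopGuard:
    """Caps total steps and flags a step that keeps coming back."""

    def __init__(self, max_steps: int = 25, max_repeats: int = 3):
        self.max_steps = max_steps
        self.max_repeats = max_repeats
        self.seen: Counter = Counter()
        self.total = 0

    def check(self, step: str) -> Optional[str]:
        """Return a reason to stop, or None if the agent may continue."""
        self.total += 1
        key = " ".join(step.lower().split())  # ignore case and extra spaces
        self.seen[key] += 1
        if self.total > self.max_steps:
            return "step budget exhausted"
        if self.seen[key] >= self.max_repeats:
            return f"looping on: {key}"
        return None


guard = StopGuard(max_steps=10, max_repeats=3)
reasons = [
    guard.check(s)
    for s in ["search docs", "Search  docs", "read file", "search docs"]
]
print(reasons)  # [None, None, None, 'looping on: search docs']
```

Exact-text matching only catches literal repeats; an agent that rephrases the same step each time would need a fuzzier comparison.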
The lost plan problem is the sneaky one. As an agent works, its transcript grows, and on a 40-step run the original plan can scroll out of the useful part of the context window. The agent forgets what it was doing. The fix lives in context engineering: keep the current plan and progress pinned near the top of context, and summarize the noisy middle.
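Pinning the plan can be as simple as rebuilding the prompt each turn with the plan first. Here is a minimal sketch, assuming results are plain strings. `build_context` is a hypothetical helper, and truncating older results to a fixed length is a crude stand-in for real summarization, which would usually be another model call.

```python
def build_context(goal, plan, done, recent_keep=2, summary_chars=60):
    """Assemble prompt text with the plan pinned first and old results condensed."""
    lines = [f"GOAL: {goal}", "PLAN:"]
    for i, step in enumerate(plan, 1):
        mark = "x" if i <= len(done) else " "  # tick off completed steps
        lines.append(f"  [{mark}] {i}. {step}")

    # Keep the last few results in full; shorten everything older.
    split = max(len(done) - recent_keep, 0)
    older, recent = done[:split], done[split:]
    if older:
        lines.append("EARLIER RESULTS (truncated):")
        lines += [f"  {i}. {r[:summary_chars]}" for i, r in enumerate(older, 1)]
    if recent:
        lines.append("RECENT RESULTS (full):")
        lines += [f"  {i}. {r}" for i, r in enumerate(recent, split + 1)]
    return "\n".join(lines)


ctx = build_context(
    goal="Compare two note-taking apps",
    plan=["list features", "compare pricing", "draft post", "edit post"],
    done=["x" * 200, "pricing table built", "first draft written"],
)
print(ctx)
```

Because the goal and checklist are regenerated at the top of every prompt, they cannot scroll away no matter how long the run gets.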
Going deeper
Once the basic loop clicks, planning opens into a rich and still-unsolved research area. Here is a tour of the frontier.
Beyond linear lists: trees and graphs
A numbered list is the simplest plan, but tasks aren't always linear. Tree of Thoughts lets the model explore several candidate next steps and back-track from dead ends, like searching a maze instead of walking one corridor. Graph-of-thought and DAG-style planners express steps with dependencies, so independent sub-tasks can run in parallel. These help on hard reasoning and search problems but cost many more model calls.
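A plan with dependencies can be scheduled using Python's standard-library `graphlib`. The sketch below covers only the DAG idea (it is not Tree of Thoughts), and the trip-planning steps and their dependencies are a hypothetical example. Each batch it produces contains steps that could run in parallel.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on.
deps = {
    "find flights": set(),
    "list sights": set(),
    "book hotel": {"find flights"},
    "build schedule": {"book hotel", "list sights"},
}


def parallel_batches(dependencies):
    """Group steps into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every step that is unblocked now
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unblock later steps
    return batches


print(parallel_batches(deps))
# [['find flights', 'list sights'], ['book hotel'], ['build schedule']]
```

A linear list would take four sequential steps here; the dependency view shows that only three rounds are needed.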
Reflection and self-correction
Advanced agents add a reflection step: after acting, the agent critiques its own result ("did that actually satisfy the requirement?") and revises the plan if not. The Reflexion pattern formalizes this — the agent keeps a running log of what went wrong and feeds those lessons into the next attempt. It's a form of self-evaluation, close in spirit to LLM-as-a-judge, but pointed at the agent's own steps.
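The reflect-and-retry idea can be sketched as a loop that carries a list of lessons between attempts. This is a loose illustration in the spirit of Reflexion, not the published algorithm. `attempt` and `critique` are hypothetical callbacks that a real agent would implement with model calls.

```python
def reflexion_loop(task, attempt, critique, max_tries=3):
    """Retry a task, feeding each critique into the next attempt as a lesson."""
    lessons = []
    for n in range(1, max_tries + 1):
        result = attempt(task, lessons)
        problem = critique(task, result)  # None means the result passes
        if problem is None:
            return result, lessons, n
        lessons.append(problem)
    return None, lessons, max_tries


# Stand-in callbacks so the sketch runs on its own.
def fake_attempt(task, lessons):
    if any("sources" in lesson for lesson in lessons):
        return "summary with sources"
    return "summary"


def fake_critique(task, result):
    return None if "sources" in result else "add sources"


print(reflexion_loop("summarize the report", fake_attempt, fake_critique))
# ('summary with sources', ['add sources'], 2)
```

The lessons list is the important part: without it, each retry would start from scratch and could repeat the same mistake.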
Planning across agents and memory
At scale, planning becomes delegation. An orchestrator agent decomposes the goal and hands sub-plans to specialist workers, which is the core of a multi-agent system. Connecting those agents to external tools and data sources is what the Model Context Protocol (MCP) standardizes. And plans that span sessions need long-term memory: the agent stores past plans and outcomes, often in a vector database, and retrieves them later, the same retrieval idea behind RAG.
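The store-and-retrieve idea behind plan memory fits in a few lines. This is a toy sketch: `PlanMemory` is a hypothetical class, and word overlap (Jaccard similarity) stands in for the embedding similarity a real vector database would compute.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


class PlanMemory:
    """Stores (goal, plan, outcome) records and recalls the most similar goals."""

    def __init__(self):
        self.records = []

    def store(self, goal, plan, outcome):
        self.records.append((goal, plan, outcome))

    def recall(self, goal, k=1):
        ranked = sorted(
            self.records, key=lambda rec: jaccard(goal, rec[0]), reverse=True
        )
        return ranked[:k]


memory = PlanMemory()
memory.store("plan a trip to Lisbon", ["find flights", "book hotel"], "success")
memory.store("migrate codebase to new framework", ["run tests"], "failed")
best = memory.recall("plan a trip to Porto")[0]
print(best[0])  # plan a trip to Lisbon
```

The recalled plan and its outcome can then be placed in the planning prompt as a worked example for the new goal.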
Evaluating plans is the real bottleneck
Because planning is non-deterministic, you can't unit-test it with exact-match assertions. Two runs of the same goal can produce different plans, both valid. So you measure planning with evals: did the agent reach the goal, in how many steps, at what cost, across many runs? Good observability lets you replay a failed run and see exactly where the plan broke. In production, judging plan quality, not generating plans, is usually the hard part.
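An eval harness for planning mostly means running the agent many times and aggregating. The sketch below assumes each run reports whether it succeeded, how many steps it took, and what it cost. `fake_agent` and its numbers are invented stand-ins for a real agent, and the fixed seed is only there to make the demo repeatable.

```python
import random
from statistics import mean


def evaluate(agent, goal, runs=50, seed=0):
    """Run the agent repeatedly and summarize success, steps, and cost."""
    rng = random.Random(seed)
    outcomes = [agent(goal, rng) for _ in range(runs)]
    return {
        "success_rate": sum(o["success"] for o in outcomes) / runs,
        "mean_steps": mean(o["steps"] for o in outcomes),
        "mean_cost_usd": mean(o["cost_usd"] for o in outcomes),
    }


# Stand-in agent: random step count, succeeds about 80% of the time.
def fake_agent(goal, rng):
    steps = rng.randint(3, 8)
    return {
        "success": rng.random() < 0.8,
        "steps": steps,
        "cost_usd": 0.01 * steps,
    }


report = evaluate(fake_agent, "compare two note-taking apps")
print(report)
```

Reporting steps and cost alongside the success rate matters because two agents with the same success rate can differ widely in how expensively they get there.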
FAQ
What is planning in AI agents?
Planning is the part of an AI agent that turns a big goal into an ordered list of smaller, executable steps before (or while) it acts. It's how an agent decides what to do, in what order — for example splitting "plan a trip" into find flights, book a hotel, build a schedule. The core sub-skill is task decomposition: breaking the goal into doable chunks.
What is the difference between plan-and-execute and ReAct agents?
A plan-and-execute agent writes the entire plan up front, then runs the steps in order — cheaper and easy to show a human, but it can go stale if reality changes. A ReAct agent interleaves reasoning and acting, deciding each next step only after seeing the last result — more adaptive but more model calls and harder to audit. Many production agents combine both: a rough plan up front, ReAct-style execution underneath.
What is task decomposition for LLMs?
Task decomposition is splitting one large goal into smaller sub-tasks an LLM can handle one at a time. Instead of asking the model to do everything in a single shot, you ask it to break the goal into ordered steps, then work through them. It's the decomposition half of agent planning and the main reason agents can tackle long jobs instead of just answering questions.
What is replanning in AI agents?
Replanning is when an agent revises its plan partway through because the situation changed — a step failed, a search came up empty, or new information appeared. After each step the agent checks whether the remaining plan still makes sense and updates it if not. This feedback loop is what keeps an agent from blindly following a stale plan to a dead end.
Do all AI agents need a planning step?
No. Short, reactive tasks — a single lookup or a one-tool action — work fine without an explicit plan, and adding one just wastes tokens and creates extra failure points. Planning earns its keep on long, multi-step, multi-tool tasks where forgetting an earlier step means redoing work. The skill is knowing when the task is long enough to need a plan.
Why do AI agents get lost on long tasks?
Two reasons. First, errors compound — if each step is 95% reliable, a 15-step task succeeds only about half the time. Second, the original plan can scroll out of the context window as the transcript grows, so the agent forgets what it was doing. The fixes are replanning after each step, capping total steps, and pinning the plan and progress near the top of context.