In plain English
When an AI agent writes some code, drafts an email, or plans a multi-step task, its very first attempt is rarely its best. Like a person typing fast, it can leave a bug in line three, miss a requirement, or contradict itself halfway through. Reflection is the simple but powerful idea of making the agent stop, read its own work, and decide whether to fix it before handing the result to you.

The everyday analogy is a writer with an editor — except both roles are played by the same model wearing two hats. First it writes a draft (the generate hat). Then it puts on the critique hat: "Does this actually answer the question? Are there mistakes? What's weak?" Finally it puts the writer hat back on and produces a revised draft that addresses its own notes. One pass through that loop often turns a mediocre answer into a solid one.
This is sometimes called self-correction or self-critique, and one well-known research formulation is the Reflexion pattern. The names differ, but the shape is the same: generate → evaluate → revise, repeated until the work is good enough or you run out of budget. It is a planning and quality technique, not a new kind of model: any capable LLM can do it with the right prompting.
Why it matters
A single forward pass through an LLM is a one-shot effort: the model commits to every token as it goes and can't easily back up. For short, easy answers that's fine. For anything that has to be correct — code that must compile, a plan whose steps must connect, a SQL query that must return the right rows — a one-shot answer is a gamble. Reflection turns that gamble into a checkable, repeatable process.
Here are the core problems it solves, and why builders reach for it:
- Errors that are obvious in hindsight. Models make slips a quick re-read would catch: an off-by-one, a missing edge case, a forgotten part of the instructions. Asking "is this correct? what did you miss?" surfaces a surprising number of them.
- No external feedback by default. A plain agent has no idea whether its answer worked. Reflection is a way to manufacture feedback internally when you don't have a test suite or a human in the loop — and to use feedback (like an error message) when you do.
- Reliability over raw capability. A slightly weaker model that checks its own work can beat a stronger model that answers blind, especially on multi-step tasks where small mistakes compound. Self-correction is one of the cheapest ways to raise an agent's success rate without changing the model.
- Tasks with a verifiable signal. Coding, math, data queries, and tool calls often come with a ground-truth check — does it run? does the test pass? did the API return an error? Reflection shines when there's a concrete signal to react to.
The catch — and we'll return to it — is that reflection is not free. Every critique and revision is another model call, costing time and tokens. Used well, it's the difference between a flaky demo and a dependable agent. Used blindly, it burns budget and can even make answers worse. Knowing when to reflect is as important as knowing how.
How it works
Mechanically, reflection is a loop you build around the model, not a feature inside it. You run the model to produce an output, run it again to grade that output, and — if the grade is poor — run it a third time to produce a better version, feeding the critique back in. The whole thing is plumbing and prompting on top of an ordinary agent loop.
The three steps in detail
- Generate. The agent does the task normally and produces a first draft — code, a plan, an answer.
- Evaluate. The agent (or a separate verifier) judges that draft. The judgment can be internal — "re-read your answer and list every problem you find" — or external — run the unit tests, execute the code, validate against a schema, call the API and read the error. External signals are far more trustworthy because they don't depend on the model's own opinion.
- Revise. The critique (and any error output) is pasted back into the prompt, and the agent rewrites its answer to address each point. The loop then repeats: evaluate the new draft, and stop when it passes the check or you hit a maximum number of tries.
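The three steps can be sketched as a small loop. This is a minimal sketch that assumes a same-model self-check: `model` is any function that takes a prompt string and returns the model's text, and the function name `self_critique_loop` and the `NO ISSUES` reply convention are hypothetical choices for illustration, not a library API.

```python
from typing import Callable

Model = Callable[[str], str]  # any function mapping a prompt to the model's reply


def self_critique_loop(task: str, model: Model, max_rounds: int = 2) -> str:
    """Generate a draft, then critique and revise it with the same model."""
    # GENERATE: the writer hat produces a first draft
    draft = model(f"Solve this task:\n{task}")
    for _ in range(max_rounds):
        # EVALUATE: same model, critic hat on
        critique = model(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\n"
            "Re-read the draft and list every problem you find. "
            "If there are no significant issues, reply exactly: NO ISSUES"
        )
        if critique.strip() == "NO ISSUES":
            break  # critic is satisfied, stop early
        # REVISE: writer hat back on, address each point in the critique
        draft = model(
            f"Task:\n{task}\n\nDraft:\n{draft}\n\nCritique:\n{critique}\n\n"
            "Rewrite the draft so it addresses every point in the critique. "
            "Return only the revised draft."
        )
    return draft
```

Because the model is passed in as a plain function, the loop can be exercised with a scripted fake before it is wired to a real LLM client.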
Stopping conditions
A reflection loop needs a clear exit, or it will spin forever (or until your budget runs out). Common stop rules: the external check passes (tests green, no errors); the critic reports "no significant issues"; the new draft is no different from the last one; or a hard cap of N iterations is reached. In practice the cap matters most: most gains land in the first one or two revisions, and further rounds rarely help.
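The four stop rules can be collected into one helper that the loop consults after each evaluate step. This is a minimal sketch: the function name, its parameters, and the order in which the rules are tested are assumptions, and "no different from the last one" is approximated as identical text after trimming whitespace.

```python
from typing import Optional


def stop_reason(
    iteration: int,
    max_iters: int,
    check_passed: bool,
    critic_says_fine: bool,
    previous_draft: Optional[str],
    current_draft: str,
) -> Optional[str]:
    """Return why the reflection loop should stop, or None to keep revising."""
    if check_passed:  # external check is the strongest signal, so test it first
        return "external check passed"
    if critic_says_fine:
        return "critic reported no significant issues"
    if previous_draft is not None and previous_draft.strip() == current_draft.strip():
        return "revision made no change"  # the loop has stalled
    if iteration >= max_iters:  # the hard cap, the rule that matters most in practice
        return "iteration cap reached"
    return None
```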
Here is a minimal version of the reflection loop using the Anthropic Python SDK, with the evaluate step delegated to a check function you supply:

```python
from anthropic import Anthropic

client = Anthropic()  # reads the ANTHROPIC_API_KEY environment variable

def ask(prompt):
    msg = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=800,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def solve_with_reflection(task, check, max_iters=3):
    # GENERATE: first draft
    answer = ask(f"Solve this task:\n{task}")
    for _ in range(max_iters):
        # EVALUATE: check() returns "OK" on success, otherwise a description of the failure
        feedback = check(answer)  # external: run tests, lint, execute
        if feedback == "OK":
            return answer  # passed, so stop early
        # REVISE: feed the failure back in and try again
        answer = ask(
            f"Task:\n{task}\n\n"
            f"Your previous answer:\n{answer}\n\n"
            f"It failed this check:\n{feedback}\n\n"
            f"Fix the problem and return the corrected answer."
        )
    return answer  # best effort after max_iters; this last revision was not re-checked
```

Notice that the check function is where the real power lives. If check runs your test suite, the loop is grounded in truth and the agent has a real target to converge toward. If check is just the model grading itself, you're relying on the model's judgment: better than nothing, but it can be confidently wrong about its own output.
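For code tasks, check can be a small test harness. This is a minimal sketch that assumes the model returns plain Python source (no markdown fences) defining a function under a known name; `make_test_check` and its test-case format are hypothetical. It returns "OK" or a failure description, matching the convention solve_with_reflection uses. Note that exec runs untrusted model output inside your own process, so a real setup should execute it in a sandbox.

```python
from typing import Any, Callable, Dict, List, Tuple

Case = Tuple[Tuple[Any, ...], Any]  # (positional args, expected return value)


def make_test_check(func_name: str, cases: List[Case]) -> Callable[[str], str]:
    """Build a check(answer) that runs the answer as Python and tests func_name."""

    def check(answer: str) -> str:
        namespace: Dict[str, Any] = {}
        try:
            exec(answer, namespace)  # UNSAFE outside a sandbox: runs model-written code
        except Exception as exc:  # includes SyntaxError from malformed code
            return f"Code failed to run: {exc!r}"
        func = namespace.get(func_name)
        if not callable(func):
            return f"No function named {func_name!r} was defined."
        failures = []
        for args, expected in cases:
            try:
                got = func(*args)
            except Exception as exc:
                failures.append(f"{func_name}{args} raised {exc!r}")
                continue
            if got != expected:
                failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
        return "OK" if not failures else "\n".join(failures)

    return check
```

Passing something like make_test_check("add", cases) as the check argument grounds the loop in real test results instead of the model's opinion, and each failure message tells the model exactly which input broke.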
Reflexion: remembering past mistakes
Plain self-correction fixes one answer and forgets everything. Reflexion adds a twist: after each failed attempt, the agent writes a short lesson in plain language — "I assumed the list was sorted; it wasn't" — and keeps that note in its working context for the next try. Instead of blindly re-rolling, the agent carries forward a growing memo of what went wrong and how to avoid it.
The clever part is that the lesson is natural language, not a weight update. Nothing about the model is retrained; the agent just reasons over its own notes. This is closely tied to agent memory: within one task the lessons live in short-term working context, but you can also persist the most useful ones to long-term memory so the agent avoids repeating the same mistake across future tasks.
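A Reflexion-style loop differs from plain self-correction in one place: each failure is distilled into a lesson that is carried into every later attempt. The sketch below is in the spirit of the pattern, not a reproduction of any reference implementation; `reflexion_loop`, the prompt wording, and the `model` and `check` callables (with "OK" meaning success) are assumptions.

```python
from typing import Callable, List, Tuple


def reflexion_loop(
    task: str,
    model: Callable[[str], str],
    check: Callable[[str], str],
    max_trials: int = 3,
) -> Tuple[str, List[str]]:
    """Retry a task, carrying a growing list of plain-language lessons between trials."""
    lessons: List[str] = []  # short-term working memory for this task
    answer = ""
    for _ in range(max_trials):
        memo = "\n".join(f"- {note}" for note in lessons) or "(none yet)"
        answer = model(
            f"Task:\n{task}\n\n"
            f"Lessons from earlier failed attempts:\n{memo}\n\n"
            "Solve the task, avoiding the mistakes listed above."
        )
        feedback = check(answer)
        if feedback == "OK":
            return answer, lessons
        # REFLECT: turn the raw failure into a reusable one-sentence lesson
        lesson = model(
            f"Task:\n{task}\n\nYour attempt:\n{answer}\n\n"
            f"It failed with:\n{feedback}\n\n"
            "In one sentence, state what went wrong and how to avoid it next time."
        )
        lessons.append(lesson.strip())
    return answer, lessons  # lessons can also be persisted to long-term memory
```

Nothing here touches model weights: the only thing that changes between trials is the text of the prompt.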
Self-check vs separate verifier
Who does the evaluate step? You have two broad choices, and the difference matters a lot for reliability.
| | Same-model self-check | Separate verifier |
|---|---|---|
| Who critiques | The same model that wrote the draft | A different call, model, or a real tool/test |
| Strength | Cheap, simple, no extra setup | Independent — catches errors the author is blind to |
| Weakness | Shares the author's blind spots; may bless its own mistakes | More cost, more plumbing to build |
| Best when | Quick polish, no ground-truth check available | Correctness matters and a real check exists |
| Trust level | Opinion | Closer to fact (especially with code execution / tests) |
The honest limitation of same-model self-check: a model that confidently produced a wrong answer often also confidently approves it on re-read. Its critique is drawn from the same knowledge that produced the mistake, so it can share the same blind spot. That's why the strongest reflection setups anchor the evaluate step in something external — running the code, executing the tests, validating against a schema, or asking a tool — rather than the model's own say-so.
A separate verifier doesn't have to be a different model. The most reliable verifier is usually a program: a compiler, a test runner, a JSON-schema validator, a linter, or the target API itself returning a real error. When that exists, use it. Reserve LLM-as-judge critique for fuzzy tasks (tone, clarity, completeness) where no mechanical check applies.
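A program-based verifier can be very small. The sketch below is a stdlib-only stand-in for a JSON-schema validator, assuming a hypothetical output format with three required fields; a dedicated JSON Schema library would cover far more of the specification. It returns "OK" or a list of concrete problems the model can act on, so it plugs into the same check slot as a test runner.

```python
import json
from typing import Dict

# Hypothetical schema: required field name -> exact Python type expected after parsing.
REQUIRED_FIELDS: Dict[str, type] = {"name": str, "email": str, "age": int}


def check_json(answer: str) -> str:
    """Return "OK" if answer is a JSON object with the required fields, else the problems."""
    try:
        data = json.loads(answer)
    except json.JSONDecodeError as exc:
        return f"Not valid JSON: {exc}"
    if not isinstance(data, dict):
        return "Expected a JSON object at the top level."
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            problems.append(f"Missing required field {field!r}.")
        elif type(data[field]) is not expected:  # exact match, so true/false is not accepted as int
            problems.append(
                f"Field {field!r} should be {expected.__name__}, got {type(data[field]).__name__}."
            )
    return "OK" if not problems else "\n".join(problems)
```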
Reflection vs evaluator-optimizer and orchestrator-worker
Reflection is easy to confuse with two named workflow patterns, because they all involve more than one model step. The distinction is worth nailing down.
Reflection:
- One agent reviews ITS OWN work
- Generate, critique, revise on the same task
- Goal: fix this one output
- Loop until good enough or capped

Evaluator-optimizer:
- Two fixed roles: maker + grader
- Grader scores against a rubric
- A formalized, structured version
- Often a dedicated evaluator model

Orchestrator-worker:
- A planner SPLITS a task into subtasks
- Workers each do a DIFFERENT piece
- About division of labor, not review
- No self-critique implied

Evaluator-optimizer is essentially reflection promoted to a named pattern: you make the generate role and the evaluate role explicit, give the evaluator a rubric, and loop. Think of reflection as the underlying idea and evaluator-optimizer as one disciplined way to wire it up. Orchestrator-worker is a different animal entirely — it's about breaking a big task into parts and delegating, like a manager assigning subtasks. It involves multiple steps but no self-review; an orchestrator-worker system can itself use reflection inside any of its workers.
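Wiring evaluator-optimizer explicitly means two separate roles and a rubric. In this minimal sketch, `maker` and `grader` are hypothetical callables standing in for two model calls (or two models), scores are assumed to be on a 1 to 5 scale, and the passing threshold of 4 is an arbitrary example.

```python
from typing import Callable, Dict, Tuple

Rubric = Dict[str, str]  # criterion name -> what the grader should look for
Maker = Callable[[str, str], str]  # (task, feedback) -> draft
Grader = Callable[[str, Rubric], Tuple[Dict[str, int], str]]  # (draft, rubric) -> (scores, notes)


def evaluator_optimizer(
    task: str,
    maker: Maker,
    grader: Grader,
    rubric: Rubric,
    passing: int = 4,
    max_rounds: int = 3,
) -> str:
    feedback = ""  # the maker sees no feedback on its first attempt
    draft = ""
    for _ in range(max_rounds):
        draft = maker(task, feedback)  # maker role: write or rewrite
        scores, notes = grader(draft, rubric)  # grader role: score each criterion
        failing = [name for name in rubric if scores.get(name, 0) < passing]
        if not failing:
            return draft  # every criterion met the threshold
        feedback = f"Scored below {passing} on: {', '.join(failing)}.\nGrader notes: {notes}"
    return draft
```

Keeping the rubric as data makes the grader's expectations explicit and lets the maker see exactly which criteria fell short.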
Rule of thumb: if the second step reviews the first step's output on the same task, it's reflection. If the second step does a different part of the work, it's delegation.
Going deeper
Reflection is one of those techniques that looks like a free win and then teaches you its limits. A few advanced points to internalize before you sprinkle it everywhere.
Diminishing returns and over-correction
Most of the benefit arrives in the first revision. The second helps less; the third often does nothing. Worse, with no real check to guide it, a model asked to keep critiquing can talk itself out of a correct answer — an over-correction trap where it "fixes" things that weren't broken, second-guesses a right call, or oscillates between two answers. If you can't ground the loop in a verifiable signal, keep the iteration count small (one or two) and be skeptical of long self-critique chains.
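One cheap guard against oscillation is to stop as soon as a revision reproduces any earlier draft, not just the previous one. A minimal sketch, assuming `revise` is a hypothetical callable wrapping an ungrounded critique-and-revise step; the default cap of two follows the advice to keep such loops short.

```python
from typing import Callable, List


def revise_with_guard(
    first_draft: str, revise: Callable[[str], str], max_iters: int = 2
) -> List[str]:
    """Apply up to max_iters revisions, stopping early if a draft repeats."""
    history = [first_draft]
    seen = {first_draft.strip()}
    for _ in range(max_iters):
        new_draft = revise(history[-1])
        key = new_draft.strip()
        if key in seen:  # e.g. A -> B -> A: the critic is going in circles
            break
        history.append(new_draft)
        seen.add(key)
    return history
```

Returning the whole history, rather than only the last draft, leaves the caller free to fall back to an earlier version if the later ones look worse.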
Cost, latency, and when to skip it
Each reflection round multiplies token use and wall-clock time. For a simple factual lookup or a casual chat reply, that's pure waste — the first answer was already fine. Reserve reflection for tasks where a wrong answer is expensive and a second look genuinely helps: code generation, multi-step plans, data transformations, structured extraction, and anything with a test you can run. Deciding whether the extra calls are worth it is part of the broader question of whether you even need an agentic setup.
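The multiplier is easy to estimate by counting model calls. Assuming every round fails its check, a loop makes one call for the first draft and then, per round, one revision call plus one critique call when the critic is the model itself; when a program does the checking, the critique costs no model call. The function name is hypothetical, and token cost typically grows faster than the call count because each revision prompt also carries the previous draft and its feedback.

```python
def worst_case_model_calls(rounds: int, model_critic: bool) -> int:
    """Model calls when every round fails: 1 draft + per round 1 revision (+1 critique)."""
    per_round = 2 if model_critic else 1
    return 1 + per_round * rounds


for rounds in range(4):
    print(
        rounds,
        worst_case_model_calls(rounds, model_critic=False),
        worst_case_model_calls(rounds, model_critic=True),
    )
```

With a cap of two rounds, a loop that uses the model as its own critic can cost up to five calls where a single answer costs one.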
Relationship to reasoning and the ReAct loop
Reflection overlaps with two neighbors. Reasoning models that "think" before answering already do a kind of internal self-checking inside a single response — reflection makes that explicit and external, with a real check between attempts. And the ReAct pattern (reason → act → observe) naturally creates reflection points: after an action returns an observation (say, a tool error), the agent reasons about that result before its next move, which is self-correction grounded in real feedback. Many production agents get reflection "for free" this way, simply by reacting honestly to what their tools return.
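In a ReAct loop, self-correction needs no separate critique step: a tool failure simply becomes the next observation. A minimal sketch, assuming a hypothetical `decide` function that stands in for the model and returns a dict with a `thought` plus either a `tool` call with `args` or a `final` answer; the transcript format is illustrative.

```python
from typing import Any, Callable, Dict, List, Optional

Tools = Dict[str, Callable[..., Any]]


def run_tool(tools: Tools, name: str, args: Dict[str, Any]) -> str:
    """Run a tool and return its result, or its error text, as the observation."""
    if name not in tools:
        return f"Error: unknown tool {name!r}"
    try:
        return str(tools[name](**args))
    except Exception as exc:  # errors are fed back to the model, not raised
        return f"Error: {type(exc).__name__}: {exc}"


def react_loop(
    task: str,
    decide: Callable[[str], Dict[str, Any]],
    tools: Tools,
    max_steps: int = 5,
) -> Optional[str]:
    transcript: List[str] = [f"Task: {task}"]
    for _ in range(max_steps):
        action = decide("\n".join(transcript))  # REASON: model picks the next move
        transcript.append(f"Thought: {action.get('thought', '')}")
        if "final" in action:
            return action["final"]
        args = action.get("args", {})
        observation = run_tool(tools, action["tool"], args)  # ACT
        transcript.append(f"Action: {action['tool']} {args}")
        transcript.append(f"Observation: {observation}")  # OBSERVE, errors included
    return None  # step budget exhausted
```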
Where to take it next
Once the basic loop clicks, the open questions are about quality of feedback and what to remember. Better external verifiers (richer tests, sandboxed execution) make the loop converge faster. Better-written reflection notes — specific, actionable, deduplicated — make Reflexion-style memory pay off across tasks instead of cluttering the context. And the durable lesson mirrors the rest of agent design: a self-correction loop is only as good as the signal it corrects against, so most of your effort belongs in building a trustworthy evaluate step, not in clever critique prompts.
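For lessons that persist across tasks, even simple hygiene helps. The in-memory class below is a hypothetical sketch covering two of those points: deduplication (after normalizing case, punctuation, and spacing) and a size cap that drops the oldest note. Making each note specific and actionable is left to the reflection prompt, and a real store would write to a file or database.

```python
import re
from typing import List, Set


class LessonStore:
    """Hypothetical long-term store for Reflexion-style lessons."""

    def __init__(self, max_lessons: int = 20) -> None:
        self.max_lessons = max_lessons
        self._lessons: List[str] = []
        self._keys: Set[str] = set()

    @staticmethod
    def _key(text: str) -> str:
        # Normalize so near-identical notes ("Handle empty input." vs "handle EMPTY input") collapse.
        return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

    def add(self, lesson: str) -> bool:
        """Store a lesson; return False if it is empty or a duplicate."""
        key = self._key(lesson)
        if not key or key in self._keys:
            return False
        self._lessons.append(lesson.strip())
        self._keys.add(key)
        if len(self._lessons) > self.max_lessons:  # cap size by dropping the oldest note
            oldest = self._lessons.pop(0)
            self._keys.discard(self._key(oldest))
        return True

    def as_prompt(self) -> str:
        """Render stored lessons as a bullet list for the next task's prompt."""
        return "\n".join(f"- {note}" for note in self._lessons)
```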
FAQ
What is agent reflection in AI?
Agent reflection is when an AI agent reviews its own output before finalizing it. The agent generates an answer, critiques it (or runs a real check on it), and then revises it based on that critique. The loop repeats until the work passes the check or hits an iteration limit. It's a way to catch mistakes the first attempt missed.
What is the Reflexion pattern?
Reflexion is a self-correction approach where, after each failed attempt, the agent writes a short lesson in plain language about what went wrong (for example, "I forgot to handle empty input"). That lesson is kept in context so the next attempt avoids the same mistake. Nothing is retrained — the agent just reasons over its own written notes.
Does self-correction actually make LLM answers better?
It depends on the feedback. With a real external check — running tests, executing code, validating a schema — self-correction reliably improves results, especially on coding and multi-step tasks. With only same-model self-critique and no ground-truth signal, gains are smaller and the model can even over-correct a right answer into a wrong one. Anchor the loop in a verifiable check whenever you can.
What's the difference between reflection and the evaluator-optimizer pattern?
They're the same core idea at different levels. Reflection is the general loop of an agent reviewing and revising its own work. Evaluator-optimizer is a formalized version with two explicit roles — a maker and a grader scoring against a rubric — wired together. Reflection is the concept; evaluator-optimizer is one disciplined way to implement it.
How many times should an agent reflect on its work?
Usually one or two iterations. Most improvement lands in the first revision, the second helps less, and beyond that you mostly burn tokens and risk over-correction. Set a hard cap and exit early when an external check passes. Long self-critique chains rarely pay off unless a real verifier is guiding each step.
Can an agent reliably check its own work?
Only partly. A model that produced a wrong answer often shares the same blind spot when grading it, so it may approve its own mistake. Same-model self-check is fine for polish and fuzzy quality, but for correctness you should use a separate verifier — ideally a real program like a test runner, compiler, or schema validator rather than the model's opinion.