What Is the ReAct Agent Pattern?

Understand the Thought → Action → Observation loop that powers most LLM agents, why it outperforms plain chain-of-thought, and where it breaks down.

Intermediate · 12 min read · Updated 2026-06-12

In plain English

ReAct (short for Reasoning + Acting) is a pattern for running an LLM as an agent. Instead of asking the model to jump straight to an answer, you ask it to think out loud first, then choose a tool to call, then read what the tool returned, then think again — looping until the task is done. The cycle of Thought → Action → Observation is the entire pattern.

It sounds simple, and it is. But that simplicity is the point. Before ReAct, LLMs either produced an answer in one shot (fast but brittle) or used chain-of-thought to reason step-by-step (better, but still no ability to check external facts). ReAct adds the missing piece: grounding. After every reasoning step the model can actually look something up, run code, or query a database — and then update its reasoning based on what it finds. The model stops guessing and starts verifying.

ReAct was introduced by Yao et al. in the 2023 paper "ReAct: Synergizing Reasoning and Acting in Language Models". It quickly became the dominant architecture for LLM agents because it maps almost perfectly onto how tool use already works: the model emits a structured request, your code runs the real function, the result comes back. ReAct just adds an explicit thinking step before each call.

Why it matters

The core problem ReAct solves is hallucination under uncertainty. A plain LLM asked "What did Apple announce yesterday?" will confidently make something up if the answer isn't in its training data. A chain-of-thought LLM will reason carefully about what Apple probably announced — still made up. A ReAct agent will instead emit Thought: I need to search for Apple's latest news → Action: web_search("Apple announcement today") → Observation: [real results] — and only then write an answer grounded in what it actually found.

The second thing ReAct gives you is interpretability. Because the model writes down every Thought before acting, you get a step-by-step transcript of its reasoning. When a run goes wrong you can read the trace and see exactly where the model misunderstood the task, chose the wrong tool, or drew the wrong conclusion from an observation. That's far better than a black-box answer with no trail.

Improvements over related approaches

vs. direct prompting — the model can fetch real data mid-task instead of relying on frozen training knowledge.
vs. chain-of-thought — CoT gives better reasoning but no external actions; ReAct adds the action-and-observe loop that keeps reasoning anchored to real results.
vs. tool use with no explicit reasoning — calling tools without narrated thoughts works but the model often calls the wrong tool or misreads outputs; the Thought step acts as a planning check before each call.
vs. pure planning (Plan then Act) — a plan written upfront can become stale after the first tool call reveals unexpected data; ReAct replans after every observation, adapting in real time.

How it works

A ReAct agent receives a task. It then enters a loop where each turn of the loop produces three labeled outputs in order:

Thought — the model narrates what it knows so far and what it plans to do next. This is not yet an action; it's the model thinking on paper.
Action — the model specifies a tool to call and its inputs (e.g., search[Apple WWDC 2025 announcements]).
Observation — the result returned by that tool, appended to the context by the agent harness (your code). The model reads this on the next turn.

When the model decides it has enough information, it emits a final Thought explaining its conclusion followed by Final Answer: ... instead of another Action. Your code detects the absence of a tool call and returns the answer to the user.
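The loop described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: `scripted_model` is a hypothetical stand-in for an LLM call that replays canned output, `TOOLS` holds a single toy tool, and the `tool[input]` action syntax is assumed from the example format used in this article.

```python
import re
from typing import Callable

# Hypothetical tool registry: maps a tool name to a plain Python function.
TOOLS: dict[str, Callable[[str], str]] = {
    "word_count": lambda text: str(len(text.split())),
}

ACTION_RE = re.compile(r"^Action:\s*(\w+)\[(.*)\]\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)


def scripted_model(context: str) -> str:
    """Stand-in for an LLM call: replays canned turns based on the context so far."""
    if "Observation:" not in context:
        return ("Thought: I should count the words with the tool.\n"
                "Action: word_count[the quick brown fox]")
    return ("Thought: The tool returned 4, so I can answer.\n"
            "Final Answer: The phrase has 4 words.")


def react_loop(task: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    context = f"Task: {task}\n"
    for _ in range(max_steps):
        turn = model(context)
        context += turn + "\n"
        final = FINAL_RE.search(turn)
        if final:  # a Final Answer instead of an Action ends the loop
            return final.group(1).strip()
        action = ACTION_RE.search(turn)
        if action is None:
            context += "Observation: could not parse an Action line.\n"
            continue
        name, arg = action.groups()
        tool = TOOLS.get(name)
        result = tool(arg) if tool else f"unknown tool: {name}"
        context += f"Observation: {result}\n"  # the harness appends the Observation
    return "Stopped: step limit reached without a Final Answer."


answer = react_loop("How many words are in 'the quick brown fox'?", scripted_model)
print(answer)
```

The model never runs a tool itself. It only emits text, and the harness decides what that text means and what gets appended next.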

Figure: the ReAct loop. Thought (the model reasons about the next step) → Action (the model calls a tool with structured input) → Observation (the tool result is appended to the context) → repeat, or finish with a Final Answer.

What the raw prompt looks like

Concretely, the context the model sees grows with each turn. A typical trace for "How many days until the next leap year from today?" might look like this:

text

Task: How many days until the next leap year from today (2026-06-12)?

Thought: I need to know what year the next leap year after 2026 is.
  2028 is divisible by 4 and not a century year, so it qualifies.
  Next I need to calculate the days between 2026-06-12 and 2028-01-01.
Action: calculator[days_between("2026-06-12", "2028-01-01")]
Observation: 568

Thought: The tool returned 568 days. That is my answer.
Final Answer: There are 568 days until 1 January 2028, the next leap year.

Notice the Thought before the Action isn't optional fluff — it's doing real work. The model decides which tool to call and what inputs to give it based on its reasoning. And when the Observation comes back, the model immediately integrates it into the next Thought rather than ignoring it.
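The arithmetic in this trace is easy to check with the standard library. The sketch below shows what a `days_between` tool could look like. The function name comes from the trace, but this implementation is an assumption rather than any particular calculator tool, and `is_leap_year` is a helper added for illustration.

```python
from datetime import date


def days_between(start: str, end: str) -> int:
    """Whole days from start to end, both ISO dates (YYYY-MM-DD)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


def is_leap_year(year: int) -> bool:
    """Gregorian rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# First leap year strictly after 2026.
next_leap = next(y for y in range(2027, 2040) if is_leap_year(y))
print(next_leap)                                          # 2028
print(days_between("2026-06-12", f"{next_leap}-01-01"))   # 568
```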

Implementing ReAct with a modern SDK

In practice you don't have to parse "Thought:" / "Action:" labels yourself. Modern SDKs handle the structure through function calling: the model returns a tool-use block (the Action), your loop runs the tool and returns the result (the Observation), and the reasoning (Thought) is embedded in the model's text or scratchpad. Here is a minimal Python example using the Anthropic SDK:

python

import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    }
]

def run_tool(name: str, args: dict) -> str:
    if name == "web_search":
        # In production, call a real search API here.
        return f"[stub] results for: {args['query']}"
    return "unknown tool"

messages = [{"role": "user", "content": "What AI model did Anthropic release most recently?"}]

# ReAct loop
for step in range(10):  # hard cap prevents infinite loops
    resp = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        tools=tools,
        # Extended thinking (if supported) produces explicit Thought blocks
        messages=messages,
    )
    messages.append({"role": "assistant", "content": resp.content})

    if resp.stop_reason != "tool_use":
        # No tool call → model gave its Final Answer
        for block in resp.content:
            if hasattr(block, "text"):
                print("Answer:", block.text)
        break

    # Run every tool the model requested (Action), feed results back (Observation)
    results = []
    for block in resp.content:
        if block.type == "tool_use":
            output = run_tool(block.name, block.input)
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": output,
            })
    messages.append({"role": "user", "content": results})

Strengths and failure modes

ReAct has real strengths that explain why it became the default agent pattern. But it also has well-documented failure modes you need to plan for.

ReAct: what works vs. what breaks

Strengths

Grounds answers in real tool output
Step-by-step trace is human-readable
Adapts plan after each observation
Works with any tool set
Easy to debug via transcript

Failure modes

Hallucinated Thoughts leak false context
Repetitive loops (same action, same result)
Observation overload loses earlier context
Context window fills on long tasks
Reasoning and action can diverge

Hallucinated thoughts

The Thought is still generated by the LLM and can contain false assertions. A common failure: the model writes Thought: The user's account was created in 2019 (invented) before calling a lookup tool — and then the model trusts its own hallucinated Thought even when the Observation says otherwise. The fix is to treat Thought blocks as plans, never as facts; the Observation is the only source of truth.

Repetitive loops

If a tool returns an error or a result the model doesn't know how to interpret, it may call the same tool again with the same arguments — and repeat until the step limit fires. Mitigation: inject the full error message as the Observation, add a system prompt rule like "if you receive the same error twice in a row, stop and explain the problem to the user", and set a per-tool retry cap in your harness.
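A per-tool retry cap can live entirely in the harness. The sketch below is one possible shape, assuming tool arguments are JSON-serializable. `RepeatGuard` and its message text are hypothetical names chosen for illustration, not part of any SDK.

```python
import json
from collections import Counter
from typing import Optional


class RepeatGuard:
    """Counts identical (tool, arguments) calls and blocks them past a cap."""

    def __init__(self, max_identical_calls: int = 2) -> None:
        self.max_identical_calls = max_identical_calls
        self.seen = Counter()

    def check(self, tool_name: str, args: dict) -> Optional[str]:
        """Return None if the call may run, else a message to use as the Observation."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call.
        key = json.dumps([tool_name, args], sort_keys=True)
        self.seen[key] += 1
        if self.seen[key] > self.max_identical_calls:
            return (f"Refused: {tool_name} was already called "
                    f"{self.max_identical_calls} times with these arguments. "
                    "Try different arguments or explain the problem to the user.")
        return None


guard = RepeatGuard(max_identical_calls=2)
for attempt in range(3):
    # The third identical call is refused instead of being executed.
    print(attempt, guard.check("web_search", {"query": "apple news"}))
```

Returning the refusal as an Observation, rather than raising an exception, keeps the model in the loop so it can change course or report the problem.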

Context window pressure

Every Thought, Action, and Observation gets appended to the context. A long task with many tool calls — especially tools that return large text blobs — can fill the context window before the task is done. Common mitigations: truncate or summarize observations before appending, prune old turns from the transcript, or summarize completed sub-tasks into a compact status block.
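Two of these mitigations, truncating observations and pruning old turns, can be sketched as follows. The function names and default limits are illustrative assumptions. The sketch treats the transcript as a list of plain-text turns, so a real harness built on a function-calling API would also need to keep each tool call paired with its result when pruning.

```python
def truncate_observation(text: str, max_chars: int = 2000) -> str:
    """Keep the head and tail of a long tool result and mark the cut."""
    if len(text) <= max_chars:
        return text
    half = max(1, max_chars // 2)
    omitted = len(text) - 2 * half
    return f"{text[:half]}\n[... {omitted} characters omitted ...]\n{text[-half:]}"


def prune_turns(turns: list[str], keep_last: int = 6) -> list[str]:
    """Keep the task (first entry) plus the most recent turns, noting what was dropped."""
    if len(turns) <= keep_last + 1:
        return turns
    dropped = len(turns) - 1 - keep_last
    return [turns[0], f"[{dropped} earlier turns pruned]"] + turns[-keep_last:]


transcript = ["Task: summarize the report"] + [f"turn {i}" for i in range(10)]
print(prune_turns(transcript, keep_last=3))
```

Keeping the head and tail of an observation is a guess about where the useful content sits. Summarizing with a second model call is the heavier alternative when that guess fails.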

Reasoning–action divergence

Occasionally the model's Thought says "I should search for X" but the Action calls a different tool or uses different search terms. This is a consistency failure between the two outputs: nothing forces the tool call to match the plan stated just before it. Catching it requires logging both the Thought and the Action in your traces, which is another reason LLM observability matters for agents in production.
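One crude way to surface this in traces is to check how many of the Action's argument words also appear in the preceding Thought. The sketch below is a rough heuristic, not an established metric. `divergence_record` and the 0.5 threshold are assumptions made for illustration, and paraphrased queries will produce false positives.

```python
import re


def _words(text: str) -> set[str]:
    """Lowercased alphanumeric tokens in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def divergence_record(thought: str, tool_name: str, tool_input: dict,
                      min_overlap: float = 0.5) -> dict:
    """Build a trace record that flags Actions whose arguments the Thought never mentions."""
    arg_words: set[str] = set()
    for value in tool_input.values():
        arg_words |= _words(str(value))
    thought_words = _words(thought)
    # Fraction of argument words that also occur in the Thought (1.0 if no arguments).
    overlap = len(arg_words & thought_words) / len(arg_words) if arg_words else 1.0
    return {
        "thought": thought,
        "tool": tool_name,
        "input": tool_input,
        "overlap": round(overlap, 2),
        "suspect": overlap < min_overlap,
    }


thought = "I should search for Apple WWDC announcements"
print(divergence_record(thought, "web_search", {"query": "Tesla stock price"})["suspect"])
```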

ReAct vs. related patterns

ReAct is not the only pattern for agentic LLMs. As agents became a serious engineering discipline, several variants and successors emerged.

ReAct vs. Plan-and-Execute

Plan-and-Execute (also called Plan-and-Solve) separates planning from execution: a planner LLM call produces a full multi-step plan, then each step is executed in sequence, sometimes by a separate executor model. This gives a more structured decomposition and is easier to audit upfront. The tradeoff: if step 3 reveals that the plan was wrong, there's no cheap way to replan mid-execution without re-running the planner. ReAct replans implicitly after every observation — better for exploratory tasks, worse for predictable pipelines.
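The structural difference is visible in code. Here is a minimal sketch of Plan-and-Execute, assuming a planner and an executor that are passed in as plain functions. `stub_planner` and `stub_executor` are hypothetical stand-ins for LLM and tool calls.

```python
from typing import Callable


def plan_and_execute(task: str,
                     planner: Callable[[str], list[str]],
                     executor: Callable[[str], str]) -> list[str]:
    plan = planner(task)  # one planning call, made before any step runs
    results = []
    for step in plan:
        # The plan is fixed: a surprising result here does not change later steps.
        results.append(executor(step))
    return results


def stub_planner(task: str) -> list[str]:
    """Stand-in for a planner LLM call; returns a fixed plan."""
    return ["look up the population of France",
            "look up the population of Spain",
            "add the two numbers"]


def stub_executor(step: str) -> str:
    """Stand-in for executing one step with tools."""
    return f"done: {step}"


print(plan_and_execute("Combined population of France and Spain?",
                       stub_planner, stub_executor))
```

In a ReAct loop there is no separate `plan` variable. The next step is chosen only after the previous Observation is in the context.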

ReAct vs. Reflexion

Reflexion (Shinn et al., 2023) adds a self-evaluation step: after each full trial the agent reflects on what went wrong and stores a verbal summary of lessons learned, then retries. Think of it as ReAct plus long-term memory of failures. Reflexion outperforms ReAct on benchmarks that allow multiple attempts, but adds latency and requires a persistent memory store across trials.
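The outer loop that Reflexion wraps around an agent can be sketched like this. It is a simplified illustration under stated assumptions, not the paper's implementation: `run_trial` stands in for a full agent run plus its evaluation, `reflect` stands in for the self-reflection LLM call, and both stubs are hypothetical.

```python
from typing import Callable, Optional


def reflexion_loop(
    task: str,
    run_trial: Callable[[str, list[str]], tuple[bool, str]],
    reflect: Callable[[str], str],
    max_trials: int = 3,
) -> tuple[Optional[int], list[str]]:
    """Retry a task, carrying verbal lessons from failed trials into later ones."""
    lessons: list[str] = []
    for trial in range(1, max_trials + 1):
        success, trace = run_trial(task, lessons)
        if success:
            return trial, lessons
        lessons.append(reflect(trace))  # store a lesson for the next attempt
    return None, lessons


def stub_trial(task: str, lessons: list[str]) -> tuple[bool, str]:
    # Pretend the agent only succeeds once a lesson tells it to check units.
    ok = any("units" in lesson for lesson in lessons)
    return ok, "answered in miles, expected kilometres"


def stub_reflect(trace: str) -> str:
    return "Lesson: check units before answering."


print(reflexion_loop("How far is Paris from Berlin in km?", stub_trial, stub_reflect))
```

Each trial here could itself be a complete ReAct run, which is why the extra latency grows with the number of attempts.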

ReAct in frameworks

Most major agent frameworks support ReAct directly. LangChain ships a create_react_agent helper that handles the Thought/Action/Observation formatting. LangGraph wraps ReAct in a stateful graph for branching and persistence. The Claude Agent SDK and OpenAI Agents SDK implement the same loop via native function calling. The pattern is the same in all of them; only the boilerplate changes.

Going deeper

Once you're comfortable with the basic loop, the engineering challenges shift from making it work to making it reliable and cheap. Here are the threads worth pulling.

Prompt design for ReAct

The system prompt is load-bearing in a ReAct agent. At minimum it should: list available tools with precise descriptions; instruct the model to always emit a Thought before acting; set the format for the Final Answer; and tell the model how to handle errors and ambiguous observations. Vague or underspecified tool descriptions are a common source of wrong tool calls.
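Those four requirements can be assembled mechanically from the tool list. The sketch below shows one way to do it. `build_system_prompt` and the exact rule wording are illustrative assumptions, and a production prompt would need tuning for the model in use.

```python
def build_system_prompt(tools: list[dict]) -> str:
    """Assemble a ReAct system prompt from tool specs with 'name' and 'description' keys."""
    lines = [
        "You are an agent that solves tasks by calling tools.",
        "",
        "Available tools:",
    ]
    for tool in tools:
        lines.append(f"- {tool['name']}: {tool['description']}")
    lines += [
        "",
        "Rules:",
        "1. Always write a Thought before each Action.",
        "2. When you have enough information, reply with 'Final Answer: <answer>'.",
        "3. If a tool returns an error, read it and change your approach. "
        "If the same error occurs twice in a row, stop and explain the problem.",
        "4. If an Observation is ambiguous, say so in your next Thought instead of guessing.",
    ]
    return "\n".join(lines)


print(build_system_prompt([
    {"name": "web_search", "description": "Search the web for current information."},
]))
```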

Extended thinking and native reasoning models

Models with built-in extended thinking (like Claude 3.7 Sonnet with extended thinking enabled, or OpenAI o1/o3) already generate internal reasoning traces before responding. This effectively gives you the Thought step for free at the model level. For these models the ReAct loop still applies — you still need the Action → Observation cycle — but you no longer need to prompt-engineer the Thought into the visible context; it happens internally.
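With the Anthropic SDK, extended thinking is requested through a `thinking` parameter on the Messages API call. The sketch below only builds the keyword arguments and does not call the API. The helper name is hypothetical, and the two limits it checks (a minimum budget of 1024 tokens, and a budget below `max_tokens`) are assumptions based on the API documentation that may differ by model or change over time.

```python
def thinking_request_kwargs(max_tokens: int, budget_tokens: int) -> dict:
    """Build extra keyword arguments for client.messages.create(...)."""
    # Assumed constraints; check the current API documentation for your model.
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


kwargs = thinking_request_kwargs(max_tokens=4096, budget_tokens=2048)
print(kwargs)
# Usage (not executed here): client.messages.create(model=..., tools=..., messages=..., **kwargs)
```

When thinking is combined with tool use, the assistant turn should be appended to the message history unmodified, so that any thinking blocks travel back with the tool results.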

Observability and debugging

Every production ReAct agent should log each turn's full context: the Thought, the Action (tool name + inputs), and the Observation (truncated if large). This transcript is your primary debugging tool. When a run fails, read the trace top-to-bottom and look for the first Thought that contains a false assumption — that's almost always where the run diverged. Services like LangSmith, Langfuse, and Weave are built around this trace-first debugging model.
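A trace log does not need a dedicated service to be useful. Here is a minimal sketch that writes one JSON line per turn to any file-like object. The field names and the `log_turn` helper are assumptions for illustration, not the schema of any of the services named above.

```python
import io
import json
import time
from typing import TextIO


def log_turn(stream: TextIO, step: int, thought: str, tool_name: str,
             tool_input: dict, observation: str, max_obs_chars: int = 500) -> dict:
    """Append one turn of the loop to the stream as a JSON line and return the record."""
    record = {
        "ts": time.time(),
        "step": step,
        "thought": thought,
        "action": {"tool": tool_name, "input": tool_input},
        "observation": observation[:max_obs_chars],  # truncated if large
        "observation_truncated": len(observation) > max_obs_chars,
    }
    stream.write(json.dumps(record) + "\n")
    return record


buffer = io.StringIO()  # in production this could be an open log file
log_turn(buffer, 1, "I need current news.", "web_search",
         {"query": "apple news"}, "result text " * 100)
print(buffer.getvalue())
```

One record per turn keeps the log easy to read top to bottom, which matches the debugging approach described above.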

When not to use ReAct

ReAct adds overhead — tokens, latency, cost — that isn't always justified. If a task can be solved with a single well-crafted prompt and the model's training data, chain-of-thought prompting is faster and cheaper. If the sequence of steps is known in advance, a hard-coded RAG pipeline or a scripted workflow is more reliable. Reach for ReAct when the steps genuinely depend on what the tools return — that is, when you can't know the path until you start walking it.

FAQ

What does ReAct stand for in AI?

ReAct stands for Reasoning and Acting. It's a pattern introduced in a 2023 paper where an LLM is prompted to emit a Thought (reasoning) before each Action (tool call), then reads the tool's Observation before reasoning again. The cycle repeats until the model produces a Final Answer.

How is ReAct different from chain-of-thought prompting?

Chain-of-thought adds reasoning steps to a single prompt-and-answer turn but has no ability to call external tools or check facts mid-reasoning. ReAct extends CoT with an action-and-observe loop: after each reasoning step the model can call a real tool, read the actual result, and update its reasoning — grounding the answer in live data rather than training-time knowledge.

Does ReAct prevent hallucinations?

It reduces them significantly for factual tasks by grounding answers in tool outputs rather than model memory. But it doesn't eliminate hallucinations entirely — the Thought step itself is generated by the model and can contain false assumptions that contaminate later reasoning. The Observation is the only reliable source of truth; treat Thoughts as plans, not facts.

What is an Observation in ReAct?

An Observation is the result returned by a tool call, appended to the conversation context by the agent harness (your code). The model reads it on the next turn and uses it to update its reasoning. Observations are the mechanism that grounds the agent in real data instead of model hallucinations.

Is ReAct the same as an agent loop?

ReAct is one specific implementation of an agent loop, and the most popular one. The broader concept of an agent loop (observe, reason, act, repeat) predates ReAct. What ReAct adds is the explicit Thought step before each Action and the Thought/Action/Observation labeling in the prompt, which makes the loop more reliable in practice and easier to debug.

Do LangChain and LangGraph use ReAct?

Yes. LangChain provides a create_react_agent function that wires up the Thought/Action/Observation loop with a set of tools. LangGraph lets you build more complex variants — branching, parallelism, persistent state — while still following the same core pattern. Most modern agent frameworks, including the Claude Agent SDK and OpenAI Agents SDK, implement the same loop via native function-calling APIs.
