How Does the Agent Loop Work? Think, Act, Observe, Repeat

Walk through one full iteration of the loop at the heart of every agent and see exactly what the model receives and returns at each step.

Beginner · 11 min read · Updated 2026-06-12

In plain English

Every AI agent — whether it books a flight, debugs code, or runs a research sweep — runs on the same basic engine: a loop. The model looks at what it knows, decides on one action, executes it, reads the result, and then starts again. Rinse and repeat until the job is done.

A good analogy is a sous chef working through a recipe while the head chef is away. They read the recipe card (think), start chopping onions (act), check whether the pan is hot enough (observe), then glance at the recipe again to decide what's next (think again). Each lap around the kitchen is one iteration of the loop. The sous chef doesn't need the head chef standing over them; they adapt to whatever they find at each step.

In technical terms this pattern is called the Think → Act → Observe cycle, or sometimes the reason-act loop. It was formalized in the ReAct paper (Yao et al., 2023), which reported that interleaving reasoning traces with tool calls generally did better than reasoning-only or acting-only prompting on the benchmarks it tested.

Why it matters

A language model that only does one turn can describe what to do. An agent can do it. That gap is enormous in practice: the model can look up live data, run a shell command, call an API, read the error it gets back, correct itself, and try again — all without a human in the middle.

Before the agent loop became mainstream, building anything that required more than one step meant a developer hand-wiring every transition: copy the model's output, paste it into the next tool, bring the result back, call the model again. The loop automates all of that scaffolding. That's why the same coding assistant that once gave you one snippet can now edit a whole repository: it loops until each test passes.

What the loop enables that a single prompt cannot

Error recovery: the model sees the stack trace and retries with a fix.
Progressive information gathering: each tool call unlocks facts the model didn't have when it started.
Dynamic planning: the model can change its plan mid-task if an early step returns unexpected results.
Long-horizon tasks: work too big for a single response is broken into sequential steps, each building on the last.

How it works

The agent loop is a while loop controlled by the LLM: keep going as long as the model returns a tool call; stop when it returns plain text without one. In the normal case the runtime does not decide when the task is done. The model does, by choosing not to call another tool. The runtime only steps in when a safety limit is hit (see Termination).
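To make the control flow concrete, here is a minimal sketch of such a loop in Python. It assumes OpenAI-style message dictionaries like the ones used later in this article; `run_agent`, `scripted_model`, and the stubbed `get_stock_price` are hypothetical names invented for this illustration, and the scripted model stands in for a real LLM call.

```python
import json
from typing import Callable


def run_agent(call_model: Callable[[list], dict],
              tools: dict,
              messages: list,
              max_turns: int = 20) -> str:
    """Keep calling the model until it replies without a tool call."""
    for _ in range(max_turns):
        reply = call_model(messages)                  # Think
        messages.append(reply)                        # record the assistant turn first
        tool_calls = reply.get("tool_calls") or []
        if not tool_calls:                            # plain text: the model is done
            return reply["content"]
        for call in tool_calls:                       # Act
            fn = tools[call["function"]["name"]]
            args = json.loads(call["function"]["arguments"])
            messages.append({                         # Observe
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(fn(**args)),
            })
    return "Stopped: iteration cap reached."


def scripted_model(messages: list) -> dict:
    """Hypothetical stand-in for an LLM: one lookup, then a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"role": "assistant", "content": None, "tool_calls": [{
            "id": "call_1", "type": "function",
            "function": {"name": "get_stock_price",
                         "arguments": json.dumps({"ticker": "AAPL", "date": "2026-06-11"})},
        }]}
    close = json.loads(messages[-1]["content"])["close"]
    return {"role": "assistant", "content": f"AAPL closed at {close}."}


def get_stock_price(ticker: str, date: str) -> dict:
    """Stub tool returning a canned value instead of calling a market-data API."""
    return {"ticker": ticker, "date": date, "close": 211.42}


history = [{"role": "user", "content": "What did AAPL close at yesterday?"}]
answer = run_agent(scripted_model, {"get_stock_price": get_stock_price}, history)
print(answer)  # AAPL closed at 211.42.
```

Swapping `scripted_model` for a function that calls a real model API is the only change needed to turn the sketch into a working agent, although production code would also want the error handling and limits discussed in the rest of this article.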

[Figure: The agent loop. Think (LLM reasons about the next step) → Act (runtime executes the tool call) → Observe (tool result added to context) → Done? If there is no tool call, return the answer; otherwise repeat.]

Step 1 — Think

At the start of each turn the runtime assembles the full context window and sends it to the LLM. That context includes: the system prompt (rules, persona, available tools), the original user request, and the complete history of every previous assistant message and tool result from this session.
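The assembly step can be sketched in a few lines. This is only an illustration: `build_context` is a hypothetical helper, and the message shape assumes the role/content dictionaries used in the example later in this article.

```python
def build_context(system_prompt: str, user_request: str, history: list) -> list:
    """Assemble the message list sent to the model on every turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_request},
        *history,   # all earlier assistant messages and tool results, in order
    ]


context = build_context(
    "You are a financial research assistant.",
    "What was Apple's closing stock price yesterday?",
    history=[],     # empty on turn 1; grows as tool calls and results accumulate
)
print(len(context))  # 2
```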

The model reads all of that and produces a reasoning trace — sometimes visible to the developer as a thought field, sometimes hidden inside the model. Based on that reasoning it either calls a tool or decides the task is complete and writes a final answer.

Step 2 — Act

When the model chooses a tool it doesn't run code itself — it emits a structured tool-call object that your runtime executes. The object names the tool and provides typed arguments. A typical JSON representation looks like this:

Tool call emitted by the model (JSON):

{
  "tool": "web_search",
  "arguments": {
    "query": "AAPL closing price June 11 2026"
  }
}

The runtime receives that object, looks up the web_search function, calls it with the given arguments, and captures whatever it returns. The model is paused and waiting — it plays no role in the actual execution.
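A runtime's dispatch step might look roughly like the following sketch. It assumes the simplified `{"tool": ..., "arguments": ...}` shape shown above; `TOOLS`, `execute_tool_call`, and the stubbed `web_search` are hypothetical names, not part of any particular framework.

```python
import json


def web_search(query: str) -> list:
    """Stub: a real implementation would call a search API."""
    return [{"title": "stub result", "query": query}]


TOOLS = {"web_search": web_search}   # tool name -> Python callable


def execute_tool_call(call: dict) -> str:
    """Run one tool call and return its result as a JSON string."""
    fn = TOOLS.get(call["tool"])
    if fn is None:
        # Report the problem as data so the model can correct itself next turn.
        return json.dumps({"error": f"unknown tool: {call['tool']}"})
    try:
        return json.dumps(fn(**call["arguments"]))
    except Exception as exc:
        return json.dumps({"error": str(exc)})


result = execute_tool_call(
    {"tool": "web_search", "arguments": {"query": "AAPL closing price June 11 2026"}}
)
print(result)
```

Returning errors as ordinary results, rather than raising, keeps the loop alive and gives the model a chance to retry with a corrected call.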

Step 3 — Observe

The runtime wraps the tool's return value in a tool-result message and appends it to the conversation history. On the next think step the model will see that result as part of its context and can reason about what it means.

Crucially, the assistant message containing the tool call must be appended to history before the tool result is added. In chat APIs that thread results by ID, the tool-result message references the tool_call_id of the preceding assistant message; if that message isn't already in history, the reference is invalid and the API will typically reject the request.
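The ordering rule can be checked mechanically. The sketch below is a hypothetical validator (`threading_is_valid` is not a library function) that assumes OpenAI-style messages with `tool_calls` and `tool_call_id` fields.

```python
def threading_is_valid(messages: list) -> bool:
    """True if every tool result refers to a tool call that appears earlier."""
    seen_ids = set()
    for msg in messages:
        if msg["role"] == "assistant":
            seen_ids.update(c["id"] for c in msg.get("tool_calls") or [])
        elif msg["role"] == "tool" and msg["tool_call_id"] not in seen_ids:
            return False
    return True


assistant_msg = {"role": "assistant", "content": None,
                 "tool_calls": [{"id": "call_abc123"}]}
tool_msg = {"role": "tool", "tool_call_id": "call_abc123", "content": "{}"}

print(threading_is_valid([assistant_msg, tool_msg]))  # True
print(threading_is_valid([tool_msg, assistant_msg]))  # False
```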

Termination

The loop exits naturally when the model produces a text-only response with no tool calls — the model is signaling "I have enough information to answer." Frameworks also add hard safety limits: a maximum iteration count (typically 15–25 steps), a wall-clock timeout (~300 seconds), and sometimes a cost budget cap. If the iteration cap is hit, a common pattern is to inject a final message asking the model to synthesize whatever it has gathered so the user still gets a useful response.
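One way to combine the three limits is a small budget object that the loop consults before each turn. This is a sketch under stated assumptions: `LoopBudget` and its default values are illustrative, and the per-turn cost would come from whatever usage data your model API reports.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopBudget:
    max_turns: int = 20
    timeout_s: float = 300.0
    max_cost_usd: float = 1.00
    turns: int = 0
    cost_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def record_turn(self, cost_usd: float) -> None:
        """Call once after each model response."""
        self.turns += 1
        self.cost_usd += cost_usd

    def stop_reason(self) -> Optional[str]:
        """Return which limit was hit, or None if the loop may continue."""
        if self.turns >= self.max_turns:
            return "max_iterations"
        if time.monotonic() - self.started >= self.timeout_s:
            return "timeout"
        if self.cost_usd >= self.max_cost_usd:
            return "cost_budget"
        return None


budget = LoopBudget(max_turns=3)
while budget.stop_reason() is None:
    budget.record_turn(cost_usd=0.01)   # one model call per turn would go here
print(budget.stop_reason())  # max_iterations
```

Returning a reason string rather than a bare boolean lets the runtime log why a session stopped and choose a fallback, such as the final synthesis prompt described above.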

A concrete example: one full iteration

Abstract descriptions only go so far. Here is a complete single iteration — from user prompt to the model's tool call and back — using a simple stock-price lookup task.

Turn 1: what the model receives

Messages sent to the LLM at the start of turn 1 (JSON):

[
  {
    "role": "system",
    "content": "You are a financial research assistant. Use the tools provided to answer questions accurately."
  },
  {
    "role": "user",
    "content": "What was Apple's closing stock price yesterday and how does it compare to a month ago?"
  }
]

Turn 1: what the model returns

Model response, a tool call (JSON):

{
  "role": "assistant",
  "content": null,
  "tool_calls": [
    {
      "id": "call_abc123",
      "type": "function",
      "function": {
        "name": "get_stock_price",
        "arguments": "{\"ticker\": \"AAPL\", \"date\": \"2026-06-11\"}"
      }
    }
  ]
}

Runtime appends the tool result

Tool result appended to message history (JSON):

{
  "role": "tool",
  "tool_call_id": "call_abc123",
  "content": "{\"ticker\": \"AAPL\", \"date\": \"2026-06-11\", \"close\": 211.42}"
}

Turn 2: the model reasons and acts again

Now the context contains the system prompt, the user question, the first tool call, and its result. The model sees that it has only yesterday's closing price and still needs the price from a month ago, so it calls the tool a second time with a different date. The loop runs again.

After the second tool result is appended, the model has all the data it needs. On turn 3 it produces a plain-text response comparing the two prices — no tool call this time — and the runtime exits the loop and returns that answer to the user.

[Figure: Three turns to finish the task. Turn 1: Think → call get_stock_price(AAPL, yesterday); Observe: price $211.42. Turn 2: Think → call get_stock_price(AAPL, month ago); Observe: price $198.17. Turn 3: Think → no tool call → final answer.]

Common loop pitfalls

The agent loop is simple in concept but surprisingly tricky in production. These are the failure modes that catch most builders off guard.

Pitfall	What happens	How to guard against it
Infinite loop	The model keeps calling the same tool with the same arguments because the tool keeps returning an error it doesn't know how to resolve.	Set a hard max-iterations cap (15–25). Detect consecutive identical tool calls and break early.
Context explosion	Each turn adds tokens. After 20 turns of verbose tool results, the context window fills up and the model loses earlier steps.	Summarize or truncate old tool results. Use a rolling window or external memory store.
Premature exit	The model produces a confident-sounding final answer after turn 1, skipping verification steps.	Add an explicit instruction to the system prompt, for example: 'Always verify your answer with a second tool call before responding.'
Hallucinated tool calls	The model invents a tool name or passes arguments with the wrong shape, causing a runtime error.	Validate tool-call arguments against the schema before executing. Return a structured error message so the model can self-correct.
Silent cost blowout	A task the developer expected to take 3 turns takes 30, burning 10x the expected budget.	Log turn count and token spend per session. Add a cost-budget stop condition alongside the iteration cap.
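Two of the guards in the table, detecting repeated identical calls and validating arguments before execution, can be sketched as follows. The helper names are hypothetical, and the `spec` format is a deliberately simplified stand-in for a real JSON Schema check.

```python
import json


def call_signature(call: dict) -> tuple:
    """Normalize a tool call so identical calls compare equal."""
    fn = call["function"]
    args = json.loads(fn["arguments"])
    return fn["name"], json.dumps(args, sort_keys=True)


def is_stuck(recent_calls: list, repeats: int = 3) -> bool:
    """True if the last `repeats` calls are identical."""
    if len(recent_calls) < repeats:
        return False
    tail = [call_signature(c) for c in recent_calls[-repeats:]]
    return len(set(tail)) == 1


def validate_args(args: dict, spec: dict) -> list:
    """Check args against a simplified spec: {name: expected Python type}."""
    errors = [f"missing argument: {k}" for k in spec if k not in args]
    errors += [f"unexpected argument: {k}" for k in args if k not in spec]
    errors += [f"{k} should be {t.__name__}" for k, t in spec.items()
               if k in args and not isinstance(args[k], t)]
    return errors


call = {"function": {"name": "get_stock_price",
                     "arguments": '{"ticker": "AAPL", "date": "2026-06-11"}'}}
print(is_stuck([call, call, call]))  # True
print(validate_args({"ticker": "AAPL"}, {"ticker": str, "date": str}))
# ['missing argument: date']
```

The list returned by `validate_args` can be sent back to the model as the tool result, which gives it the structured error it needs to self-correct.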

Going deeper

Once you understand the basic loop, three directions open up: variations on the loop structure, what lives inside the "Think" step, and how multiple loops can collaborate.

ReAct vs Plan-and-Execute

The loop described here is ReAct-style: the model decides what to do one step at a time, based on what it just observed. An alternative is Plan-and-Execute: a planner model creates a full task list upfront, then an executor model works through each item. Plan-and-Execute is more predictable and cheaper for well-defined tasks; ReAct adapts better when the path to the goal is unknown. Most production agents start with ReAct and add a planner layer only when they hit reliability limits.
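For contrast with the step-at-a-time loop, a Plan-and-Execute skeleton might look like the sketch below. Both `stub_planner` and `stub_executor` are hypothetical stand-ins for model calls; real planners often also allow re-planning when a step fails, which this sketch omits.

```python
from typing import Callable


def plan_and_execute(task: str,
                     planner: Callable[[str], list],
                     executor: Callable[[str, list], str]) -> list:
    """Plan once up front, then run each step in order."""
    plan = planner(task)                          # single planning call
    results = []
    for step in plan:
        results.append(executor(step, results))   # executor sees earlier results
    return results


# Hypothetical stand-ins for the two model calls.
def stub_planner(task: str) -> list:
    return ["fetch yesterday's close", "fetch close from a month ago", "compare"]


def stub_executor(step: str, earlier: list) -> str:
    return f"done: {step}"


print(plan_and_execute("Compare AAPL prices", stub_planner, stub_executor))
```

The number of steps is fixed before any tool runs, which is where the predictability comes from, and also why this structure adapts less well to surprises than ReAct.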

The chain-of-thought inside Think

The "Think" step isn't just routing logic; it's where the model's reasoning happens. Modern reasoning models (such as OpenAI's o3, or Claude with extended thinking enabled) can spend many tokens on internal chain-of-thought before committing to a tool call. That scratchpad is usually hidden from the end user but directly affects which tool gets called and with what arguments. Better reasoning in the Think step compounds across every iteration.

Memory across loops

A plain agent loop is stateless between sessions — the context is cleared when the conversation ends. For agents that need to remember past work, builders add external memory: a vector database of past observations, a structured key-value store, or a summarized scratchpad that gets injected into the system prompt at the start of each new session. Memory turns a one-shot loop into an agent that improves over time.
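The simplest of those options, a scratchpad injected into the system prompt, can be sketched like this. `with_memory` is a hypothetical helper, and the notes are assumed to have been saved by an earlier session through whatever storage you use.

```python
def with_memory(base_prompt: str, notes: list, max_notes: int = 5) -> str:
    """Append the most recent saved notes to the system prompt."""
    if not notes:
        return base_prompt
    recent = notes[-max_notes:]
    bullet_list = "\n".join(f"- {note}" for note in recent)
    return f"{base_prompt}\n\nNotes from earlier sessions:\n{bullet_list}"


saved_notes = ["User tracks AAPL and MSFT.", "User prefers prices in USD."]
print(with_memory("You are a financial research assistant.", saved_notes))
```

Capping the number of injected notes keeps the memory from crowding out the rest of the context as it grows.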

Multi-agent loops

Nothing stops one agent from calling another agent as a tool. An orchestrator agent decomposes a large task and dispatches sub-tasks to specialist subagent loops. Each subagent runs its own think-act-observe cycle and returns a result to the orchestrator, which folds it into its own context and continues. Frameworks such as LangGraph, and the subagent support in Anthropic's Claude Agent SDK, support this pattern.
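Exposing a subagent as a tool can be as small as a wrapper function. In this sketch `make_subagent_tool` and `stub_loop` are hypothetical; `stub_loop` stands in for a full agent loop that takes a message list and returns a final answer.

```python
from typing import Callable


def make_subagent_tool(run_loop: Callable[[list], str],
                       system_prompt: str) -> Callable[[str], str]:
    """Wrap an agent loop so an orchestrator can call it like any other tool."""
    def tool(task: str) -> str:
        # Each call starts from a fresh context; only the final answer is returned.
        messages = [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": task},
        ]
        return run_loop(messages)
    return tool


def stub_loop(messages: list) -> str:
    """Hypothetical stand-in for a full think-act-observe loop."""
    return f"[{messages[0]['content']}] finished: {messages[1]['content']}"


ORCHESTRATOR_TOOLS = {
    "research_agent": make_subagent_tool(stub_loop, "researcher"),
    "coding_agent": make_subagent_tool(stub_loop, "coder"),
}
print(ORCHESTRATOR_TOOLS["research_agent"]("find AAPL earnings date"))
# [researcher] finished: find AAPL earnings date
```

Because the subagent's intermediate tool calls never enter the orchestrator's history, this design also helps keep the orchestrator's context small.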

FAQ

How is the agent loop different from a regular chatbot?

A chatbot makes one LLM call and returns the response. The agent loop calls the LLM, checks whether the model wants to use a tool, executes it if so, feeds the result back, and calls the LLM again. That cycle repeats until the model decides it has a final answer. A chatbot is a single step; an agent loop is a while loop.

What stops the agent loop from running forever?

The loop exits naturally when the model returns a plain-text response with no tool call. In production you always add hard limits on top: a maximum iteration count (typically 15–25), a wall-clock timeout, and optionally a token or cost budget. If the iteration cap fires, a common pattern is to prompt the model one final time without tools so it synthesizes whatever it has gathered.

Does the model see every previous tool result every turn?

Yes. By default the full message history — including every prior tool call and result — is included in the context on each turn. This is what lets the model reason about accumulated evidence. The downside is that context grows with every step, so long-running agents need strategies like result summarization or a rolling window to avoid filling the context window.
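A rolling-window strategy can be sketched as a function that shortens old tool results while leaving recent messages untouched. `compact_history` and its thresholds are illustrative assumptions; a production version might summarize with a model instead of truncating.

```python
def compact_history(messages: list, keep_recent: int = 4, max_chars: int = 200) -> list:
    """Shorten old tool results; leave recent messages and all IDs untouched."""
    cutoff = len(messages) - keep_recent
    compacted = []
    for i, msg in enumerate(messages):
        if i < cutoff and msg["role"] == "tool" and len(msg["content"]) > max_chars:
            # Copy the message so the original history is not mutated.
            msg = {**msg, "content": msg["content"][:max_chars] + " ...[truncated]"}
        compacted.append(msg)
    return compacted


history = [
    {"role": "user", "content": "Summarize the filing."},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "call_1"}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "x" * 5000},
    {"role": "assistant", "content": None, "tool_calls": [{"id": "call_2"}]},
    {"role": "tool", "tool_call_id": "call_2", "content": "y" * 5000},
]
compacted = compact_history(history, keep_recent=2)
print(len(compacted[2]["content"]), len(compacted[4]["content"]))  # 215 5000
```

Messages are shortened rather than dropped so that every tool result still pairs with its tool call, which keeps the message threading valid.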

What is the ReAct pattern and how does it relate to the agent loop?

ReAct (Reasoning + Acting, Yao et al. 2023) is the pattern, named after the research paper that popularized it, of interleaving explicit reasoning traces with tool calls. The think-act-observe loop is essentially a practical implementation of ReAct. The paper's main finding was that combining reasoning and acting in the same loop generally beat reasoning-only (pure chain-of-thought) and acting-only prompting on the benchmarks it tested.

Can the agent loop call multiple tools at once?

Some LLM APIs support parallel tool calls — the model can return several tool-call objects in a single response, and the runtime executes them concurrently. This is useful when two lookups are independent (e.g., fetching yesterday's and last month's prices in one turn instead of two). Parallel calls reduce total turns but require the runtime to handle concurrent execution and merge the results.
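On the runtime side, concurrent execution can be sketched with a thread pool from Python's standard library. The function names are hypothetical, the tool is a stub with canned prices taken from this article's example, and the month-ago date is an assumption made for the illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls(tool_calls: list, tools: dict) -> list:
    """Execute independent tool calls concurrently; return results in call order."""
    def run_one(call: dict) -> dict:
        fn = tools[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        return {"role": "tool", "tool_call_id": call["id"],
                "content": json.dumps(fn(**args))}

    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        return list(pool.map(run_one, tool_calls))   # map preserves input order


PRICES = {"2026-06-11": 211.42, "2026-05-11": 198.17}   # canned data for the sketch


def get_stock_price(ticker: str, date: str) -> dict:
    """Stub tool: looks up a canned closing price."""
    return {"ticker": ticker, "date": date, "close": PRICES[date]}


calls = [
    {"id": "call_1", "function": {"name": "get_stock_price",
        "arguments": json.dumps({"ticker": "AAPL", "date": "2026-06-11"})}},
    {"id": "call_2", "function": {"name": "get_stock_price",
        "arguments": json.dumps({"ticker": "AAPL", "date": "2026-05-11"})}},
]
results = run_tool_calls(calls, {"get_stock_price": get_stock_price})
print([r["tool_call_id"] for r in results])  # ['call_1', 'call_2']
```

Each result carries the `tool_call_id` of the call it answers, so the merge step amounts to appending all of the result messages to history after the single assistant message that requested them.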

Why do I sometimes see 'agent stopped due to max iterations'?

This error means the agent reached its iteration cap without producing a final answer. Common causes include a tool that keeps returning errors the model can't resolve, a goal that's genuinely too complex for the allotted steps, or a system prompt that encourages excessive verification. Increase the cap only after diagnosing why the extra turns are needed — blindly raising the limit often just delays the same outcome.
