In plain English
An AI agent doesn't answer in one shot. It runs in a loop: the model thinks, calls a tool, reads the result, thinks again, calls another tool, and so on. A stop condition is simply the rule that says this loop is over — return the answer and walk away. Without one, the loop has no exit. The agent would keep going forever, or until something else kills it.

Picture a new intern told to "research our top competitor and write a summary." A good intern reads a few pages, decides they have enough, and hands you the summary — they know when to stop. A bad intern keeps opening one more tab, re-reading the same article, and never delivers anything. The difference isn't intelligence. It's having a clear finish line. Stop conditions are how you give an agent that finish line.
There are really two kinds of finish line. The natural one: the agent decides the task is done and produces a final answer instead of another tool call. And the safety ones: hard ceilings you set in advance — a maximum number of steps, a budget, a time limit — that end the loop even when the agent itself doesn't realize it should quit. Every production agent needs both.
Why it matters
A loop with no reliable exit is the single most common — and most expensive — way an agent fails. Three concrete problems make stop conditions non-negotiable.
- Runaway cost. Every loop iteration is at least one model call, and an agent's context grows with each step, so later calls cost more than earlier ones. An agent stuck in a loop doesn't fail loudly — it quietly bills you for thousands of calls. People have woken up to surprise invoices from a single agent that never stopped.
- Infinite and repeating loops. Models get stuck. The agent calls the same search, gets the same useless result, and tries it again — forever. Or two steps ping-pong: edit a file, undo the edit, edit it again. Nothing crashes; it just never finishes.
- Stuck-but-confident dead ends. Sometimes the task genuinely can't be completed — the API is down, the file doesn't exist, the question is unanswerable. A good agent should give up gracefully and say so. Without a stop rule, it keeps thrashing instead of returning a clean "I couldn't do this."
Who cares about this? Anyone shipping an agent to real users. A demo agent that runs once on your machine looks fine. The moment it runs unattended — on a schedule, behind an API, for a thousand users — an ungoverned loop becomes a billing incident and a reliability bug at the same time. Stop conditions are the seatbelt: cheap to add, and the thing you regret skipping. They turn an agent from "works on my laptop" into "safe to leave running."
How it works
At the end of every loop iteration, the runtime asks one question: should I go around again, or stop here? It checks the stop conditions in order. The natural stop is checked first — did the model produce a final answer? If yes, return it. If not, the guards get a turn: have we hit the step limit, blown the budget, run out of time, or detected that we're stuck? If any guard trips, the loop ends regardless of what the model wanted.
The natural stop: the model decides it's done
Most agents are built around tool calls. On each turn the model can either call a tool (keep working) or respond with plain text (it's finished). The runtime reads which one happened. A turn with no tool call is the signal: the agent believes it has its answer, so the loop exits and that text is returned to the user. This is the clean, intended way to stop — the model self-terminates because the task is genuinely complete.
Some setups make this even more explicit by giving the agent a dedicated finish or submit_answer tool. Calling that tool is the stop signal. Either way, the principle is the same: there is a specific, detectable event that means done.
Hard guards: limits you set, not the model
You cannot trust the model to always stop itself, so you wrap the loop in ceilings it can't talk its way past. These are dead-simple counters checked by your code, not the model — that's exactly why they're reliable.
| Guard | What it counts | Trips when |
|---|---|---|
| Max steps (iterations) | Number of loop passes | It reaches, say, 25 — the most common single guard |
| Token / cost budget | Tokens or dollars spent so far | Cumulative spend crosses your cap |
| Wall-clock time | Seconds since the agent started | A timeout (e.g. 120s) is exceeded |
| Tool-call quota | Calls to a specific tool | An expensive tool is used too many times |
Soft guards: noticing the agent is stuck
Hard guards stop a long agent. Soft guards stop a stuck one early, before it wastes the rest of its budget spinning. They look at the recent history of the loop for signs of no progress.
- Repetition detection. If the agent makes the exact same tool call with the exact same arguments two or three times in a row, it's looping — stop and report it.
- No-progress checks. Track whether each step actually changes the state (new information found, a file modified, the goal closer). Several steps with no change means the agent is treading water.
- Confidence thresholds. Some agents emit a confidence or score with each result; if the agent reports it's confident enough, you can stop early rather than running the full budget.
Stop conditions in code
Here is the whole idea as a single loop. The natural stop is the if not tool_calls branch; everything around it is guards. Notice that the model never gets to override max_steps — the host code owns that decision.
import time
def run_agent(task, max_steps=25, budget_usd=1.00, timeout_s=120):
messages = [{"role": "user", "content": task}]
spent = 0.0
start = time.time()
recent_calls = []
for step in range(max_steps): # HARD GUARD: max steps
if spent >= budget_usd: # HARD GUARD: cost budget
return "Stopped: budget exhausted."
if time.time() - start > timeout_s: # HARD GUARD: wall-clock
return "Stopped: timed out."
reply, cost, tool_calls = model_step(messages)
spent += cost
if not tool_calls: # NATURAL STOP: final answer
return reply
signature = (tool_calls[0].name, tool_calls[0].args)
recent_calls.append(signature)
if recent_calls[-3:].count(signature) == 3: # SOFT GUARD: repetition
return "Stopped: agent is looping on the same action."
messages += run_tools(tool_calls) # feed results back, loop again
return "Stopped: reached max steps without finishing." # HARD GUARD firesChoosing your limits
There's no universal number — the right ceilings depend on the job. A quick lookup needs a tight leash; a deep research or coding task needs room to breathe. Use these as starting points, then tune against real runs.
| Agent type | Typical max steps | Why |
|---|---|---|
| Simple Q&A / lookup | 3–8 | One or two tool calls should answer it; more means it's confused |
| Multi-step research | 15–40 | Needs to gather and cross-check several sources |
| Coding / file editing | 30–100+ | Read, edit, run tests, fix, repeat — many small steps |
| Open-ended / autonomous | Step + time + cost | Unbounded scope; lean on budgets, not just step count |
A practical rule: set the limit a bit above the worst legitimate run you've seen, not at the average. If a real task occasionally needs 30 steps, a max of 25 will cut it off mid-work and look like a bug. The guard exists to catch runaways, not to strangle normal work. And always layer guards — a step limit alone won't save you if one step calls a slow, costly tool, so pair it with a time or cost budget.
Going deeper
Once the basics are in place, stop conditions become a design surface of their own. A few directions worth knowing.
Graceful stops vs hard kills. There's a difference between abruptly killing the loop and asking the agent to wind down. A nicer pattern: when a guard is close to tripping, inject a message like "You have one step left — summarize what you found and finish now." The agent then produces a useful partial answer instead of being cut off mid-thought. Same ceiling, far better last impression.
Stopping and context limits interact. Long-running agents don't just risk cost — they risk filling the context window. As history grows, calls get slower and pricier and eventually overflow. So "stop" and "compact the history to keep going" are two answers to the same pressure; mature agents do both, summarizing old steps so the loop can continue without exploding.
Human-in-the-loop as a stop. Not every exit is automatic. A common safety pattern pauses the agent and hands control back to a person before an irreversible action — sending money, deleting data, emailing a customer. That pause is a stop condition too: the loop halts and waits for approval rather than for a counter. It's the most important guard for any agent that can affect the real world.
Where the harder problems live. Detecting "no progress" is genuinely tricky — an agent exploring a dead end can look identical to one doing slow, real work, and a too-eager stuck-detector kills good runs. Multi-agent systems compound this: each sub-agent needs its own budget, and you need a global ceiling so they don't collectively run forever. The durable lesson is the same one experienced builders repeat: design the exits with as much care as the happy path. A great agent is defined as much by knowing when to stop as by what it does while running. From here, revisit the agent loop, agent planning, and whether you even need an agent for this task at all.
FAQ
How does an AI agent know when to stop?
Two ways. The natural stop: the model produces a final text answer (or calls a finish tool) instead of another tool call, signaling the task is done. And the safety stops: hard limits you set in code — max steps, a cost or time budget, or stuck-detection — that end the loop even when the model doesn't stop on its own. Production agents use both.
What is a good max iterations limit for an agent?
It depends on the task. A simple lookup needs only 3–8 steps; multi-step research often needs 15–40; coding agents can need 100 or more. Set the ceiling slightly above the worst legitimate run you've observed, so it catches runaways without cutting off normal work, and always pair it with a time or cost budget.
How do I stop an agent from looping forever?
Layer multiple guards. A hard max_steps counter is the baseline. Add a cost or wall-clock budget so a single expensive step can't run wild. Then add repetition detection: if the agent makes the same tool call with the same arguments two or three times in a row, stop and report that it's stuck.
Can I just tell the agent in the prompt to stop after N steps?
No — not as your only defense. Models forget instructions, miscount steps, and can reason their way out of self-imposed limits. A reliable stop condition lives in your host code as a counter the model cannot reach or override. Prompt-level guidance can help, but the enforceable limit must be in code.
What should an agent return when it hits a stop limit?
A clean, explainable result — never a crash or an empty response. Return either the best partial answer it has plus a note that it was cut short, or a clear error like "reached max steps without finishing." A good pattern is to warn the agent when it's near the limit so it can summarize and finish gracefully instead of being killed mid-thought.