In plain English
An AI agent is an LLM in a loop with tools — it can reason, act, check results, and repeat until a goal is met. That sounds great, and for the right tasks it is. But "agent" has become the default answer to every AI question, even when a one-line prompt or a deterministic workflow would do the job faster, cheaper, and with fewer surprises.
Think of it like hiring contractors. A single prompt is like calling a specialist who answers one question over the phone — instant, precise, done. A fixed workflow is like a licensed plumber who follows a checklist: they show up, inspect, replace the part, test, and leave. Reliable, predictable. An AI agent is like hiring a general contractor and telling them "renovate my house" — powerful, but they make judgment calls you didn't anticipate, they sub out work you didn't know about, and every decision can ripple into another. You want a general contractor when the project is genuinely open-ended. You don't want one to change a lightbulb.
Why it matters
The cost of over-building with agents is real and often ignored until you're in production. Every extra reasoning step adds latency (often two to ten seconds per loop iteration), tokens (which map directly to API cost), and a chance for the model to take an unexpected path. A task that should cost $0.01 with a single prompt can cost $0.50 when run through an agent with five tool calls — a 50x difference that compounds fast at scale.
Reliability is the other casualty. A workflow whose control flow lives in code can handle thousands of requests per minute with predictable latency and no risk of a runaway loop. An agent introduces uncertainty at every step: the model might choose a different tool, ask an unexpected clarifying question, or confidently take the wrong branch. By 2025, many enterprises that had jumped straight to autonomous agents were dealing with unpredictability, governance gaps, and debugging challenges that a structured workflow would largely have avoided.
None of this means agents are bad. It means they are the right tool for a specific class of problem — and that class is narrower than the hype suggests. Knowing where the boundary is saves you weeks of engineering and months of production headaches.
The three-tier architecture
Before reaching for a decision tree, it helps to have a clear mental model of the three options you're choosing between. They differ in who decides what happens next.
Single prompt
You send a request, the model sends back a response, and you're done. The full task fits in one context window. Examples: classify a support ticket, translate a document, extract JSON from a user message, summarize an article, write a first draft. These tasks are bounded (the inputs and expected outputs are well-defined) and stateless (each request is independent).
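A minimal sketch of this tier, assuming a hypothetical `llm` callable that takes a prompt string and returns the model's text. The label set and `fake_llm` are illustrative stand-ins so the example runs offline; swap in your provider's client.

```python
from typing import Callable

# Hypothetical label set, for illustration only.
LABELS = ("billing", "bug", "other")

def classify_ticket(ticket: str, llm: Callable[[str], str]) -> str:
    """One request, one response: no loop, no tools, no state."""
    prompt = (
        "Classify the support ticket into exactly one label: "
        f"{', '.join(LABELS)}.\n"
        "Reply with the label only.\n\n"
        f"Ticket: {ticket}"
    )
    answer = llm(prompt).strip().lower()
    # Guard against off-format replies instead of trusting the model blindly.
    return answer if answer in LABELS else "other"

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption, not a real API)."""
    ticket = prompt.rsplit("Ticket:", 1)[-1].lower()
    return "Billing" if "charged" in ticket else "Bug"

label = classify_ticket("I was charged twice this month.", fake_llm)
print(label)
```

The whole "stack" here is the prompt plus a small output check, which is the point of this tier.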
Fixed workflow
A sequence of steps where you define the control flow in code and the LLM handles only the sub-tasks that need natural language understanding. The workflow is deterministic at the orchestration level — you know in advance which steps run and in what order — but each step may call an LLM. Examples: a document processing pipeline that chunks, embeds, retrieves, then generates; a customer onboarding flow that validates input, queries a database, then drafts a welcome email.
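The onboarding example could be sketched as follows. This is an illustration under stated assumptions: `llm` is a hypothetical prompt-in, text-out callable, and a plain dict stands in for the database. The order of the steps is fixed in code, and only the last step touches a model.

```python
from typing import Callable

def onboarding_workflow(form: dict, accounts: dict, llm: Callable[[str], str]) -> str:
    """Three steps in a fixed order; only the last one calls the LLM."""
    # Step 1: validate input (plain code, no model involved).
    email = form.get("email", "")
    if "@" not in email:
        raise ValueError("invalid email")
    # Step 2: query the database (a dict stands in for it here).
    plan = accounts.get(email, "free")
    # Step 3: draft the welcome email (the only language task).
    return llm(f"Write a short welcome email for {email} on the {plan} plan.")

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption, not a real API)."""
    return f"[draft] {prompt}"

draft = onboarding_workflow(
    {"email": "ana@example.com"}, {"ana@example.com": "pro"}, fake_llm
)
print(draft)
```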
AI agent
The model itself decides what steps to take. Given a goal and a set of tools, it plans, executes, observes the results, and re-plans — potentially going down paths you never explicitly programmed. The agent loop (Thought → Action → Observation → repeat) is the mechanism. This is appropriate when the goal is inherently open-ended, the steps can't be predicted in advance, or the task requires adapting based on intermediate results.
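The loop itself is small. The sketch below assumes a hypothetical `policy` callable standing in for the model: it sees the goal and the history so far and returns the next tool name and argument. A scripted policy and two fake tools are used so the example runs without a model; none of these names come from a real library.

```python
def run_agent(goal, policy, tools, max_steps=5):
    """Thought -> Action -> Observation loop; the policy picks each step."""
    history = []
    for _ in range(max_steps):
        tool_name, arg = policy(goal, history)      # the "model" decides
        if tool_name == "finish":
            return arg, history
        observation = tools[tool_name](arg)         # act
        history.append((tool_name, arg, observation))  # observe
    return None, history                            # step limit reached

def scripted_policy(goal, history):
    """Stand-in for an LLM: the next step depends on the last observation."""
    if not history:
        return "read_log", "app.log"
    last_tool, _, last_obs = history[-1]
    if last_tool == "read_log" and "timeout" in last_obs:
        return "check_service", "db"
    return "finish", f"Root cause: {last_obs}"

TOOLS = {
    "read_log": lambda path: "error: timeout connecting to db",
    "check_service": lambda name: f"{name} is down",
}

answer, trace = run_agent("Why is checkout failing?", scripted_policy, TOOLS)
print(answer)
```

Note that nothing in `run_agent` fixes the order of tool calls; the second call happens only because of what the first one returned.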
The decision framework
Work through the following questions in order. Stop at the first tier that satisfies your requirements — don't jump ahead.
Question 1: Can the task fit in one LLM call?
If the input is fixed and bounded (a document, a user message, a structured object) and the job is a transformation, classification, extraction, or generation that needs no data beyond what fits in the context window, use a single prompt. Don't add orchestration you don't need.
- Yes → single prompt. Invest in prompt quality, examples, and output format. That's your entire stack.
- No → continue to Question 2.
Question 2: Can you write the control flow as code?
If the task involves multiple steps but those steps are predictable enough that you could write them as a flowchart before the first line of code — use a fixed workflow. Workflows can still call LLMs at each step; the key is that you decide when and in what order, not the model. The rule of thumb: if you can document the logic in fewer than fifteen decision branches, a workflow probably beats an agent.
- Yes → fixed workflow. Wire steps together with code, call LLMs only where you need language understanding, and keep the orchestration deterministic.
- No → continue to Question 3.
Question 3: Is the goal genuinely open-ended?
An agent is justified when the goal cannot be broken into a fixed sequence of steps because the steps themselves depend on what the model finds along the way. Research tasks, debugging sessions, open-ended data exploration, and multi-system orchestration where the model needs to decide which system to consult and when — these are the natural home for agents.
- Yes → agent. Define the goal clearly, give the model the minimum set of tools it needs, and add guardrails (budget limits, step counts, human-in-the-loop checkpoints) to manage unpredictability.
- No → revisit. If the goal isn't open-ended, you can probably write the control flow after all. Go back to Question 2.
| Trait | Single prompt | Fixed workflow | AI agent |
|---|---|---|---|
| Steps known in advance | Yes (one step) | Yes (N steps) | No |
| Model drives control flow | No | No | Yes |
| Tool calls needed | No | Sometimes | Yes |
| Latency | Lowest | Medium | Highest |
| Cost per task | Lowest | Medium | Highest |
| Predictability | High | High | Low–Medium |
| Good for | Transformations, extraction, generation | Pipelines, multi-step processes | Open-ended goals, research, debugging |
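The three questions can be written down as a short function. This is only a sketch of the decision order described above; the function name and boolean parameters are illustrative, and answering each question still takes judgment.

```python
def choose_tier(fits_one_call: bool, control_flow_known: bool, open_ended: bool) -> str:
    """Walk the questions in order and stop at the first tier that fits."""
    if fits_one_call:           # Question 1
        return "single prompt"
    if control_flow_known:      # Question 2
        return "fixed workflow"
    if open_ended:              # Question 3
        return "agent"
    # Not open-ended, yet no known control flow: the spec needs more work.
    return "revisit question 2"

print(choose_tier(fits_one_call=False, control_flow_known=True, open_ended=False))
```

Because the checks run in order, a task that fits one call never reaches the agent branch, even if it also happens to be open-ended.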
Common mistakes and when NOT to build an agent
Most premature agent builds share a small set of failure modes. Recognizing them early is the fastest way to avoid expensive rework.
Mistake 1: The task is just a bad prompt
The most common reason builders reach for an agent is that their single prompt isn't doing a good job. The real fix is almost always prompt engineering: clearer instructions, better examples (few-shot), a more explicit output format, or chain-of-thought reasoning in a single call. An agent does not fix a bad prompt — it amplifies it.
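As one way to apply those fixes, the sketch below assembles an instruction, an explicit output format, and few-shot examples into a single prompt. The function name and layout are assumptions for illustration, not a required template.

```python
def build_prompt(instruction, examples, output_format, user_input):
    """Combine instruction, output format, and few-shot examples in one call."""
    shots = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in examples)
    return (
        f"{instruction}\n\n"
        f"Output format: {output_format}\n\n"
        f"{shots}\n\n"
        f"Input: {user_input}\nOutput:"
    )

prompt = build_prompt(
    instruction="Extract the city mentioned in the message.",
    examples=[("Flying to Oslo on Monday", '{"city": "Oslo"}'),
              ("Back from Lima yesterday", '{"city": "Lima"}')],
    output_format='JSON with a single key "city"',
    user_input="Is the Cairo office open?",
)
print(prompt)
```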
Mistake 2: The workflow is just not written yet
If you're reaching for an agent because wiring up a multi-step pipeline feels like a lot of code, that's an engineering effort problem, not a task-type problem. A well-structured workflow — even a verbose one — is almost always more reliable, cheaper to debug, and easier to monitor than an agent doing the same thing. The agent abstraction trades explainability for flexibility; don't pay that price if you don't need the flexibility.
Mistake 3: The goal is vague, not open-ended
"Make my code better" and "improve the onboarding flow" are vague goals, not open-ended ones. An open-ended goal has genuine uncertainty about which steps are needed. A vague goal just has insufficient specification — clarifying it usually reveals a fixed workflow or even a single prompt. Handing a vague goal to an agent produces unpredictable results because the model has to invent the definition of success.
Mistake 4: Real-time latency requirements
If your use case demands sub-second response times — a live search box, a typing assistant, a real-time translation overlay — an agent loop is the wrong architecture. Each reasoning step can add two to ten seconds. Design for latency first; introduce agent loops only where the user can tolerate them (background tasks, async jobs, notifications).
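A quick back-of-the-envelope check makes the mismatch concrete. The sketch assumes steps run sequentially and uses the two-to-ten-second range quoted above as defaults; the function names are illustrative.

```python
def agent_latency_range(steps: int, min_s: float = 2.0, max_s: float = 10.0):
    """Best- and worst-case wall-clock seconds for a sequential agent loop."""
    return steps * min_s, steps * max_s

def fits_budget(steps: int, budget_s: float) -> bool:
    # Conservative: require even the worst case to fit the budget.
    return agent_latency_range(steps)[1] <= budget_s

low, high = agent_latency_range(5)
print(f"5 steps: {low:.0f} to {high:.0f} seconds")
print("fits a 1-second budget:", fits_budget(1, 1.0))
```

Under these assumptions even a single reasoning step misses a sub-second budget, while a background job with a one-minute budget has room for five.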
Going deeper
Once you've confirmed an agent is the right choice, several additional dimensions shape the design.
Hybrid architectures
The cleanest production architectures typically combine tiers. A fixed workflow handles the predictable parts — trigger, data enrichment, routing, result delivery — and calls an agent only for the one sub-step that requires open-ended reasoning. This keeps costs and latency controlled while preserving the agent's flexibility where it's actually needed. Think of the agent as a component inside a workflow, not a replacement for it.
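A minimal sketch of that shape, assuming a hypothetical `investigate` callable that wraps an agent. Routing and result delivery stay in ordinary code, and the agent is invoked only on the branch that needs open-ended reasoning. The keyword-based routing is deliberately simplistic.

```python
def handle_ticket(ticket: dict, investigate) -> dict:
    """Fixed workflow that delegates one sub-step to an agent."""
    # Routing: deterministic code decides which path runs.
    is_refund = "refund" in ticket["text"].lower()
    if is_refund:
        resolution = "Refund policy sent."         # predictable path, no agent
    else:
        resolution = investigate(ticket["text"])   # open-ended sub-step
    # Result delivery: same structure regardless of the path taken.
    return {"id": ticket["id"], "resolution": resolution, "used_agent": not is_refund}

def fake_investigator(text: str) -> str:
    """Stand-in for an agent call (assumption for illustration)."""
    return f"Investigated: {text}"

r1 = handle_ticket({"id": 1, "text": "I want a refund"}, fake_investigator)
r2 = handle_ticket({"id": 2, "text": "Sync breaks only on Tuesdays"}, fake_investigator)
print(r1, r2, sep="\n")
```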
Cost and token budgeting
A support ticket workflow that uses a 500-token system prompt, retrieves 2,500 tokens of context, and generates a 400-token response consumes roughly 3,400 tokens per request. The same task done as a five-step agent might consume 15,000–30,000 tokens including all intermediate tool calls and re-reasoning steps, roughly 4x to 9x as many. At current mid-tier model pricing (roughly $3–$5 per million input tokens, with output tokens typically priced higher), that per-task gap approaches an order of magnitude. Always prototype with token counting before committing to an agent architecture in production.
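The arithmetic can be checked in a few lines. As a simplification, the sketch applies one blended rate of $3 per million tokens to all tokens; real pricing usually separates input and output tokens, so treat the dollar figures as rough.

```python
import math

def cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of a token count at a flat per-million rate."""
    return tokens * usd_per_million / 1_000_000

RATE = 3.0  # assumed blended USD per million tokens (simplification)

workflow_tokens = 500 + 2_500 + 400      # system prompt + context + response
agent_low, agent_high = 15_000, 30_000   # five-step agent range from the text

workflow_cost = cost_usd(workflow_tokens, RATE)
ratio_low = agent_low / workflow_tokens
ratio_high = agent_high / workflow_tokens

print(f"workflow: ${workflow_cost:.4f} per task")
print(f"agent: ${cost_usd(agent_low, RATE):.3f} to ${cost_usd(agent_high, RATE):.3f} per task")
print(f"ratio: {ratio_low:.1f}x to {ratio_high:.1f}x")
```

With these numbers the agent uses about 4.4x to 8.8x the tokens of the workflow, which nears an order of magnitude at the upper end.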
Guardrails for agents in production
When you do build an agent, unpredictability must be managed explicitly. Effective production guardrails include: a maximum step count (kill the loop after N iterations), a token budget (abort if context exceeds a threshold), human-in-the-loop checkpoints for irreversible actions (sending emails, writing to databases, making payments), and structured logging of every tool call so failures are debuggable. Agentic autonomy without guardrails is not a feature — it's a support ticket.
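The four guardrails can be sketched in one loop. Everything here is illustrative: `policy` stands in for the model, `approve` for a human reviewer, the `IRREVERSIBLE` set is a made-up example, and the whitespace word count is a crude proxy for a real tokenizer.

```python
import logging

logger = logging.getLogger("agent")
IRREVERSIBLE = {"send_email", "make_payment"}  # hypothetical tool names

def run_guarded(policy, tools, approve, max_steps=5, token_budget=1000):
    """Agent loop with step limit, token budget, approval gate, and logging."""
    history, tokens_used = [], 0
    for step in range(max_steps):                  # guardrail 1: step limit
        tool, arg = policy(history)
        if tool == "finish":
            return "done", history
        # Guardrail 3: a human must approve irreversible actions.
        if tool in IRREVERSIBLE and not approve(tool, arg):
            return "rejected", history
        result = tools[tool](arg)
        # Guardrail 4: structured log line for every tool call.
        logger.info("step=%d tool=%s arg=%r", step, tool, arg)
        history.append((tool, arg, result))
        # Guardrail 2: token budget (word count as a rough stand-in).
        tokens_used += len(str(arg).split()) + len(str(result).split())
        if tokens_used > token_budget:
            return "token_budget_exceeded", history
    return "max_steps_exceeded", history

sent = []
tools = {
    "search": lambda q: "no results",
    "send_email": lambda body: sent.append(body) or "sent",
}
deny_all = lambda tool, arg: False

# A policy that never finishes is stopped by the step limit.
status, history = run_guarded(lambda h: ("search", "status of order 42"),
                              tools, deny_all, max_steps=3)
print(status, len(history))
```

Each exit returns a distinct status string, so the caller can tell a normal finish apart from a tripped guardrail.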
When multi-agent becomes the right answer
Even within the agent tier, the choice between a single agent and a multi-agent system follows the same logic: start single, scale to multi only when a single agent can't hold the full context, when subtasks benefit from specialization, or when parallel execution meaningfully reduces wall-clock time. Microsoft's Cloud Adoption Framework recommends prototyping single-agent first and moving to multi-agent only when the single-agent prototype hits a concrete ceiling.
FAQ
Can't I just always use an agent and let it figure out the simplest approach?
You can, but you'll pay for the overhead every time. An agent loop adds latency (seconds per iteration), tokens (which cost money), and unpredictability. A single prompt for a classification task takes one model round trip, often a second or less, and costs a fraction of a cent. Running the same task as an agent can easily multiply the cost by 10–50x. Match the tool to the task.
What's the difference between a workflow and an agentic pipeline?
The terms overlap. An agentic pipeline usually means a multi-step workflow where each step involves an LLM call, but the control flow is defined in code by the developer — not by the model. Some people use "agentic workflow" to mean the same thing. What matters is the question: does the model decide which step runs next, or does your code? If it's your code, it's a workflow regardless of the label.
My task requires calling an external API. Does that automatically mean I need an agent?
No. Tool use is a feature of agents, but you can also call an API directly in a fixed workflow without giving the model dynamic control over when or whether to call it. If the API call is always the same step in the same sequence, it belongs in a workflow. Only hand the API call to an agent if the model needs to decide whether to call it, which endpoint to use, or how to interpret the result to decide what to do next.
How do I know if my use case is truly open-ended?
Ask: can I write a complete flowchart before I see a single real input? If yes, the task is fixed enough for a workflow. A genuinely open-ended task has steps that depend on what the model discovers during execution — like a research task where the follow-up questions emerge from the answers, or a debugging session where the next tool to call depends on the error message.
What if I'm not sure? Should I default to an agent or a workflow?
Default to the simpler option and upgrade only when you hit a concrete limitation. Build the prompt. If that fails, build the workflow. If the workflow's control flow becomes impossible to specify in advance, then build the agent. Upgrading is straightforward; over-engineered agent systems are expensive to simplify after the fact.
Are there tasks where agents are clearly the right answer from the start?
Yes. Software debugging sessions (the next step depends on the error), open-ended research and synthesis (you don't know which sources to query until you see the initial results), and complex multi-system orchestration where the model must dynamically choose which system to interact with all benefit from agents. The common thread is that the task has genuine decision branches that can't be enumerated in advance.