In plain English
A multi-agent system is a setup where several AI agents work on one job together, each handling a piece of it, instead of a single agent doing everything alone. One agent might do research, another might write, a third might check the work — and something coordinates them so the pieces add up to a finished result.
Think of a small newsroom. A plain LLM is one reporter answering one question. A single AI agent is that reporter who can also make phone calls, pull records, and file a story on their own. A multi-agent system is the whole newsroom: an editor assigns angles, several reporters chase them in parallel, a fact-checker verifies the claims, and the editor stitches the final piece together. No single person does all of it — and the editor's main job is coordination, not writing.
The key idea is division of labor. Each agent gets its own instructions, its own tools, and often its own slice of the work. They talk to each other — usually by passing messages or results back and forth — and one of them (or a piece of orchestration code) decides who does what and when. That's the whole concept. Everything else is detail about how they divide and coordinate.
Why it matters
Single agents hit walls on big jobs. Cram fifty tools and a ten-page instruction set into one agent and it gets confused — it forgets which tool to reach for, mixes up unrelated steps, and its context window fills with noise from every subtask at once. Splitting the work lets each agent stay focused on a small, clear role, which is far more reliable than one agent juggling everything.
There are three concrete payoffs. Specialization: a 'researcher' agent can have a tight prompt and just web-search tools, while a 'coder' agent has file and shell tools — neither is distracted by the other's job. Parallelism: if a task has five independent parts, five agents can work at once instead of one agent plodding through them in sequence. Isolation: each agent has its own fresh context, so a researcher reading twenty messy web pages doesn't pollute the writer's clean workspace.
Who should care
- Developers whose single agent has grown too many tools, too long a prompt, or fails on multi-part tasks — splitting it up is often the fix.
- Product teams building 'deep research' or 'autonomous workflow' features, where fanning out across sub-tasks is the whole point.
- Anyone evaluating agent frameworks — most modern agent frameworks now have first-class support for subagents and orchestration, and knowing the pattern helps you read their docs.
How it works
The most common shape is the orchestrator-worker pattern (also called coordinator-subagent or manager-worker). One orchestrator agent owns the overall goal. It breaks the goal into sub-tasks, hands each to a worker agent, collects their results, and combines them into the final answer. The orchestrator plans and delegates; the workers execute.
Each worker is a full agent in its own right — an LLM running its own agent loop, with its own tools and its own private context. The orchestrator doesn't see how a worker reached its answer; it only sees the answer the worker hands back. That isolation is a feature: it keeps each agent's context small and focused. It's also the source of the pattern's biggest headache — the orchestrator has to explicitly tell each worker everything it needs, because the worker can't peek at what the orchestrator (or its siblings) already figured out.
A typical run, step by step
Workers can run in parallel (the orchestrator fires off several at once and waits for all to finish — good for independent sub-tasks) or in sequence (one worker's output feeds the next — needed when there are dependencies). A research orchestrator might spawn five searchers in parallel; a coding orchestrator might run 'write code' then 'run tests' in order. Good agent planning is what decides which sub-tasks are independent and can be fanned out.
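The two execution modes can be sketched with asyncio. This is a minimal illustration, not a framework: `run_worker` is a hypothetical stand-in that echoes its input instead of calling a model, and the role names are made up for the demo.

```python
import asyncio


async def run_worker(role: str, task: str) -> str:
    """Hypothetical stand-in for a worker agent; a real one would call a model."""
    await asyncio.sleep(0.01)  # simulate model latency
    return f"[{role}] {task}"


async def fan_out(tasks: list[tuple[str, str]]) -> list[str]:
    """Parallel: start every independent sub-task, then wait for all of them."""
    return await asyncio.gather(*(run_worker(role, task) for role, task in tasks))


async def chain(roles: list[str], first_input: str) -> str:
    """Sequential: each worker's output becomes the next worker's input."""
    current = first_input
    for role in roles:
        current = await run_worker(role, current)
    return current


async def main() -> tuple[list[str], str]:
    searches = [("searcher", f"angle {i}") for i in range(1, 6)]
    parallel = await fan_out(searches)  # five workers at once
    sequential = await chain(["coder", "test-runner"], "add a retry helper")
    return parallel, sequential


parallel_results, sequential_result = asyncio.run(main())
```

`asyncio.gather` returns results in the order the tasks were submitted, so the orchestrator can match each answer back to its sub-task without extra bookkeeping.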
How do agents actually communicate? Almost always by passing structured messages or text results, not by sharing memory. The orchestrator's delegation is a prompt to the worker; the worker's reply is text (or structured output) back to the orchestrator. Under the hood this is the same machinery as any agent: tool use and function calling. In fact, the cleanest way to build one is to make 'spawn a subagent' just another tool the orchestrator can call.
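Here is one way the 'subagent as a tool' idea could look. The dictionary follows the name / description / input_schema layout that the Anthropic Messages API uses for tool definitions, and the return value mirrors its tool_result block. The tool name `spawn_subagent`, the `handle_tool_call` helper, and the stub worker are all assumptions made for illustration.

```python
from typing import Callable

# Tool definition the orchestrator model would be offered.
# The name and fields are hypothetical; only the overall layout is standard.
SPAWN_SUBAGENT_TOOL = {
    "name": "spawn_subagent",
    "description": "Delegate one self-contained sub-task to a fresh worker agent.",
    "input_schema": {
        "type": "object",
        "properties": {
            "role": {"type": "string", "description": "Worker persona."},
            "task": {"type": "string", "description": "Everything the worker needs."},
        },
        "required": ["role", "task"],
    },
}


def handle_tool_call(
    name: str,
    tool_input: dict,
    tool_use_id: str,
    run_worker: Callable[[str, str], str],
) -> dict:
    """Execute one tool call requested by the orchestrator and wrap the result."""
    if name != "spawn_subagent":
        raise ValueError(f"unknown tool: {name}")
    answer = run_worker(tool_input["role"], tool_input["task"])
    # Only the worker's final answer goes back; its private context is discarded.
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": answer}


def stub_worker(role: str, task: str) -> str:
    """Stand-in for a real worker agent so the sketch runs offline."""
    return f"{role} finished: {task}"


result = handle_tool_call(
    "spawn_subagent",
    {"role": "researcher", "task": "Summarize multi-cloud risks."},
    "toolu_demo",  # placeholder ID; a real one comes from the model's tool_use block
    stub_worker,
)
```

Because the worker is passed in as a plain function, the same handler works whether the worker is a stub, a single model call, or a full agent loop.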
Single agent vs. multi-agent
The honest default is to start with one agent. A single agent with good tools handles a surprising amount, and it's far easier to debug. Reach for multiple agents only when one agent is clearly straining.
Single agent:
- One loop, one context
- Cheaper, fewer tokens
- Easy to trace and debug
- Best for focused tasks
Multi-agent:
- Many loops, isolated contexts
- More calls, more cost
- Harder to trace failures
- Best for broad, parallel tasks
Signs you've outgrown a single agent: the prompt has bloated past what the model reliably follows; the agent reaches for the wrong tool because it has too many; the task has clearly separable parts that could run in parallel; or one long context is mixing unrelated concerns and degrading quality. If none of those bite, a single agent is the simpler, cheaper, more reliable choice.
Build a tiny orchestrator
Here's the whole idea in about 30 lines of Python using the Anthropic SDK. An orchestrator delegates two independent research questions to two workers, then merges their answers. The trick is that 'spawn a worker' is just a normal function. In this sketch plain Python code plays the orchestrator and calls that function directly; an orchestrator agent would call it like any other tool.
import anthropic

client = anthropic.Anthropic(api_key="sk-...")  # your key here


def worker(role: str, task: str) -> str:
    """One worker agent: a focused prompt, its own fresh context."""
    resp = client.messages.create(
        model="claude-opus-4-8",  # use whichever model ID your account offers
        max_tokens=1024,
        system=f"You are a {role}. Answer ONLY the task you are given, concisely.",
        messages=[{"role": "user", "content": task}],
    )
    return resp.content[-1].text


# Orchestrator: split the goal, delegate in turn, then merge the results.
# Each call to worker() is an isolated agent with its own context window.
subtasks = [
    ("market researcher", "List 3 risks of building on a single cloud provider."),
    ("market researcher", "List 3 benefits of a multi-cloud strategy."),
]
results = [worker(role, task) for role, task in subtasks]

# The orchestrator's final step: synthesize the workers' answers.
summary = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    system="You are an orchestrator. Merge the worker notes into one clear briefing.",
    messages=[{
        "role": "user",
        "content": "Worker notes:\n\n" + "\n\n---\n\n".join(results),
    }],
)
print(summary.content[-1].text)

Notice what makes this multi-agent: each worker() call is its own model call with its own system prompt and a clean context. The researchers never see each other's notes, and the orchestrator only ever sees their final answers. To run the workers truly in parallel, you'd wrap the calls in threads or asyncio instead of a list comprehension. A real framework adds retries, parallel execution, and message routing on top, but this is the core.
Where multi-agent shows up
You're likely already using multi-agent systems without the label. A few concrete examples:
| System | Orchestrator role | Worker roles |
|---|---|---|
| Deep research | Plans sub-questions, merges a report | Parallel searchers, one per angle |
| Coding agent | Owns the task, reviews the result | Coder, test-runner, reviewer subagents |
| Customer support | Routes the ticket, owns the reply | Billing agent, account agent, policy agent |
| Document pipeline | Splits the doc, assembles output | Extractor, summarizer, fact-checker |
These patterns lean on ideas you've met elsewhere. Routing a request to the right specialist agent is the same idea as a router in a plain workflow. Workers that look things up before answering are doing retrieval-augmented generation inside their own loop. And when an orchestrator and its workers need to share tools or connect to outside services, the wiring is increasingly standardized through MCP (the Model Context Protocol), so a tool built once works for every agent in the system.
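The customer-support row can be sketched as a router plus specialists. Everything here is hypothetical: the keyword table stands in for what would normally be a model-based classifier, and the three agents are stubs that tag the ticket rather than answer it.

```python
from typing import Callable

Agent = Callable[[str], str]

# Stub specialists; real ones would each run their own agent loop.
AGENTS: dict[str, Agent] = {
    "billing": lambda ticket: f"billing agent handling: {ticket}",
    "account": lambda ticket: f"account agent handling: {ticket}",
    "policy": lambda ticket: f"policy agent handling: {ticket}",
}

# Keyword rules used only for illustration.
KEYWORDS: dict[str, tuple[str, ...]] = {
    "billing": ("refund", "invoice", "charge"),
    "account": ("password", "login", "email"),
}


def route(ticket: str) -> str:
    """Pick the specialist for a ticket; fall back to the policy agent."""
    text = ticket.lower()
    for name, words in KEYWORDS.items():
        if any(word in text for word in words):
            return name
    return "policy"


def handle(ticket: str) -> str:
    """Orchestrator step: route the ticket, then delegate to that specialist."""
    return AGENTS[route(ticket)](ticket)
```

For example, `handle("I was charged twice")` goes to the billing agent, while a ticket that matches no keyword falls through to the policy agent.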
A close cousin worth naming: computer-use agents, where an agent drives a real screen by clicking, typing, and reading pixels. These are often orchestrated by a higher-level agent that plans the task and hands individual screen actions to the computer-use worker. Computer use is a high-autonomy frontier with its own safety concerns.
Going deeper
Once the orchestrator-worker shape clicks, the genuinely hard parts of multi-agent systems come into view. None of these are solved — they're the active frontier.
Coordination and context sharing
The central tension: isolated contexts keep each agent focused but mean agents can't see each other's reasoning. If the orchestrator delegates vaguely, a worker guesses wrong and the whole result is off. The fix is detailed delegation — the orchestrator must spell out objective, scope, and what 'done' looks like for each worker, since the worker has no shared memory to fall back on. Some systems add a shared scratchpad or a vector database as common memory, but that reintroduces the coupling that isolation was meant to remove.
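One way to make detailed delegation hard to skip is to give it a fixed shape. The sketch below assumes a hypothetical `Delegation` record with the three fields named above plus an optional list of facts the orchestrator already knows; the field names and prompt layout are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Delegation:
    """A self-contained brief: the worker sees this and nothing else."""
    objective: str
    scope: str
    done_when: str
    context: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the brief as the text handed to the worker."""
        lines = [
            f"Objective: {self.objective}",
            f"Scope: {self.scope}",
            f"Done when: {self.done_when}",
        ]
        if self.context:
            lines.append("Known so far:")
            lines.extend(f"- {fact}" for fact in self.context)
        return "\n".join(lines)


brief = Delegation(
    objective="List 3 risks of building on a single cloud provider.",
    scope="Infrastructure risks only; skip pricing.",
    done_when="Three bullet points, one sentence each.",
    context=["The audience is a small engineering team."],
)
prompt = brief.to_prompt()
```

Requiring every field at construction time means a vague delegation fails in the orchestrator's code rather than inside a worker that has to guess.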
Topologies beyond orchestrator-worker
Orchestrator-worker is the workhorse, but not the only shape. Sequential pipelines chain agents in a fixed order (extract then summarize then verify). Debate / critic setups have one agent produce and another challenge, iterating toward a better answer — the LLM-as-a-judge idea applied to agents. Peer / network designs let any agent message any other, which is powerful but hard to keep stable. Most production systems stick to the hierarchical orchestrator-worker shape precisely because freer topologies are harder to control and debug.
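The critic topology boils down to a bounded loop. In this sketch the producer and critic are hypothetical stub functions, and the convention that empty feedback means 'approved' is an assumption; in a real system both would be model-backed agents.

```python
from typing import Callable


def critic_loop(
    produce: Callable[[str, str], str],
    critique: Callable[[str], str],
    task: str,
    max_rounds: int = 3,
) -> tuple[str, int]:
    """Alternate producer and critic until the critic approves or rounds run out."""
    draft, feedback = "", ""
    for round_no in range(1, max_rounds + 1):
        draft = produce(task, feedback)
        feedback = critique(draft)
        if not feedback:  # empty feedback is treated as approval
            return draft, round_no
    return draft, max_rounds  # best effort after the final round


def stub_producer(task: str, feedback: str) -> str:
    """Stand-in producer: revises only when it has feedback to act on."""
    return f"{task} (with sources)" if feedback else task


def stub_critic(draft: str) -> str:
    """Stand-in critic: objects until the draft mentions sources."""
    return "" if "sources" in draft else "add sources"


final_draft, rounds_used = critic_loop(stub_producer, stub_critic, "Explain multi-cloud")
```

The `max_rounds` cap is what keeps this shape controllable: without it, two agents that never agree would loop and spend tokens indefinitely.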
Reliability and cost
Errors compound across agents the same way they compound across steps. If each worker is 90% reliable and every step must succeed, a five-worker chain finishes correctly only about 59% of the time (0.9^5), assuming failures are independent, and a confused worker can mislead the orchestrator. Multi-agent runs also fan out token usage dramatically, so observability and tracing matter even more here: when a 12-agent run produces a wrong answer, you need to see exactly which agent went off the rails. Treat cost as a first-class design constraint, not an afterthought.
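The compounding is easy to check with a few lines of arithmetic. This sketch assumes every step must succeed and that failures are independent, which real systems only approximate; the function names are made up for the example.

```python
def chain_success_rate(per_step: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independence."""
    return per_step ** steps


def steps_until_below(per_step: float, threshold: float) -> int:
    """Smallest chain length whose success rate drops below the threshold."""
    if not (0.0 < per_step < 1.0 and 0.0 < threshold <= 1.0):
        raise ValueError("per_step must be in (0, 1) and threshold in (0, 1]")
    steps = 0
    while chain_success_rate(per_step, steps) >= threshold:
        steps += 1
    return steps


rate = chain_success_rate(0.9, 5)
print(f"{rate:.1%}")  # prints 59.0%
coin_flip_length = steps_until_below(0.9, 0.5)  # 7 steps
```

At 90% per step, a chain of seven workers is already worse than a coin flip, which is one reason to keep chains short and to verify intermediate results.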
Evaluation and the build-it-yourself question
Because behavior is non-deterministic and spread across agents, you can't unit-test a multi-agent system like normal code — you build evals that score whether the system reached the goal across many runs. On the build side, you can hand-roll the orchestrator (as above) or use an agent framework that bundles subagent spawning, parallel execution, and message routing. The widely-shared lesson: start with one agent, move to multi-agent only when you've felt a single agent fail, and even then keep the topology as simple as the task allows.
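A goal-based eval can be as small as 'run it many times and count the passes'. The harness below is a sketch under loud assumptions: `stub_system` fakes non-determinism with a seeded random number generator, and the check is a simple substring test where a real eval might use a rubric or a grader model.

```python
import random
from typing import Callable


def pass_rate(
    system: Callable[[str, random.Random], str],
    check: Callable[[str], bool],
    goal: str,
    runs: int = 50,
    seed: int = 0,
) -> float:
    """Fraction of runs whose output satisfies the goal check."""
    rng = random.Random(seed)  # seeded so the eval itself is repeatable
    passed = sum(check(system(goal, rng)) for _ in range(runs))
    return passed / runs


def stub_system(goal: str, rng: random.Random) -> str:
    """Stand-in multi-agent system that sometimes drops half the answer."""
    if rng.random() < 0.8:
        return f"{goal}: briefing with risks and benefits"
    return f"{goal}: briefing with risks"


def covers_both_sides(output: str) -> bool:
    """Goal check: the briefing must mention risks and benefits."""
    return "risks" in output and "benefits" in output


score = pass_rate(stub_system, covers_both_sides, "multi-cloud briefing")
```

Tracking this score across prompt or topology changes gives a regression signal that a single hand-checked run cannot.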
FAQ
What is a multi-agent system in AI?
It's a setup where several AI agents work on one job together, each with its own role, tools, and context, coordinated so the pieces add up to a finished result. Typically one orchestrator agent plans and delegates sub-tasks to worker agents, then merges what they return. If you only have one agent in a loop, it's a single-agent system, no matter how many tools it has.
When should I use multiple AI agents instead of one?
Use multiple agents when one agent is clearly straining — its prompt has bloated, it picks the wrong tool because it has too many, or the task has independent parts that could run in parallel (like searching many sources at once). For focused, tightly-coupled tasks, a single agent is cheaper, easier to debug, and usually more reliable. Start with one and split only when you feel it fail.
What is the difference between single-agent and multi-agent systems?
A single-agent system is one LLM in one loop with one shared context. A multi-agent system runs several agents, each in its own loop with its own isolated context, coordinated by an orchestrator. Multi-agent buys specialization, parallelism, and context isolation, but costs more tokens and is harder to trace when something goes wrong.
What is an orchestrator-worker pattern?
It's the most common multi-agent shape: one orchestrator (or coordinator) agent owns the overall goal, breaks it into sub-tasks, delegates each to a worker agent, collects the results, and combines them into the final answer. Workers run their own agent loops with their own tools and private context; the orchestrator only sees their returned answers, not how they got there.
Are multi-agent systems more expensive than a single agent?
Usually yes. Each agent is its own set of model calls, so a multi-agent run can use several times the tokens of a single agent on the same task, and the multiple grows with the number of workers. You're trading tokens for quality and speed on fan-out work. Measure both cost and quality before shipping, and keep the number of workers as low as the task allows.
How do agents in a multi-agent system communicate?
Almost always by passing structured messages or text results, not by sharing memory. The orchestrator's delegation is a prompt to the worker, and the worker's reply is text or structured output back to the orchestrator. The cleanest implementation makes 'spawn a subagent' just another tool the orchestrator can call, so it's the same tool-use machinery as any agent.