AI/TLDR

What Is AutoGen? Microsoft's Conversation-Driven Agents

Understand AutoGen's core idea — agents that solve problems by talking to each other — and when conversation beats rigid orchestration.

BEGINNER | 10 MIN READ | UPDATED 2026-06-12

In plain English

AutoGen is a Microsoft Research open-source framework built around one deceptively simple idea: the best way to get AI agents to solve hard problems is to let them talk to each other. Instead of defining a fixed flowchart of steps, you create a group of agents, each with a role and a set of skills, and let them run a back-and-forth conversation until the task is done.

A concrete analogy: imagine sending a coding problem to a small Slack channel. One person writes a solution, a second person reviews it and spots a bug, the first person revises and posts a corrected version, and a third person runs the tests to confirm it passes. Nobody needed a project manager spelling out every step; the conversation itself was the workflow. AutoGen builds exactly that dynamic, but the channel participants are LLM-powered agents and the 'running tests' step is real code execution, either directly on the host or, more safely, inside a sandbox such as a Docker container.

The framework was introduced in a 2023 Microsoft Research paper, AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation, and has since gone through a major rewrite (v0.4, then 1.0 GA in February 2026) that shifted its architecture from synchronous exchanges to a fully asynchronous, event-driven messaging layer. The high-level conversation API — called AgentChat — remains the entry point for most builders.

Why it matters

Most problems that feel 'easy for an expert human team' are surprisingly hard for a single LLM call. Writing, checking, and fixing a piece of code involves at least three distinct cognitive modes: generation, critique, and verification. Cramming all three into one prompt produces a model that has to context-switch internally, and it often skips the critique step entirely.

AutoGen's conversation model addresses this by separating concerns into agents. An AssistantAgent writes code; a UserProxyAgent (or a dedicated executor agent) actually runs it and returns the stdout/stderr; the assistant reads the output and iterates. Each agent only needs to be good at its own job, and the conversation history serves as shared working memory. Because real execution output flows back into the conversation, the agents can catch bugs they introduced, self-correct, and tend to produce working code more reliably than a single-shot prompt.

Beyond coding tasks, the conversation model applies anywhere critique improves quality: data analysis pipelines, research summarisation, content generation with a reviewer, and multi-step tool use where one agent's output becomes another agent's input. Empirical studies in the original paper showed strong results across mathematics, operations research, online decision-making, and software engineering benchmarks.

When to pick AutoGen over alternatives

Scenario | Good fit?
Agents that generate and run code iteratively | Yes, first-class built-in support
Dynamic, LLM-directed turn order in a group | Yes, GroupChat with LLM speaker selection
Fixed, deterministic pipelines with precise state control | Better fit for LangGraph
Role-based task delegation with a manager/crew feel | Better fit for CrewAI
Research prototyping and academic experiments | Yes, active research community

How it works

AutoGen is built in three layers. The Core layer is a low-level actor framework: each agent is an async actor that receives messages, does work, and emits messages. The AgentChat layer wraps Core with higher-level primitives (AssistantAgent, UserProxyAgent, CodeExecutorAgent, group-chat teams, termination conditions) that cover most use cases without touching the actor internals. On top of both, AutoGen Studio provides a no-code GUI for prototyping teams visually.

The two core agent types

AutoGen revolves around two workhorses. An AssistantAgent wraps an LLM: it reads the conversation history, thinks, and produces a reply, either natural language or a code block. Its counterpart closes the loop by running that code. In classic AutoGen (v0.2) this is the UserProxyAgent, which acts as a proxy for a human (or for automated feedback): when it receives a code block in a message, it executes that code and sends the output back as its next reply. In the newer AgentChat API, code execution is handled by a dedicated CodeExecutorAgent, and UserProxyAgent is reserved for relaying human input. Either way, the conversation loops automatically until a stopping condition is hit.
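The write, run, read-the-output loop described above can be sketched without any framework. This is a toy under stated assumptions, not AutoGen's implementation: `scripted_writer` is a hypothetical stand-in for the LLM-backed agent that returns canned replies, `executor` runs code in-process with `exec` (which is not a sandbox), and every name is invented for illustration.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # three backticks, built indirectly so this listing stays one block
CODE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def scripted_writer(history):
    """Stand-in for the LLM agent: canned replies keyed on the last message."""
    last = history[-1]["content"]
    if "NameError" in last:
        # Second attempt: fixes the typo the executor reported.
        return f"Fixed the typo.\n{FENCE}python\ntotal = sum(range(1, 11))\nprint(total)\n{FENCE}"
    if last.strip() == "55":
        return "The output is correct. TERMINATE"
    # First attempt contains a deliberate bug (undefined name `totl`).
    return f"{FENCE}python\ntotal = sum(range(1, 11))\nprint(totl)\n{FENCE}"


def executor(history):
    """Extract the code block from the last message, run it, reply with the result."""
    match = CODE_RE.search(history[-1]["content"])
    if match is None:
        return "No code block found."
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), {})  # in-process exec is NOT a sandbox
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def run_conversation(task, max_turns=10):
    """Alternate writer and executor until TERMINATE appears or turns run out."""
    history = [{"sender": "user", "content": task}]
    agents = [("writer", scripted_writer), ("executor", executor)]
    for turn in range(max_turns):
        name, reply_fn = agents[turn % 2]
        history.append({"sender": name, "content": reply_fn(history)})
        if "TERMINATE" in history[-1]["content"]:
            break
    return history


history = run_conversation("Print the sum of the integers 1 to 10.")
for message in history:
    print(f"[{message['sender']}] {message['content']}")
```

The error message from the first run is what lets the writer correct itself on the second attempt. The conversation history is the only channel between the two agents.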

Group chat and speaker selection

For tasks requiring more than two agents, AutoGen provides GroupChat (v0.2 style) or the team primitives in AgentChat (RoundRobinGroupChat, SelectorGroupChat). A GroupChatManager or SelectorGroupChat orchestrates the conversation: after each turn it either rotates agents in round-robin order, or calls the LLM with a role-play prompt to select the most appropriate next speaker given the conversation so far. All agents share the same conversation context, so every participant can see what every other participant said.
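To make the shared-context idea concrete, here is a framework-free sketch of a round-robin group. It assumes a hypothetical `EchoAgent` whose reply simply reports how much of the transcript it can see; nothing here is AutoGen's API.

```python
from itertools import cycle


class EchoAgent:
    """Hypothetical agent: replies with how many prior messages it can see."""

    def __init__(self, name):
        self.name = name

    def reply(self, transcript):
        # Every agent receives the full shared transcript, not just its own turns.
        return f"{self.name} has seen {len(transcript)} earlier message(s)"


def run_round_robin(agents, task, max_messages):
    """Rotate through the agents in fixed order until the message cap is hit."""
    transcript = [("user", task)]
    for agent in cycle(agents):
        if len(transcript) >= max_messages:
            break
        transcript.append((agent.name, agent.reply(transcript)))
    return transcript


transcript = run_round_robin(
    [EchoAgent("planner"), EchoAgent("coder"), EchoAgent("tester")],
    task="Ship the feature.",
    max_messages=6,
)
for speaker, text in transcript:
    print(f"{speaker}: {text}")
```

Each agent's count grows by one per turn, which shows that every participant reads the same, ever-growing history. Swapping `cycle(agents)` for a function that chooses the next agent is all it takes to move from round-robin to selector-style ordering.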

Termination conditions

A key design question in any multi-agent loop is: when do we stop? AutoGen provides composable termination conditions that can be combined with | (OR) or & (AND): MaxMessageTermination stops after N messages, TextMentionTermination stops when an agent says a specific string like "TERMINATE", TokenUsageTermination stops at a token budget, and HandoffTermination stops when an agent hands off to a human. This makes it easy to set a safety ceiling while still allowing the agents to self-terminate when they decide the task is done.
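The `|` and `&` composition can be illustrated with plain operator overloading. The sketch below is an assumption-laden simplification, not AutoGen's code: the class names are hypothetical, and the conditions here are stateless and synchronous, whereas the real ones are stateful objects evaluated as the conversation runs.

```python
class Condition:
    """Base class: subclasses implement check(messages) -> bool."""

    def check(self, messages):
        raise NotImplementedError

    def __or__(self, other):
        return _Combined(self, other, any)  # stop if either side fires

    def __and__(self, other):
        return _Combined(self, other, all)  # stop only if both sides fire


class _Combined(Condition):
    def __init__(self, left, right, reducer):
        self.left, self.right, self.reducer = left, right, reducer

    def check(self, messages):
        return self.reducer([self.left.check(messages), self.right.check(messages)])


class MaxMessages(Condition):
    def __init__(self, limit):
        self.limit = limit

    def check(self, messages):
        return len(messages) >= self.limit


class TextMention(Condition):
    def __init__(self, text):
        self.text = text

    def check(self, messages):
        return bool(messages) and self.text in messages[-1]


# Self-termination with a safety ceiling: whichever fires first ends the loop.
stop = TextMention("TERMINATE") | MaxMessages(5)
messages = []
for text in ["draft", "review", "looks good, TERMINATE", "never reached"]:
    messages.append(text)
    if stop.check(messages):
        break
print(len(messages), "messages before stopping")
```

The OR form is the common production shape: the agents may end early, but the ceiling bounds cost if they never do.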

Minimal two-agent coding loop (AgentChat 1.0 API), Python:
import asyncio

from autogen_agentchat.agents import AssistantAgent, CodeExecutorAgent
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_ext.code_executors.local import LocalCommandLineCodeExecutor
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main():
    # Reads the API key from the OPENAI_API_KEY environment variable.
    model_client = OpenAIChatCompletionClient(model="gpt-4o")
    # Runs generated code directly on the host: fine for a demo, not for production.
    executor = LocalCommandLineCodeExecutor(work_dir="/tmp/coding")

    # Writes code and decides when the task is finished.
    assistant = AssistantAgent(
        name="assistant",
        model_client=model_client,
        system_message=(
            "Write Python code to solve tasks. "
            "Say TERMINATE only after the code has run successfully."
        ),
    )
    # Executes code blocks from the previous message and replies with the output.
    code_runner = CodeExecutorAgent(
        name="code_runner",
        code_executor=executor,
    )

    # Agents alternate turns. Stop on "TERMINATE", or after 10 messages as a safety ceiling.
    team = RoundRobinGroupChat(
        participants=[assistant, code_runner],
        termination_condition=TextMentionTermination("TERMINATE") | MaxMessageTermination(10),
    )

    result = await team.run(task="Write a script that prints the first 10 Fibonacci numbers.")
    print(result.messages[-1].content)


asyncio.run(main())

Group chats with dynamic speaker selection

The SelectorGroupChat pattern is what makes AutoGen genuinely different from a simple pipeline. Instead of a fixed order, the selector — itself an LLM call — reads the conversation so far and decides which agent should speak next. This means that if the Researcher agent realizes it needs more data, the conversation can naturally loop back to it instead of blindly advancing to the Writer.
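The selection step itself is small enough to sketch in isolation. In the toy below, `rule_based_model` is a hypothetical stand-in for the LLM call, the prompt format is invented, and falling back to the first candidate on an unusable answer is a simplification rather than a description of AutoGen's behaviour.

```python
def build_prompt(candidates, transcript):
    """Assemble the text sent to the selector model (format is illustrative)."""
    lines = ["Pick the next speaker from these roles:"]
    lines += [f"- {name}: {description}" for name, description in candidates.items()]
    lines.append("Conversation so far:")
    lines += [f"{speaker}: {text}" for speaker, text in transcript]
    return "\n".join(lines)


def pick_next_speaker(roles, transcript, ask_model, allow_repeated_speaker=False):
    """Ask the model for a name, restricted to the currently eligible agents."""
    last_speaker = transcript[-1][0]
    candidates = {
        name: description
        for name, description in roles.items()
        if allow_repeated_speaker or name != last_speaker
    }
    answer = ask_model(build_prompt(candidates, transcript)).lower()
    for name in candidates:
        if name.lower() in answer:
            return name
    return next(iter(candidates))  # simplification: fall back to the first candidate


def rule_based_model(prompt):
    """Stand-in for the LLM: a crude rule keyed on the latest message."""
    latest = prompt.splitlines()[-1].lower()
    if "need more data" in latest or latest.startswith("user:"):
        return "Researcher"
    if "draft" in latest:
        return "Critic"
    return "Writer"


ROLES = {
    "Researcher": "gathers facts",
    "Writer": "turns findings into a report",
    "Critic": "reviews the report",
}
transcript = [("user", "Summarise recent agent papers.")]
for text in ["Found three relevant papers.", "I need more data on benchmarks."]:
    speaker = pick_next_speaker(ROLES, transcript, rule_based_model)
    transcript.append((speaker, text))
    print(speaker, "->", text)
print("next:", pick_next_speaker(ROLES, transcript, rule_based_model))
```

When the Writer asks for more data, the next pick is the Researcher rather than the Critic, which is the loop-back behaviour a fixed pipeline cannot express.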

Here is a typical three-agent research team wired with SelectorGroupChat. The selector_prompt is a role-play prompt you supply: it describes what each agent is good at and asks the LLM to pick the best next speaker. You can also supply allow_repeated_speaker=True if you want the same agent to take multiple turns in a row when it makes sense.

Three-agent selector group chat, Python:
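import asyncio

from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.conditions import MaxMessageTermination, TextMentionTermination
from autogen_agentchat.teams import SelectorGroupChat
from autogen_ext.models.openai import OpenAIChatCompletionClient


async def main():
    model_client = OpenAIChatCompletionClient(model="gpt-4o")

    # No search tool is attached in this sketch, so the Researcher answers from the
    # model's own knowledge. Pass tools=[...] to give it real web access.
    researcher = AssistantAgent(
        name="Researcher",
        model_client=model_client,
        system_message="You search the web and gather facts. Return findings as bullet points.",
    )
    writer = AssistantAgent(
        name="Writer",
        model_client=model_client,
        system_message="You synthesize research into a concise report. Write in plain English.",
    )
    critic = AssistantAgent(
        name="Critic",
        model_client=model_client,
        system_message="You review drafts for factual errors and clarity. Say TERMINATE when satisfied.",
    )

    team = SelectorGroupChat(
        participants=[researcher, writer, critic],
        model_client=model_client,  # LLM that picks the next speaker
        # Stop when the Critic says TERMINATE, or after 20 messages as a safety ceiling.
        termination_condition=TextMentionTermination("TERMINATE") | MaxMessageTermination(20),
    )

    result = await team.run(task="Summarise the key papers on multi-agent LLM systems from 2023-2025.")
    print(result.messages[-1].content)


asyncio.run(main())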

The group chat pattern excels at open-ended tasks where the workflow is hard to specify in advance. The tradeoff is non-determinism: two runs of the same task can follow different speaker orders. If reproducibility is critical — for example in a production billing pipeline — RoundRobinGroupChat or a graph-based framework like LangGraph gives you more control.

AutoGen vs. CrewAI and LangGraph

All three frameworks let you build multi-agent systems, but they start from different mental models. Understanding that difference is the fastest way to pick the right one for your project.

The rule of thumb: if your workflow is emergent (you don't know the exact steps until the agents start working), AutoGen's conversation model wins. If your workflow is known in advance and you need deterministic state transitions that you can replay and audit, LangGraph is the stronger choice. CrewAI sits in the middle — it's structured enough to be readable, dynamic enough to handle multi-step delegation without a graph.

Going deeper

The Core API and async actors

AgentChat covers most use cases, but AutoGen's Core API gives you direct access to the async actor model underneath. In Core, every agent is a class that implements a handle method for each message type it understands. Agents run in an AgentRuntime — either SingleThreadedAgentRuntime for local scripts or a distributed runtime for cross-process or cross-language deployments. This is how you'd build, say, an agent that subscribes to a Kafka topic instead of waiting for an inline team.run() call.
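For readers new to the actor model, the sketch below shows the underlying idea with nothing but `asyncio`: one mailbox per agent and one handler per message type. It is not AutoGen's Core API. The `Text`, `Number`, and `MailboxAgent` names are hypothetical, and in AutoGen a runtime handles delivery and routing instead of the caller writing to a queue directly.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Text:
    content: str


@dataclass
class Number:
    value: int


class MailboxAgent:
    """Minimal async actor: one mailbox, one handler per message type."""

    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.seen = []

    async def handle_text(self, message):
        self.seen.append(message.content.upper())

    async def handle_number(self, message):
        self.seen.append(message.value * 2)

    async def run(self):
        handlers = {Text: self.handle_text, Number: self.handle_number}
        while True:
            message = await self.mailbox.get()
            if message is None:  # sentinel: shut the actor down
                break
            await handlers[type(message)](message)  # dispatch on message type


async def main():
    agent = MailboxAgent()  # created inside the running event loop
    worker = asyncio.create_task(agent.run())
    for message in [Text("hello"), Number(21), Text("bye")]:
        await agent.mailbox.put(message)
    await agent.mailbox.put(None)
    await worker
    return agent.seen


print(asyncio.run(main()))
```

Because the agent only reacts to what lands in its mailbox, the source of messages is interchangeable: an inline call, another agent, or an external queue consumer can all feed the same handler code.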

The v0.4/1.0 architecture introduced cross-language support: a Python agent and a .NET agent can participate in the same conversation through the distributed runtime. The message protocol is language-agnostic. This is useful in enterprise settings where existing services are written in different languages and you want to wrap them as first-class AutoGen agents rather than shelling out to external processes.

Human-in-the-loop patterns

AutoGen supports pausing an agent run and surfacing a question to a real human via HandoffTermination combined with a UserProxyAgent. The team stops cleanly, the calling code receives a HandoffMessage, the application prompts the user, and the conversation resumes with the user's reply appended to the history. This makes it possible to build workflows where the agents handle everything they can autonomously but escalate gracefully when they hit uncertainty.
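The pause-and-resume flow can be sketched as a loop that returns early on a handoff and is simply called again once the human has answered. Everything here is hypothetical scaffolding (`Handoff`, `refund_agent`, `run_until_pause`, and the refund scenario itself), meant to show the control flow rather than AutoGen's classes.

```python
from dataclasses import dataclass


@dataclass
class Handoff:
    """Hypothetical marker an agent returns when it needs a human decision."""
    question: str


def refund_agent(history):
    """Scripted stand-in for an LLM agent handling a refund request."""
    human_replies = [text for sender, text in history if sender == "human"]
    if not human_replies:
        return Handoff("Refund exceeds 100 USD. Approve? (yes/no)")
    decision = "approved" if human_replies[-1].strip().lower() == "yes" else "rejected"
    return f"Refund {decision}. TERMINATE"


def run_until_pause(history, agent, max_turns=5):
    """Advance the conversation until the agent finishes or hands off."""
    for _ in range(max_turns):
        reply = agent(history)
        if isinstance(reply, Handoff):
            return reply  # stop cleanly; history is preserved for resuming
        history.append(("agent", reply))
        if "TERMINATE" in reply:
            break
    return None


history = [("user", "Refund order 1234 for 250 USD.")]
pending = run_until_pause(history, refund_agent)
print("Paused with question:", pending.question)

# A real application would prompt a person here; the answer is hard-coded.
history.append(("human", "yes"))
run_until_pause(history, refund_agent)
print(history[-1][1])
```

The key property is that state lives in the history, not in the loop, so the pause can last seconds or days without the agent losing context.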

Observability and AutoGen Studio

Long-running multi-agent conversations are notoriously hard to debug. AutoGen exposes an event stream on every team run: you can async for event in team.run_stream(task=...) and inspect every agent message, token count, and tool call in real time. AutoGen Studio (v0.4+) provides a local web UI where you can visually assemble agent teams, define tools, and step through conversation traces — useful for prototyping before committing to code.
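The consumption pattern is an ordinary async generator loop. The sketch below fakes the producer side so it runs without a model: `run_stream` here is a hypothetical local function rather than the AutoGen method, the event dictionaries are an invented shape, and the "tokens" figure is just a word count.

```python
import asyncio


async def run_stream(task, scripted_replies):
    """Hypothetical team runner: yields one event per step as it happens."""
    yield {"type": "task", "source": "user", "content": task}
    for source, content in scripted_replies:
        await asyncio.sleep(0)  # hand control back to the loop, as a model call would
        yield {
            "type": "message",
            "source": source,
            "content": content,
            "tokens": len(content.split()),  # crude stand-in for real token usage
        }
    yield {"type": "result", "stop_reason": "script exhausted"}


async def trace(task, scripted_replies):
    """Consume the stream, logging each event as soon as it arrives."""
    events = []
    async for event in run_stream(task, scripted_replies):
        events.append(event)
        print(event["type"], event.get("source", "-"), event.get("tokens", 0))
    return events


REPLIES = [("writer", "Here is a first draft"), ("critic", "Looks fine TERMINATE")]
events = asyncio.run(trace("Draft a summary.", REPLIES))
print("total tokens:", sum(event.get("tokens", 0) for event in events))
```

Streaming matters for debugging because each event is visible the moment it is produced, so a stalled or looping agent shows up immediately instead of after the whole run finishes.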

The road to Microsoft Agent Framework

In October 2025 Microsoft launched Microsoft Agent Framework (public preview), which merges AutoGen's dynamic conversation model with Semantic Kernel's enterprise features: session-based state management, type safety, middleware, and telemetry. For new production projects, Microsoft recommends Agent Framework. AutoGen's packages (autogen-agentchat, autogen-core, autogen-ext) remain in maintenance — bug fixes and security patches, but no new features. Existing AutoGen code continues to run; migration guides exist for moving to Agent Framework when you're ready.

FAQ

What is the difference between AutoGen v0.2 and AutoGen 1.0?

v0.2 (the original, installed as pyautogen) used a synchronous, callback-based architecture and a single ConversableAgent class. Version 1.0 (installed as autogen-agentchat + autogen-core) is a ground-up async rewrite with an actor-based runtime, streaming support, cross-language agents, and composable termination conditions. The APIs are incompatible — v0.2 code needs migration.

Does AutoGen require OpenAI? Can I use it with Claude or local models?

AutoGen is model-agnostic. The autogen-ext package ships clients for OpenAI, Azure OpenAI, Anthropic (Claude), and any provider with an OpenAI-compatible endpoint. You can also implement the ChatCompletionClient interface for other providers.

Is AutoGen safe to use for code execution in production?

AutoGen's LocalCommandLineCodeExecutor runs code directly on the host machine — fine for local experiments, dangerous in production. For safe deployments, use the DockerCommandLineCodeExecutor (runs code inside a Docker container) or a cloud sandbox like E2B. Always sandbox agent-generated code before running it on infrastructure you care about.

What does 'GroupChat' mean in AutoGen and how does speaker selection work?

A group chat adds a coordinator (the GroupChatManager in v0.2, or a team such as SelectorGroupChat or RoundRobinGroupChat in AgentChat) that sits above the conversation and decides who speaks next. In selector mode, the coordinator calls an LLM with a role-play prompt describing each agent's specialty and picks the most appropriate next speaker. In round-robin mode it simply cycles through agents in order.

Should I start new projects with AutoGen or Microsoft Agent Framework?

For learning, prototyping, and research, AutoGen 1.0 (autogen-agentchat) is still a great starting point — it's simpler, has excellent documentation, and maps directly onto the academic papers. For production systems that need session persistence, telemetry, and enterprise support, Microsoft recommends starting with Microsoft Agent Framework, which builds on the same Core layer.

How does AutoGen compare to CrewAI for building multi-agent pipelines?

AutoGen is conversation-driven — the workflow emerges from message exchanges, including dynamic speaker selection by an LLM. CrewAI is role-driven — you define agents with job titles and explicit tasks, and a manager delegates them in order. AutoGen is more flexible for open-ended tasks and iterative coding; CrewAI is more readable when you know the workflow upfront.

Further reading