What Is the AG-UI Protocol? Agent-to-App Event Streaming

You will understand what the AG-UI protocol standardizes, how its event stream connects an agent backend to a user-facing app, and how it complements protocols like MCP and A2A.

INTERMEDIATE · 10 MIN READ · UPDATED 2026-06-14

DOCS: docs.ag-ui.com · OFFICIAL SITE: modelcontextprotocol.io

In plain English

An AI agent usually runs on a server somewhere: it thinks, calls tools, updates its plan, and slowly works toward an answer. But a person is sitting in front of a screen waiting to see all of that happen — the words appearing token by token, the little "searching the web…" badge, the half-finished answer that suddenly gets corrected. Something has to carry that live activity from the backend to the screen, and describe it in a way the screen can actually draw.


AG-UI (the Agent-User Interaction protocol) is an open, event-based standard for exactly that link: the live stream between an agent backend and a user-facing app. As the agent runs, it emits a typed stream of events — here is a chunk of text, I am calling this tool, my state changed, render this UI — and any compliant frontend knows how to read and display them. The agent speaks one well-defined language; the app listens for it.

Think of it like a sports commentator's feed. The match (the agent's reasoning) happens on the field. The commentator doesn't ship you the whole stadium — they send a steady stream of short, structured calls: "kickoff," "pass," "goal," "substitution." Anyone with a radio can follow along, because the vocabulary of calls is agreed in advance. AG-UI is that agreed vocabulary for what an agent is doing, moment to moment, so the UI can narrate it to the user.

Why it matters

If you have ever built a chat UI on top of an LLM, you know the messy part is never the first token — it is everything after. A real agent doesn't just stream text. It pauses to call a tool, shows progress, asks the user to approve an action, revises an earlier answer, and updates a shared piece of state the UI is displaying. Without a standard, every team invents its own ad-hoc JSON for all of that, and the frontend code is glued tightly to one specific backend.

AG-UI exists to break that glue. A standard event stream gives you a few concrete wins:

Mix and match. Any AG-UI-compliant agent can drive any AG-UI-compliant UI. Swap the backend from one framework to another, or reuse one chat component across several agents, without rewriting the wire format each time.
Richer than plain streaming. Raw text streaming (the kind of token-by-token output most chat APIs give you) can't express "I'm now calling the weather tool" or "update the order total in the sidebar." AG-UI has named events for tool calls, state changes, and UI updates, so the interface can show what the agent is doing, not just its prose.
Human-in-the-loop by design. Because the stream is bidirectional in spirit — the app can send the user's reply, an approval, or new input back into the run — pausing an agent to ask "are you sure?" becomes a normal part of the protocol instead of a hack.
Less boilerplate. Frontend libraries can implement the event handling once. You consume a typed stream instead of parsing bespoke server-sent blobs and guessing what each field means.

Who should care? Anyone building a product surface on top of agents — chat sidebars, copilots, dashboards that an agent updates live, or apps where the agent and the user collaborate in real time. If your agent only ever returns one final string, you don't need this. The moment the user needs to watch and steer the agent mid-run, a standard event protocol starts paying for itself.

How it works

AG-UI sits between two pieces: the agent backend that produces events, and the frontend (often through a thin client library) that consumes them and renders the UI. When a user sends a message, the backend starts a run and streams a sequence of typed events until the run finishes. The frontend updates the screen as each event arrives, and can send user input back to continue or steer the run.

// Where AG-UI sits

User in the app (types, approves, replies) → AG-UI client (frontend library) → Event stream (typed events both ways) → Agent backend (runs, calls tools, thinks)
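To make the return path concrete, here is a minimal sketch of a run that pauses until the user approves an action. It is not the AG-UI API: the event names (`TEXT`, `APPROVAL_REQUEST`, `RUN_FINISHED`) and the `bookingAgent` and `runWithUser` helpers are hypothetical, and an in-process generator stands in for the network stream.

```typescript
// Hypothetical event shapes for illustration; the real names come from the AG-UI spec.
type AgentEvent =
  | { type: "TEXT"; text: string }
  | { type: "APPROVAL_REQUEST"; action: string }
  | { type: "RUN_FINISHED" };

// A stand-in agent: it pauses at the approval request until the app answers.
function* bookingAgent(): Generator<AgentEvent, void, boolean | undefined> {
  yield { type: "TEXT", text: "I found a room for 180 a night." };
  const approved = yield { type: "APPROVAL_REQUEST", action: "book_room" };
  yield { type: "TEXT", text: approved ? "Booked." : "Okay, I will not book it." };
  yield { type: "RUN_FINISHED" };
}

// The app side: render each event and send the user's decision back into the run.
function runWithUser(decide: (action: string) => boolean): string[] {
  const transcript: string[] = [];
  const run = bookingAgent();
  let step = run.next();
  while (!step.done) {
    const event = step.value;
    let reply: boolean | undefined = undefined;
    if (event.type === "TEXT") {
      transcript.push(event.text);
    } else if (event.type === "APPROVAL_REQUEST") {
      reply = decide(event.action);
      transcript.push(`[user ${reply ? "approved" : "declined"} ${event.action}]`);
    }
    // Resume the run, passing the user's answer (if any) back to the agent.
    step = run.next(reply);
  }
  return transcript;
}

console.log(runWithUser(() => true));
```

The point of the sketch is the shape of the loop: events flow out, and at a well-defined event the app sends input back before the run continues.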

The event stream

The heart of AG-UI is a stream of small, typed events. Rather than one big blob at the end, the agent emits many tiny messages as things happen. Each event has a type the frontend recognizes, so the UI always knows how to handle it. The exact event names are defined by the protocol, but conceptually they fall into a handful of families:

Event family	What it carries	What the UI does
Lifecycle	Run started / run finished, errors	Show a spinner, then settle the final state
Text	Incremental chunks of the assistant's message	Stream words into the bubble as they arrive
Tool calls	Which tool is being called, its arguments, its result	Render a "using tool X" step and its outcome
State	A snapshot or patch of shared agent state	Update a sidebar, form, or live document
UI / custom	App-specific render instructions	Draw a custom component the agent asked for
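One way a frontend might model these families is a TypeScript discriminated union, so the compiler can check that every event type is handled. The sketch below is an assumption for illustration: the event names and fields such as `runId` and `callId` are not taken from the AG-UI spec.

```typescript
// Illustrative event shapes, one group per family in the table above.
type LifecycleEvent =
  | { type: "RUN_STARTED"; runId: string }
  | { type: "RUN_FINISHED"; runId: string }
  | { type: "RUN_ERROR"; message: string };
type TextEvent = { type: "TEXT_DELTA"; delta: string };
type ToolEvent =
  | { type: "TOOL_CALL"; callId: string; name: string; args: Record<string, unknown> }
  | { type: "TOOL_RESULT"; callId: string; result: unknown };
type StateEvent = { type: "STATE_PATCH"; patch: Record<string, unknown> };
type UiEvent = { type: "UI_RENDER"; component: string; props: Record<string, unknown> };

type StreamEvent = LifecycleEvent | TextEvent | ToolEvent | StateEvent | UiEvent;
type Family = "lifecycle" | "text" | "tool" | "state" | "ui";

// Map an event to its family; the `never` check fails to compile if a type is missed.
function familyOf(event: StreamEvent): Family {
  switch (event.type) {
    case "RUN_STARTED":
    case "RUN_FINISHED":
    case "RUN_ERROR":
      return "lifecycle";
    case "TEXT_DELTA":
      return "text";
    case "TOOL_CALL":
    case "TOOL_RESULT":
      return "tool";
    case "STATE_PATCH":
      return "state";
    case "UI_RENDER":
      return "ui";
    default: {
      const unhandled: never = event;
      throw new Error(`Unhandled event: ${JSON.stringify(unhandled)}`);
    }
  }
}

console.log(familyOf({ type: "TEXT_DELTA", delta: "hi" })); // prints "text"
```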

Two design choices make this practical. First, deltas: text and state usually arrive as incremental updates (a few new words, a small patch) rather than re-sending the whole thing every time, which keeps the stream cheap and the UI smooth. Second, transport-agnosticism: AG-UI describes the events, not the wire. The same event stream can ride over Server-Sent Events, WebSockets, or another channel, so you pick whatever fits your stack.
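Both choices can be seen in a small sketch. It assumes one JSON event per Server-Sent Events frame, which is one plausible framing rather than something stated here, and `encodeSse`, `decodeSse`, and the event names are hypothetical helpers. The decoder only handles complete frames; a real client would also buffer partial chunks.

```typescript
// A generic event: a type tag plus arbitrary fields (illustrative shape).
type WireEvent = { type: string; [field: string]: unknown };

// Frame one event as a Server-Sent Events message: a "data:" line and a blank line.
function encodeSse(event: WireEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Split a buffer of complete frames back into events, skipping frames without data.
function decodeSse(buffer: string): WireEvent[] {
  const events: WireEvent[] = [];
  for (const frame of buffer.split("\n\n")) {
    const payload = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice("data:".length).replace(/^ /, ""))
      .join("\n");
    if (payload !== "") events.push(JSON.parse(payload) as WireEvent);
  }
  return events;
}

const sample: WireEvent[] = [
  { type: "TEXT_DELTA", delta: "Let me " },
  { type: "TEXT_DELTA", delta: "search hotels" },
  { type: "RUN_FINISHED" },
];
const wire = sample.map((event) => encodeSse(event)).join("");

// The UI rebuilds the full message by appending deltas in arrival order.
const assembled = decodeSse(wire)
  .filter((event) => event.type === "TEXT_DELTA")
  .map((event) => String(event.delta))
  .join("");

console.log(assembled); // prints "Let me search hotels"
```

Because the events are plain data, swapping the transport means replacing only `encodeSse` and `decodeSse`; the code that appends deltas stays the same.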

// A single run, event by event

Run started → Text deltas stream in → Tool call + result → State patch updates the UI → More text, then run finished (↺ repeat)

Below is a sketch of what a frontend handler looks like in practice. You don't parse raw JSON and guess — you switch on the event type and update your interface. (Names are illustrative; the real ones come from the AG-UI spec.)

Consuming an AG-UI stream (sketch, TypeScript):

```typescript
// Subscribe to the agent's event stream and react per event type.
// `agent`, `ui`, and `messages` are assumed to be provided by the surrounding app.
for await (const event of agent.run({ messages })) {
  switch (event.type) {
    case "RUN_STARTED":
      ui.showThinking();
      break;

    case "TEXT_DELTA":
      // Append a few new tokens to the current message bubble.
      ui.appendText(event.delta);
      break;

    case "TOOL_CALL":
      // Show a "using <tool>" step the user can watch.
      ui.showToolStep(event.name, event.args);
      break;

    case "STATE_PATCH":
      // Apply a partial update to shared state (e.g. a live form).
      ui.applyStatePatch(event.patch);
      break;

    case "RUN_FINISHED":
      ui.settle();
      break;

    default:
      // Ignore event types this UI does not handle yet.
      break;
  }
}
```

AG-UI vs MCP vs A2A

These three protocols are easy to confuse because they all standardize connections in the agent world. The clean way to tell them apart is to ask: connection between whom? Each one covers a different edge of the same system, and a serious app may use all three at once.

// Three agent protocols, three different edges

AG-UI (agent ↔ app)

Streams what the agent is doing to the user's screen
Events: text, tool calls, state, UI updates
Powers live chat and copilot interfaces
The human is the other end

MCP (agent ↔ tools)

Connects an agent to tools and data sources
Exposes tools, resources, and prompts
Lets the agent reach files, APIs, databases
A service is the other end

A2A (agent ↔ agent)

Lets agents talk to and delegate to each other
Discovery, task handoff, messaging
Powers multi-agent teamwork
Another agent is the other end

A useful mental picture: in a finished product, MCP feeds the agent its tools, A2A lets it hand work to specialist agents, and AG-UI streams the whole performance to the person watching. They don't compete; they cover different gaps. Treating AG-UI as "MCP for the frontend" is a fair first approximation, as long as you remember the other end of the wire is a human interface, not a tool server.

A concrete example

Imagine a travel-planning copilot embedded in a booking app. The user types "find me a quiet hotel near the conference for under 200 a night." Here is the same run, told as the AG-UI events the backend emits and what the frontend draws for each one.

The agent emits…	The UI shows…
Run started	A typing indicator appears in the chat
Text deltas: "Let me search hotels near the venue…"	The sentence streams in word by word
Tool call: search_hotels(area, max_price)	A "Searching hotels" step with a small spinner
Tool result: 6 matches	The step turns into "Found 6 hotels"
State patch: candidate list updated	A results panel in the sidebar fills with cards
Text deltas: "Here are three quiet options…"	The recommendation streams into the chat
UI event: render a "Book" button per hotel	Clickable booking buttons appear inline
Run finished	Spinner clears; the message is final
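The same run can be replayed in code as a fold over events. This is a sketch under assumptions: the event names, the `ViewState` shape, and the placeholder hotel names are hypothetical, and the UI event for the "Book" buttons is left out to keep it short.

```typescript
// Illustrative events for the travel run in the table above.
type RunEvent =
  | { type: "RUN_STARTED" }
  | { type: "TEXT_DELTA"; delta: string }
  | { type: "TOOL_CALL"; name: string }
  | { type: "TOOL_RESULT"; name: string; summary: string }
  | { type: "STATE_PATCH"; patch: { candidates: string[] } }
  | { type: "RUN_FINISHED" };

// What the chat component keeps on screen.
interface ViewState {
  status: "idle" | "running" | "done";
  chat: string;
  steps: string[];
  candidates: string[];
}

// Pure reducer: each event produces the next view; nothing here knows about travel.
function applyEvent(view: ViewState, event: RunEvent): ViewState {
  switch (event.type) {
    case "RUN_STARTED":
      return { ...view, status: "running" };
    case "TEXT_DELTA":
      return { ...view, chat: view.chat + event.delta };
    case "TOOL_CALL":
      return { ...view, steps: [...view.steps, `Running ${event.name}`] };
    case "TOOL_RESULT":
      // Replace the in-progress step for this tool with its outcome.
      return {
        ...view,
        steps: view.steps.map((s) => (s === `Running ${event.name}` ? event.summary : s)),
      };
    case "STATE_PATCH":
      return { ...view, ...event.patch };
    case "RUN_FINISHED":
      return { ...view, status: "done" };
  }
}

const travelRun: RunEvent[] = [
  { type: "RUN_STARTED" },
  { type: "TEXT_DELTA", delta: "Let me search hotels near the venue. " },
  { type: "TOOL_CALL", name: "search_hotels" },
  { type: "TOOL_RESULT", name: "search_hotels", summary: "Found 6 hotels" },
  { type: "STATE_PATCH", patch: { candidates: Array.from({ length: 6 }, (_, i) => `Hotel ${i + 1}`) } },
  { type: "TEXT_DELTA", delta: "Here are three quiet options." },
  { type: "RUN_FINISHED" },
];

const initial: ViewState = { status: "idle", chat: "", steps: [], candidates: [] };
const finalView = travelRun.reduce(applyEvent, initial);
console.log(finalView.status, finalView.steps, finalView.candidates.length);
```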

The important point: the frontend never had to understand travel. It only understood AG-UI event types — text, tool call, state, UI, lifecycle. Swap the travel agent for a coding agent or a support agent and the same chat component keeps working, because the contract is the event stream, not the domain. That reusability is the entire payoff of standardizing this edge.

Going deeper

Once the basic event stream clicks, a few deeper themes are worth knowing before you build on AG-UI in earnest.

Shared state and generative UI. The state events are more powerful than they first look. Because the agent and the app can share a synchronized piece of state, you can build interfaces where the agent edits a live document, fills a form, or updates a dashboard the user is also touching — both sides working on the same object. Pushed further, the UI events let an agent ask the frontend to render specific components, which is the foundation of "generative UI": the agent doesn't just produce text, it shapes the interface. The component library CopilotKit and similar tools sit on top of AG-UI to make this practical, though the exact frontend layer is your choice.
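The text above does not say how a state patch is encoded. As a minimal sketch, assume patches arrive as a list of JSON Patch-style operations (in the style of RFC 6902). The `applyPatch` helper below is hypothetical and supports only `add`, `replace`, and `remove` on nested object keys, without pointer escaping or array handling.

```typescript
// A small subset of JSON Patch-style operations (assumed encoding, not from the spec).
type PatchOp =
  | { op: "add" | "replace"; path: string; value: unknown }
  | { op: "remove"; path: string };

type Json = Record<string, unknown>;

// Apply operations to a copy of the shared state and return the new state.
function applyPatch(state: Json, ops: PatchOp[]): Json {
  const next = JSON.parse(JSON.stringify(state)) as Json;
  for (const op of ops) {
    const keys = op.path.split("/").slice(1); // "/order/total" -> ["order", "total"]
    const last = keys.pop();
    if (last === undefined) throw new Error(`Unsupported path: ${op.path}`);
    let target: Json = next;
    for (const key of keys) {
      const child = target[key];
      if (typeof child !== "object" || child === null) {
        throw new Error(`Missing parent for path: ${op.path}`);
      }
      target = child as Json;
    }
    if (op.op === "remove") {
      delete target[last];
    } else {
      target[last] = op.value;
    }
  }
  return next;
}

const before = { order: { total: 10 }, note: "draft" };
const after = applyPatch(before, [
  { op: "replace", path: "/order/total", value: 25 },
  { op: "add", path: "/order/currency", value: "USD" },
  { op: "remove", path: "/note" },
]);
console.log(JSON.stringify(after));
```

Returning a new object rather than mutating in place keeps the previous state available, which is convenient when the user and the agent are both editing and the UI needs to compare versions.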

Adapters and existing backends. You rarely rewrite an agent to be "AG-UI native." In practice you wrap an existing backend — a framework run loop, or a plain LLM-plus-tools loop — with an adapter that translates its internal events into the AG-UI vocabulary. This is why the protocol can spread quickly: it meets agents where they already are, rather than demanding a rewrite. Treat AG-UI as a translation target, not a framework you must adopt wholesale.
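An adapter of this kind can be quite small. The sketch below is illustrative only: `InternalEvent` stands in for whatever an existing agent loop already emits, `legacyLoop` is a hypothetical backend, and the output event names are placeholders rather than the spec's.

```typescript
// Hypothetical events from an existing agent loop.
type InternalEvent =
  | { kind: "token"; text: string }
  | { kind: "tool_start"; tool: string; input: Record<string, unknown> }
  | { kind: "tool_end"; tool: string; output: unknown };

// Illustrative target vocabulary for the UI-facing stream.
type StreamEvent =
  | { type: "RUN_STARTED" }
  | { type: "TEXT_DELTA"; delta: string }
  | { type: "TOOL_CALL"; name: string; args: Record<string, unknown> }
  | { type: "TOOL_RESULT"; name: string; result: unknown }
  | { type: "RUN_FINISHED" };

// Translate one internal event into the shared vocabulary.
function translate(event: InternalEvent): StreamEvent {
  switch (event.kind) {
    case "token":
      return { type: "TEXT_DELTA", delta: event.text };
    case "tool_start":
      return { type: "TOOL_CALL", name: event.tool, args: event.input };
    case "tool_end":
      return { type: "TOOL_RESULT", name: event.tool, result: event.output };
  }
}

// Wrap the existing loop: add lifecycle events and translate everything in between.
async function* adapt(source: AsyncIterable<InternalEvent>): AsyncGenerator<StreamEvent> {
  yield { type: "RUN_STARTED" };
  for await (const event of source) yield translate(event);
  yield { type: "RUN_FINISHED" };
}

// Stand-in for an unmodified backend.
async function* legacyLoop(): AsyncGenerator<InternalEvent> {
  yield { kind: "token", text: "Checking the weather. " };
  yield { kind: "tool_start", tool: "get_weather", input: { city: "Oslo" } };
  yield { kind: "tool_end", tool: "get_weather", output: { tempC: 4 } };
}

async function collectTypes(): Promise<string[]> {
  const types: string[] = [];
  for await (const event of adapt(legacyLoop())) types.push(event.type);
  return types;
}

collectTypes().then((types) => console.log(types.join(", ")));
```

The backend loop is untouched; only the thin `adapt` layer knows both vocabularies.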

Practical edge cases. Real streams are messy. Tool calls can fail mid-run, so plan for error events and partial results (see handling tool errors in agents). Agents may issue parallel tool calls, so your UI should match each result to the right in-flight step, for example by a per-call identifier, rather than assuming one tool runs at a time. And because the stream can carry structured tool outputs, the frontend can render rich results (tables, cards) instead of just text, but only if the client handles the structured shape.
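A minimal sketch of the parallel-call case, assuming each tool event carries a per-call identifier. The `callId` field, the `TOOL_ERROR` event, and `trackToolSteps` are hypothetical names used for illustration.

```typescript
// Illustrative tool events; `callId` ties a result or error back to its call.
type ToolStreamEvent =
  | { type: "TOOL_CALL"; callId: string; name: string }
  | { type: "TOOL_RESULT"; callId: string; result: string }
  | { type: "TOOL_ERROR"; callId: string; message: string };

interface ToolStep {
  name: string;
  status: "running" | "done" | "failed";
  detail?: string;
}

// Track one UI step per call, so out-of-order results land on the right step.
function trackToolSteps(events: ToolStreamEvent[]): Map<string, ToolStep> {
  const steps = new Map<string, ToolStep>();
  for (const event of events) {
    if (event.type === "TOOL_CALL") {
      steps.set(event.callId, { name: event.name, status: "running" });
      continue;
    }
    const step = steps.get(event.callId);
    if (!step) continue; // Outcome for a call we never saw: ignore it.
    if (event.type === "TOOL_RESULT") {
      step.status = "done";
      step.detail = event.result;
    } else {
      step.status = "failed";
      step.detail = event.message;
    }
  }
  return steps;
}

// Two calls in flight; the second finishes first and the first one fails.
const steps = trackToolSteps([
  { type: "TOOL_CALL", callId: "call-1", name: "search_hotels" },
  { type: "TOOL_CALL", callId: "call-2", name: "get_weather" },
  { type: "TOOL_RESULT", callId: "call-2", result: "Sunny" },
  { type: "TOOL_ERROR", callId: "call-1", message: "Timed out" },
]);
console.log(steps.get("call-1"), steps.get("call-2"));
```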

Where to go next: read the AG-UI docs for the exact event catalog, then look at how MCP and A2A divide the rest of the agent-plumbing landscape so you choose the right protocol for each edge of your system. The durable idea here is simple and worth keeping even if the spec evolves — standardize the live conversation between an agent and the screen, and any agent can drive any UI.

FAQ

What is the AG-UI protocol?

AG-UI (Agent-User Interaction protocol) is an open, event-based standard for the live stream between an agent backend and a user-facing app. As the agent runs, it emits typed events — text chunks, tool calls, state changes, and UI updates — that a compliant frontend renders consistently, so any AG-UI agent can drive any AG-UI interface.

What is the difference between AG-UI and MCP?

They cover different edges of an agent system. MCP connects an agent to its tools and data sources (agent-to-tools), while AG-UI connects an agent to the user-facing application (agent-to-app). A real product often uses both: MCP to give the agent capabilities, AG-UI to stream what it's doing to the screen.

How is AG-UI different from A2A?

A2A standardizes how agents talk to and delegate to other agents (agent-to-agent), powering multi-agent teamwork. AG-UI standardizes how a single agent streams its activity to a human interface (agent-to-app). One is about machine-to-machine collaboration; the other is about narrating an agent's run to a person.

Why not just use plain text streaming for an agent UI?

Plain token streaming can show prose appearing word by word, but it can't express "I'm calling the weather tool now" or "update the order total in the sidebar." AG-UI adds named events for tool calls, shared state, and UI updates, so the interface can show what the agent is actually doing, not only its final text.

Do I have to rewrite my agent to use AG-UI?

Usually no. You typically wrap an existing backend with an adapter that translates its internal events into the AG-UI vocabulary, rather than rebuilding the agent. That is a big reason the protocol can be adopted incrementally — it meets agents where they already are.

Is AG-UI a stable, production-ready standard?

It is active and emerging rather than a settled universal standard. The core ideas (a typed event stream with families for text, tool calls, state, and UI) are stable in spirit, but specific event names and the surrounding library ecosystem are still maturing, so build against the official spec and pin the spec and library versions you depend on.
