Overview
Plano (formerly Arch Gateway) is an out-of-process proxy server and data plane for agentic applications. Built on Envoy by its core contributors, it moves the repetitive plumbing of production agents - routing between agents, guardrail and moderation hooks, LLM access, and observability - out of your framework and into a separate data plane you configure with YAML.
It is aimed at teams who can build an agent demo quickly but struggle to ship it safely and repeatably. Instead of writing intent classifiers, model fallbacks, provider adapters, and tracing glue in every codebase, you declare your agents and model providers once and let Plano handle the wiring. Your agents stay as plain HTTP services in any language or framework.
As an LLM gateway and proxy, Plano sits between your services and the models. It can route by model name, by alias, or automatically by preference, capture OTEL traces and metrics with no extra code, and apply moderation and memory policies through filter chains.
What it does
- Low-latency orchestration between agents - add new agents without changing application code
- Smart LLM routing: route by model name, semantic alias, or automatically by preference
- Zero-code capture of agentic signals plus OpenTelemetry traces and metrics across every agent
- Guardrail filter chains for jailbreak protection, moderation policies, and memory consistency
- Unified, OpenAI-compatible access to multiple LLM providers (OpenAI, Anthropic, and more)
- Built on Envoy as a separate out-of-process data plane, so it works with any language or framework
Getting started
Plano runs as a separate proxy configured by a YAML file. You declare your agents and model providers, run your agents as plain HTTP services, then start Plano and query it.
Install Plano and set up your environment
Follow the prerequisites and quickstart guide in the docs to install the Plano CLI and configure access. The README points to docs.planoai.dev for the exact install steps.
Define your agents in YAML
Declare agent URLs, model providers, and an agent listener. Plano handles routing, fallbacks, and tracing from this config.
# config.yaml
version: v0.3.0
agents:
- id: weather_agent
url: http://localhost:10510
- id: flight_agent
url: http://localhost:10520
model_providers:
- model: openai/gpt-4o
access_key: $OPENAI_API_KEY
default: true
- model: anthropic/claude-3-5-sonnet
access_key: $ANTHROPIC_API_KEY
listeners:
- type: agent
name: travel_assistant
port: 8001
router: plano_orchestrator_v1
agents:
- id: weather_agent
description: |
Gets real-time weather and forecasts for any city worldwide.
- id: flight_agent
description: |
Searches flights between airports with live status and schedules.
tracing:
random_sampling: 100Start Plano and query your agents
Start the proxy with your config file, then send an OpenAI-compatible chat completion request. Plano routes the request to the right agent.
# Start Plano
planoai up config.yaml
# Query - Plano routes to the right agent
curl http://localhost:8001/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4o"}'Commands and code are distilled from the project's own documentation — always check the official repo for the latest.
When to use it
- Routing a single conversation across multiple specialized agents (for example a weather agent and a flight agent) without hard-coding routing logic
- Giving services unified, OpenAI-compatible access to several LLM providers with automatic model fallback
- Adding jailbreak protection, moderation, and memory policies to an agentic app through filter chains instead of bespoke code
- Capturing traces, metrics, and agentic signals across all agents for evaluation and continuous improvement
How Plano (formerly Arch Gateway) compares
Plano (formerly Arch Gateway) alongside other open-source gateways & routing tools AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| LiteLLM | ★ 50.9k | A Python SDK and proxy server that gives one OpenAI-compatible API to 100+ LLM providers, with cost tracking, budgets, fallbacks, rate limiting, and an admin UI. |
| Apache APISIX | ★ 16.8k | A cloud-native API gateway whose AI plugins add multi-provider LLM proxying, load balancing, retries and fallbacks, token-based rate limiting, and content moderation. |
| Portkey AI Gateway | ★ 12.1k | An LLM gateway that routes calls to 100+ providers through one API and adds logging, tracing, caching, and fallbacks for production AI traffic. |
| Higress | ★ 8.7k | An AI-native API gateway built on Istio and Envoy that proxies and governs traffic to many LLM providers, with token rate limiting, caching, and MCP server hosting. |
| Plano (formerly Arch Gateway) | ★ 6.6k | An Envoy-based proxy and data plane for agentic apps |
| Bifrost | ★ 5.9k | A high-throughput LLM gateway written in Go that gives a single OpenAI-compatible API to many providers, with failover, load balancing, semantic caching, and very low overhead at high request rates. |
| RouteLLM | ★ 5k | A framework from LMSYS for serving and evaluating LLM routers that sends easy queries to cheaper models and hard ones to stronger models to cut cost. |
| vLLM Semantic Router | ★ 4.5k | An intelligent router that inspects each request and sends it to the most suitable model in a mixture-of-models setup across cloud, data center, and edge. |