Overview
Bifrost is an open-source AI gateway written in Go that puts a single, OpenAI-compatible API in front of more than 23 model providers, including OpenAI, Anthropic, AWS Bedrock, Google Vertex, Azure, Mistral, Ollama, and Groq. Your application talks to one endpoint, and Bifrost routes each request to the provider you name in the model string.
It is aimed at teams who call several LLM providers and want one place to handle keys, failover, and routing instead of wiring each SDK separately. Because the API matches the OpenAI format, it can act as a drop-in replacement for existing OpenAI or Anthropic client code with little change.
As an LLM gateway, Bifrost sits between your app and the providers. It adds automatic fallbacks, load balancing across keys and providers, semantic caching, and observability, and ships with a built-in web UI for visual configuration and monitoring.
What it does
- Single OpenAI-compatible API in front of 23+ providers (OpenAI, Anthropic, AWS Bedrock, Google Vertex, Azure, Cohere, Mistral, Ollama, Groq, and more)
- Automatic fallbacks between providers and models, plus load balancing across multiple API keys
- Semantic caching that reuses responses for similar requests to cut cost and latency
- Built-in web UI for provider configuration, real-time monitoring, and analytics
- Model Context Protocol (MCP) support so models can call external tools like filesystem and web search
- Observability with native Prometheus metrics, distributed tracing, and request logging
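
To make the fallback idea concrete, here is a minimal Python sketch of trying several `provider/model` targets in order. It illustrates the general technique only and is not Bifrost's implementation or configuration format. The names `call_with_fallbacks`, `AllTargetsFailed`, and the `send` callable are hypothetical, introduced for this example.

```python
from typing import Callable, List, Sequence, Tuple


class AllTargetsFailed(Exception):
    """Raised when every target in the fallback chain has failed."""


def call_with_fallbacks(
    targets: Sequence[str],
    send: Callable[[str], str],
) -> Tuple[str, str]:
    """Try each provider/model string in order and return (target, response).

    `send` is a hypothetical callable that performs one request for a target
    and raises an exception on failure.
    """
    errors: List[str] = []
    for target in targets:
        try:
            return target, send(target)
        except Exception as exc:  # a real gateway would only retry on selected errors
            errors.append(f"{target}: {exc}")
    # Every target failed, so surface all collected errors at once.
    raise AllTargetsFailed("; ".join(errors))
```

A gateway applies this kind of logic on the server side, so the application sends one request and does not need its own retry loop.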
Getting started
Bifrost runs as an HTTP gateway you can start with npx or Docker, then call with any OpenAI-compatible client.
Start the gateway
Run Bifrost locally with npx, or use the Docker image. Both serve the gateway and the web UI on port 8080.
# Install and run locally
npx -y @maximhq/bifrost
# Or use Docker
docker run -p 8080:8080 maximhq/bifrost
Configure via the web UI
Open the built-in web interface to add providers and API keys.
# Open the built-in web interface
open http://localhost:8080
Make your first API call
Send a chat completion to the gateway. Set the model as provider/model, for example openai/gpt-4o-mini.
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, Bifrost!"}]
  }'
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
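
The same request can be sent from Python using only the standard library. This is a sketch, assuming the gateway is running on `localhost:8080` as described above and an OpenAI key has been configured. The helper names `build_chat_request` and `send_chat` are made up for this example and are not part of Bifrost.

```python
import json
import urllib.request

# Assumption: the gateway started in the steps above is listening here.
GATEWAY_URL = "http://localhost:8080/v1/chat/completions"


def build_chat_request(model: str, user_message: str) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    payload = {
        "model": model,  # provider/model, for example "openai/gpt-4o-mini"
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        GATEWAY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat(model: str, user_message: str) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_chat_request(model, user_message)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the gateway running, `send_chat("openai/gpt-4o-mini", "Hello, Bifrost!")` returns the parsed response. Switching providers should only require changing the model string.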
When to use it
- Call several LLM providers through one OpenAI-compatible endpoint instead of integrating each SDK separately
- Add automatic failover so requests keep working when one provider or model is down
- Reduce cost and latency by serving repeated or similar prompts from the semantic cache
- Track usage and route traffic across multiple API keys with budgets and rate limits
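
The semantic-cache use case can be sketched in a few lines of Python. This is a toy illustration of the idea and not how Bifrost implements it. It assumes a bag-of-words "embedding" and a hypothetical similarity threshold of 0.8, where a real cache would use an embedding model and a vector store.

```python
import math
from collections import Counter
from typing import List, Optional, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored response when a new prompt is similar enough to an old one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # hypothetical cut-off, tune per workload
        self.entries: List[Tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        # Only reuse the response when the best match clears the threshold.
        return best_response if best_score >= self.threshold else None
```

A hit skips the provider call entirely, which is where the cost and latency savings come from. The threshold trades hit rate against the risk of returning an answer to a different question.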
How Bifrost compares
Bifrost alongside the other open-source gateways and routing tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| LiteLLM | ★ 50.9k | A Python SDK and proxy server that gives one OpenAI-compatible API to 100+ LLM providers, with cost tracking, budgets, fallbacks, rate limiting, and an admin UI. |
| Apache APISIX | ★ 16.8k | A cloud-native API gateway whose AI plugins add multi-provider LLM proxying, load balancing, retries and fallbacks, token-based rate limiting, and content moderation. |
| Portkey AI Gateway | ★ 12.1k | An LLM gateway that routes calls to 100+ providers through one API and adds logging, tracing, caching, and fallbacks for production AI traffic. |
| Higress | ★ 8.7k | An AI-native API gateway built on Istio and Envoy that proxies and governs traffic to many LLM providers, with token rate limiting, caching, and MCP server hosting. |
| Plano (formerly Arch Gateway) | ★ 6.6k | An Envoy-based proxy and data plane for agentic apps that handles prompt routing between agents, guardrails, unified access to LLMs, and observability. |
| Bifrost | ★ 5.9k | One OpenAI-compatible API for 23+ LLM providers, with failover and caching. |
| RouteLLM | ★ 5k | A framework from LMSYS for serving and evaluating LLM routers that sends easy queries to cheaper models and hard ones to stronger models to cut cost. |
| vLLM Semantic Router | ★ 4.5k | An intelligent router that inspects each request and sends it to the most suitable model in a mixture-of-models setup across cloud, data center, and edge. |