Overview
LiteLLM is an open-source AI gateway that gives you a single, unified interface to call 100+ LLM providers, including OpenAI, Anthropic, Gemini, Bedrock, and Azure, all using the OpenAI request format. Instead of juggling a different SDK, auth pattern, and error type for every model, you write your code once and switch providers by changing the model name.
You can use it two ways. As a Python SDK, you import the `completion` function and call any model directly from your application. As an AI Gateway (proxy server), you deploy it as a central service that your whole team points at, with virtual keys, spend tracking, and load balancing handled in one place.
As an LLM gateway, LiteLLM sits between your apps and the model providers. It fits teams that want to standardize how they reach models, keep client code OpenAI-compatible, and add cost tracking, budgets, and fallbacks without rewriting each integration.
What it does
- One unified API for 100+ LLMs, so you avoid provider-specific SDKs
- Drop-in OpenAI compatibility: swap providers by changing the model string, not your code
- Proxy server (AI Gateway) with virtual keys, spend tracking, guardrails, and load balancing
- Admin dashboard for managing keys and monitoring usage out of the box
- Supports many endpoint types: chat/completions, responses, embeddings, images, audio, batches, and rerank
- Can invoke A2A agents (LangGraph, Vertex AI Agent Engine, Bedrock AgentCore, Pydantic AI) via SDK or gateway
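To make the virtual-key and spend-tracking idea above more concrete, here is a minimal sketch of per-key budget accounting. It is an illustration of the concept only, not LiteLLM's implementation: the `VirtualKey` class, the `record_usage` function, the model names, and the prices are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VirtualKey:
    """Hypothetical record for one gateway-issued key (not LiteLLM's schema)."""
    name: str
    budget_usd: float
    spent_usd: float = 0.0


# Made-up per-1K-token prices, for illustration only.
PRICE_PER_1K_TOKENS = {"model-a": 0.01, "model-b": 0.002}


def record_usage(key: VirtualKey, model: str, total_tokens: int) -> float:
    """Add the cost of one request to the key's spend; refuse if it would exceed the budget."""
    cost = PRICE_PER_1K_TOKENS[model] * total_tokens / 1000
    if key.spent_usd + cost > key.budget_usd:
        raise PermissionError(f"{key.name}: budget of ${key.budget_usd:.2f} exceeded")
    key.spent_usd += cost
    return cost


team_key = VirtualKey(name="team-search", budget_usd=0.05)
record_usage(team_key, "model-a", 2000)  # 2,000 tokens on the pricier model
record_usage(team_key, "model-b", 5000)  # 5,000 tokens on the cheaper model
print(f"{team_key.name} has spent ${team_key.spent_usd:.2f} of ${team_key.budget_usd:.2f}")
```

Because every request passes through one place, a gateway can keep this kind of ledger without any change to client code.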
Getting started
LiteLLM works as a Python SDK for direct calls or as a proxy server for your whole team. Pick one to start.
Install the Python SDK
Add LiteLLM to your project.
uv add litellm
Make your first call
Set the provider API keys you need, then call any model with the OpenAI-format `completion` function. Switch providers by changing the model string.
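As a rough sketch of what "switch providers by changing the model string" means in practice, the snippet below builds the same OpenAI-format arguments for two providers. It runs without LiteLLM installed; `split_model` and `build_request` are hypothetical helpers written for this illustration, and LiteLLM does its own parsing of the model string.

```python
def split_model(model: str) -> tuple[str, str]:
    """Split a "provider/model-name" string into its two parts.

    Hypothetical helper for illustration; not part of LiteLLM.
    """
    provider, _, name = model.partition("/")
    return provider, name


def build_request(model: str, prompt: str) -> dict:
    """Build OpenAI-format keyword arguments; only `model` varies per provider."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


for model in ("openai/gpt-4o", "anthropic/claude-sonnet-4-20250514"):
    provider, name = split_model(model)
    kwargs = build_request(model, "Hello!")
    # With LiteLLM installed and keys set, the call would be: completion(**kwargs)
    print(provider, name, kwargs["messages"])
```

The `messages` payload is identical in both iterations; the real calls in this section pass the same two arguments straight to `completion`.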
from litellm import completion
import os
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"
# OpenAI
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hello!"}])
# Anthropic
response = completion(model="anthropic/claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Hello!"}])
Or run the AI Gateway (proxy server)
Install the proxy extra and start it pointed at a model. It serves an OpenAI-compatible endpoint on port 4000.
uv tool install 'litellm[proxy]'
litellm --model gpt-4o
Call the gateway with the OpenAI client
Point any OpenAI client at the local proxy base URL to route requests through LiteLLM.
import openai
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
Commands and code are distilled from the project's own documentation; always check the official repo for the latest.
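Because the gateway speaks the OpenAI format, any HTTP client can reach it, not only the OpenAI SDK. The sketch below builds the request with the Python standard library and does not send it. It assumes the proxy accepts a POST to `/chat/completions` under the base URL used above and a bearer token in the `Authorization` header; `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "http://0.0.0.0:4000"  # local proxy address used in this section


def build_chat_request(model: str, prompt: str, api_key: str = "anything") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat request aimed at the proxy."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed path, see the note above
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("gpt-4o", "Hello!")
print(request.get_method(), request.full_url)

# To actually send it (requires the proxy to be running):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```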
When to use it
- Build an app that can switch between OpenAI, Anthropic, and Gemini without rewriting client code
- Run a central gateway so a team shares one endpoint with virtual keys and per-team spend tracking
- Add fallbacks and load balancing across providers to keep requests flowing when one model is down
- Standardize calls to many endpoint types (chat, embeddings, images, audio, rerank) behind one OpenAI-format API
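The fallback use case in the list above can be pictured as a loop that tries models in order until one answers. This is a minimal sketch of the idea, not LiteLLM's routing code: `complete_with_fallbacks`, `fake_call`, and the model names are hypothetical, and `fake_call` stands in for a real provider call.

```python
from typing import Callable, Sequence


def complete_with_fallbacks(
    models: Sequence[str],
    prompt: str,
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model, reply) from the first that succeeds."""
    errors = []
    for model in models:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a provider call; pretends the first model is down."""
    if model == "primary-model":
        raise ConnectionError("provider unavailable")
    return f"[{model}] reply to: {prompt}"


used, reply = complete_with_fallbacks(["primary-model", "backup-model"], "Hello!", fake_call)
print(used, reply)
```

Keeping this logic in a gateway rather than in each application means the ordering of models can change without redeploying clients.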
How LiteLLM compares
LiteLLM alongside the other open-source gateway and routing tools that AI/TLDR tracks, ranked by GitHub stars.
| Tool | Stars | What it does |
|---|---|---|
| LiteLLM | ★ 50.9k | An AI gateway and Python SDK that calls 100+ LLM providers through one OpenAI-compatible API. |
| Apache APISIX | ★ 16.8k | A cloud-native API gateway whose AI plugins add multi-provider LLM proxying, load balancing, retries and fallbacks, token-based rate limiting, and content moderation. |
| Portkey AI Gateway | ★ 12.1k | An LLM gateway that routes calls to 100+ providers through one API and adds logging, tracing, caching, and fallbacks for production AI traffic. |
| Higress | ★ 8.7k | An AI-native API gateway built on Istio and Envoy that proxies and governs traffic to many LLM providers, with token rate limiting, caching, and MCP server hosting. |
| Plano (formerly Arch Gateway) | ★ 6.6k | An Envoy-based proxy and data plane for agentic apps that handles prompt routing between agents, guardrails, unified access to LLMs, and observability. |
| Bifrost | ★ 5.9k | A high-throughput LLM gateway written in Go that gives a single OpenAI-compatible API to many providers, with failover, load balancing, semantic caching, and very low overhead at high request rates. |
| RouteLLM | ★ 5k | A framework from LMSYS for serving and evaluating LLM routers that sends easy queries to cheaper models and hard ones to stronger models to cut cost. |
| vLLM Semantic Router | ★ 4.5k | An intelligent router that inspects each request and sends it to the most suitable model in a mixture-of-models setup across cloud, data center, and edge. |