Cloudflare · 2026-04-16 · notable
Cloudflare AI Platform: Unified Inference Layer Across 70+ Models and 12+ Providers
Cloudflare's AI Gateway evolves into a full AI Platform: a single inference API spanning 70+ models from 12+ providers, with automatic failover, centralized cost management, custom model support via Replicate Cog, and 330 PoPs for low-latency delivery.

Cloudflare's AI Gateway expands to a full inference platform — one API to access 70+ models from 12+ providers with automatic failover and cost controls.
Key specs
| Spec | Value |
|---|---|
| Models | 70+ |
| Providers | 12+ |
| PoPs | 330 global |
What is it?
Cloudflare AI Platform is the expanded version of Cloudflare's AI Gateway, repositioned as a unified inference layer for production AI applications. A single API route and one credit pool now cover 70+ models from providers including OpenAI, Anthropic, Google, and Mistral. The platform handles routing, automatic failover, caching, rate limiting, and centralized spend analytics. Custom fine-tuned models can be brought in via Replicate's Cog containerization.
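To make the "one API route" idea concrete, here is a minimal sketch of a unified client that dispatches requests to per-provider adapters and caches identical requests. It is not Cloudflare's API: the `UnifiedClient` class, the adapter signature, and the `provider/model` identifier format are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Hypothetical adapter signature: (model, prompt) -> completion text.
Adapter = Callable[[str, str], str]


class UnifiedClient:
    """Dispatches 'provider/model' identifiers to per-provider adapters."""

    def __init__(self, adapters: Dict[str, Adapter]) -> None:
        self.adapters = adapters
        self.cache: Dict[Tuple[str, str], str] = {}
        self.upstream_calls = 0  # counts requests that actually reach a provider

    def complete(self, model_id: str, prompt: str) -> str:
        key = (model_id, prompt)
        if key in self.cache:  # identical request: serve from cache
            return self.cache[key]
        # Assumed identifier format: "<provider>/<model>".
        provider, _, model = model_id.partition("/")
        if provider not in self.adapters or not model:
            raise ValueError(f"unknown model id: {model_id!r}")
        self.upstream_calls += 1
        result = self.adapters[provider](model, prompt)
        self.cache[key] = result
        return result
```

The caller only ever sees `complete()`. Adding a provider means registering one more adapter, which is the operational simplification the platform is pitching.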
How does it work?
Requests are routed through Cloudflare's 330 global data centers, reducing time-to-first-token by serving inference from the PoP closest to the user. Automatic failover retries failed requests against a configurable list of backup providers or models, so a provider outage doesn't stop your application. Cost management aggregates usage across all providers into one dashboard and enforces per-application spend limits. For custom models, Replicate Cog packages a local model into a container Cloudflare can host and route to.
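The failover and spend-limit behavior described above can be sketched as follows. This is an illustration under stated assumptions, not Cloudflare's implementation: `FailoverRouter`, `ProviderError`, `SpendLimitExceeded`, and the `(text, cost)` return shape are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Raised by a (hypothetical) provider adapter when a request fails."""


class SpendLimitExceeded(Exception):
    """Raised when an application has used up its budget."""


# A target is a label plus a callable taking a prompt and returning
# (completion_text, cost_in_usd). Both stand in for real provider calls.
Target = Tuple[str, Callable[[str], Tuple[str, float]]]


@dataclass
class FailoverRouter:
    targets: Sequence[Target]   # primary first, then backups in order
    spend_limit_usd: float      # per-application cap (assumed semantics)
    spent_usd: float = 0.0
    attempts: List[str] = field(default_factory=list)

    def complete(self, prompt: str) -> str:
        # Refuse new requests once the application has reached its cap.
        if self.spent_usd >= self.spend_limit_usd:
            raise SpendLimitExceeded(f"spent {self.spent_usd:.2f} USD")
        last_error = None
        for label, call in self.targets:
            self.attempts.append(label)
            try:
                text, cost = call(prompt)
            except ProviderError as exc:
                last_error = exc  # remember the failure, try the next target
                continue
            self.spent_usd += cost
            return text
        raise ProviderError("all targets failed") from last_error
```

With a failing primary and a healthy backup, `complete()` returns the backup's answer and `attempts` records both targets in order. Once `spent_usd` reaches the cap, further calls raise `SpendLimitExceeded` before any provider is contacted.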
Why does it matter?
Most AI applications already use 3–4 providers — managing separate API keys, billing accounts, and retry logic for each is real operational overhead. A single unified interface with automatic fallback makes multi-model architectures much simpler to operate, especially for agent systems that need low-latency global access and resilience against provider outages.
Who is it for?
Developers and teams building multi-model applications or agents that need reliable, globally distributed inference across multiple providers.
Try it
https://developers.cloudflare.com/ai-gateway/