AI/TLDR

Weave · 2026-06-27 · major

Weave Router — drop-in proxy that picks the right LLM per request

Weave Router is an open-source proxy for Claude Code, Codex, and Cursor that scores each prompt with an on-box ONNX embedder and routes it to the best model across Anthropic, OpenAI, Gemini, and OpenRouter providers in under 50ms.

Weave Router GitHub repository preview card

An open-source proxy that scores every prompt and routes it to the cheapest model that can still answer it.

Key specs

GitHub stars493
Added overhead<50ms
Cost reduction40–70%

Quick facts

MakerWeave (workweave.dev)
LicenseElastic License v2
LanguageGo (83.6%)
Supported providersAnthropic, OpenAI, Gemini, plus DeepSeek/Kimi/GLM/Qwen/Llama/Mistral via OpenRouter
Installnpx @workweave/router (self-host) or hosted at router.workweave.ai
What's newShow HN launch of the open-source build on 27 Jun 2026
Free tierYes — self-host free; managed service starts free with usage-based credits

What is it?

Weave Router is a Go proxy that drops in front of Claude Code, Codex, Cursor, OpenCode, or any compatible app and re-dispatches each request to whichever Anthropic, OpenAI, Gemini, or OpenRouter model fits best. It speaks all three native API shapes, so existing agents do not need code changes — just an endpoint swap to localhost:8080 or the hosted endpoint.

How does it work?

Each incoming prompt is embedded with a small ONNX model running in-process, then compared to frozen cluster centroids derived from the Avengers-Pro routing research. The closest centroid maps to a model on the user's enabled provider list, and the call is forwarded with provider keys kept encrypted on-box. OTLP tracing is built in for per-route observability.

Why does it matter?

Routing per request rather than per session lets cheap models handle the easy turns and saves frontier capacity for the hard ones — the team reports 40–70% lower token spend at Robinhood, PostHog, and Reducto without quality drops. Because Weave Router is open source under Elastic License v2, teams can self-host the same router their hosted service runs.

Who is it for?

Engineering teams running agentic coding tools who want to lower API spend without rewriting their integration.

Frequently asked questions

How does Weave Router decide which model to use?
Weave Router embeds each incoming prompt with a small in-process ONNX model, then scores it against frozen cluster centroids trained from the Avengers-Pro routing research. The closest cluster maps to the best-fitting model on the providers you have enabled, and the routed call is dispatched in under 50ms of added overhead.
Which coding agents and providers does Weave Router support?
Weave Router speaks Anthropic Messages, OpenAI Chat Completions, and Gemini natively, so Claude Code, OpenAI Codex, Cursor, OpenCode, and any compatible app can point at it. Open-weight models including DeepSeek, Kimi, GLM, Qwen, Llama, and Mistral are reached through OpenRouter or any OpenAI-compatible endpoint.
Is Weave Router open source and what does it cost?
Weave Router is published under the Elastic License v2 with the full Go source on GitHub, so self-hosting is free. A managed version at router.workweave.ai offers a free tier and then bills usage-based credits against a Weave account; the team says reference customers including Robinhood, PostHog, and Reducto see 40–70% lower token spend.
How is Weave Router different from other LLM proxies like OpenRouter?
OpenRouter is a single upstream marketplace, while Weave Router sits in front of any provider keys you already have and picks per request instead of per session. The routing uses a learned cluster scorer rather than a hand-written rules table, and the overhead is sub-50ms because embedding and scoring happen locally inside the proxy.

Try it

npx @workweave/router

Sources · 4 outlets

Tags

  • model-router
  • agentic-coding
  • claude-code
  • openai-codex
  • cursor
  • anthropic
  • openai
  • gemini
  • openrouter
  • go
  • open-source
  • proxy
  • cost-optimization

← All releases · Learn AI