Xiaomi · 2026-04-22 · notable
Xiaomi MiMo-V2.5 Series: Multimodal Agent Model with 42% Token Efficiency Edge
Xiaomi MiMo-V2.5-Pro is a 1T-param MoE model (42B active) scoring 57.2% on SWE-bench Pro — within 0.5 pts of GPT-5.4 — using 42% fewer tokens than competitors. Native vision, audio, and video. MiMo-V2.5-TTS and ASR add a full voice pipeline. $1/$3 per MTok.
Xiaomi's MiMo-V2.5-Pro matches GPT-5.4 on SWE-bench Pro while using 42% fewer tokens, with native vision, audio, and video in a single 1T MoE model.
Key specs
| Active params | 42B |
|---|---|
| Context window | 1M tokens |
| Swe bench pro | 57.2% |
| Total parameters | 1T |
| Input cost (pro) | $1.00/MTok |
| Output cost (pro) | $3.00/MTok |
| Token efficiency vs kimi k2.6 | 42% fewer tokens |
What is it?
Xiaomi's AI division released the MiMo-V2.5 series on April 22-24, 2026 — a family of multimodal AI models replacing the text-only MiMo-V2-Pro. The flagship MiMo-V2.5-Pro is a 1-trillion-parameter mixture-of-experts model (42B active parameters per forward pass) that processes text, images, audio, and video in a single model. Alongside the Pro, Xiaomi also released MiMo-V2.5 (base, $0.40/$2.00 per MTok), MiMo-V2.5-TTS (text-to-speech with voice cloning and emotion tags), and MiMo-V2.5-ASR (bilingual speech recognition supporting Chinese dialects). Open-source release is forthcoming.
How does it work?
MiMo-V2.5-Pro is optimized for sustained long-horizon agentic tasks. In published demos it completed a full SysY compiler in Rust (233/233 tests passing) in 4.3 hours using 672 tool calls, and generated an 8,192-line video editing desktop application across 1,868 tool calls. Its core efficiency gain: 42% fewer tokens than Kimi K2.6 at equivalent SWE-bench Pro scores. At $1.00/$3.00 per MTok input/output, with 60-80 tokens/second throughput and a 1M-token context window, it is available on the Xiaomi MiMo API and through OpenRouter.
Why does it matter?
Token efficiency compounds across long agentic tasks. A 42% reduction on multi-hour, multi-thousand-tool-call runs translates directly to lower cost per completed task — not just per token. Xiaomi already holds 21.1% of OpenRouter model traffic (roughly 3× OpenAI's 7.5% share), so this is a high-availability model, not a waitlist launch. The TTS/ASR additions turn MiMo-V2.5 into a full audio-to-text-to-audio agent stack, relevant for voice-first applications.
Who is it for?
Developers running long agentic coding pipelines who want frontier-class benchmarks at lower per-task cost
Try it
API: mimo-v2.5-pro, available on OpenRouter (xiaomi/mimo-v2.5-pro). $1.00/$3.00 per MTok