AI/TLDR

Xiaomi · 2026-04-22 · notable

Xiaomi MiMo-V2.5 Series: Multimodal Agent Model with 42% Token Efficiency Edge

Xiaomi MiMo-V2.5-Pro is a 1T-param MoE model (42B active) scoring 57.2% on SWE-bench Pro — within 0.5 pts of GPT-5.4 — using 42% fewer tokens than competitors. Native vision, audio, and video. MiMo-V2.5-TTS and ASR add a full voice pipeline. $1/$3 per MTok.

XiaomiMiMo/MiMo GitHub repository — Xiaomi's open-source reasoning LLM project

Xiaomi's MiMo-V2.5-Pro matches GPT-5.4 on SWE-bench Pro while using 42% fewer tokens, with native vision, audio, and video in a single 1T MoE model.

Key specs

Active params42B
Context window1M tokens
Swe bench pro57.2%
Total parameters1T
Input cost (pro)$1.00/MTok
Output cost (pro)$3.00/MTok
Token efficiency vs kimi k2.642% fewer tokens

What is it?

Xiaomi's AI division released the MiMo-V2.5 series on April 22-24, 2026 — a family of multimodal AI models replacing the text-only MiMo-V2-Pro. The flagship MiMo-V2.5-Pro is a 1-trillion-parameter mixture-of-experts model (42B active parameters per forward pass) that processes text, images, audio, and video in a single model. Alongside the Pro, Xiaomi also released MiMo-V2.5 (base, $0.40/$2.00 per MTok), MiMo-V2.5-TTS (text-to-speech with voice cloning and emotion tags), and MiMo-V2.5-ASR (bilingual speech recognition supporting Chinese dialects). Open-source release is forthcoming.

How does it work?

MiMo-V2.5-Pro is optimized for sustained long-horizon agentic tasks. In published demos it completed a full SysY compiler in Rust (233/233 tests passing) in 4.3 hours using 672 tool calls, and generated an 8,192-line video editing desktop application across 1,868 tool calls. Its core efficiency gain: 42% fewer tokens than Kimi K2.6 at equivalent SWE-bench Pro scores. At $1.00/$3.00 per MTok input/output, with 60-80 tokens/second throughput and a 1M-token context window, it is available on the Xiaomi MiMo API and through OpenRouter.

Why does it matter?

Token efficiency compounds across long agentic tasks. A 42% reduction on multi-hour, multi-thousand-tool-call runs translates directly to lower cost per completed task — not just per token. Xiaomi already holds 21.1% of OpenRouter model traffic (roughly 3× OpenAI's 7.5% share), so this is a high-availability model, not a waitlist launch. The TTS/ASR additions turn MiMo-V2.5 into a full audio-to-text-to-audio agent stack, relevant for voice-first applications.

Who is it for?

Developers running long agentic coding pipelines who want frontier-class benchmarks at lower per-task cost

Try it

API: mimo-v2.5-pro, available on OpenRouter (xiaomi/mimo-v2.5-pro). $1.00/$3.00 per MTok

Sources · 4 outlets

Tags

  • xiaomi
  • mimo
  • moe
  • multimodal
  • agentic
  • token-efficiency
  • voice
  • tts
  • asr
  • coding
  • open-source-forthcoming

← All releases · Learn AI