Overview
MiniMax M2.5 is an open-weight large language model from Shanghai-based MiniMax, released on February 12, 2026. It uses a sparse Mixture-of-Experts design with roughly 229B total parameters but only about 10B active per token, which keeps inference cheap while preserving frontier-level quality on coding, tool use, web search, and office-document tasks. The model is text-only; MiniMax handles vision and audio through separate models.
The release shipped in two variants that share the same weights and capability but differ in speed: the standard MiniMax M2.5 and a faster M2.5-Lightning. MiniMax positioned both as built for real-world productivity, trained with reinforcement learning across many complex digital working environments rather than tuned purely for static benchmarks. It posts 80.2% on SWE-bench Verified and 76.3% on BrowseComp, and lands around 42 on the Artificial Analysis Intelligence Index — competitive with much larger closed models.
The headline pitch is price. At launch MiniMax priced M2.5 at roughly one-tenth to one-twentieth the cost of comparable closed frontier models such as Claude Opus 4.6, making sustained agentic use economically viable. The weights are published on Hugging Face under a modified MIT license that requires commercial users to display the 'MiniMax M2.5' name in their interface. It has since been succeeded in MiniMax's lineup by later M-series models (M2.7), and on the official API M2.5 is now listed among legacy models.
| Released | 2026-02-12 |
|---|---|
| License | Modified MIT (attribution required for commercial use) |
| Weights | Open weights |
| Parameters | 229B total / 10B active (MoE) |
| Context | 205K |
| Max output | ~131K tokens |
| Architecture | Sparse Mixture-of-Experts (MoE) transformer with about 229B total parameters and roughly 10B activated per token. Trained with large-scale reinforcement learning across many real-world digital working environments, with a reasoning ("thinking") mode for agentic and coding tasks. Released in two interchangeable variants — a standard model and a faster M2.5-Lightning — that share weights and capability but differ in serving speed. |
| Knowledge cutoff | Not disclosed |
| Modalities | Text |
| Status | Superseded |
Benchmarks
- SWE-bench Verified80.2%
- Multi-SWE-Bench51.3%
- BrowseComp76.3%
- GPQA Diamond85.2%
- AIME 202586.3%
- MMLU-Pro80.1%
- Artificial Analysis Intelligence Index42%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.15 / 1M tokens (M2.5); $0.30 / 1M tokens (M2.5-Lightning) per 1M tokens |
|---|---|
| Output | $1.20 / 1M tokens (M2.5); $2.40 / 1M tokens (M2.5-Lightning) per 1M tokens |
Launch pricing from MiniMax. Both variants support prompt caching. M2.5 is now listed among legacy models on the MiniMax API; third-party providers (OpenRouter, SiliconFlow, Fireworks, Novita) may price it differently.
Strengths
- Strong agentic coding: 80.2% on SWE-bench Verified, competitive with closed frontier models
- Very low cost per token relative to closed rivals, enabling long-running agent workloads
- Open weights under a permissive (modified MIT) license, downloadable and self-hostable
- Efficient MoE design — about 10B active parameters keeps serving cheap and fast
- Two serving tiers: standard M2.5 and a higher-throughput M2.5-Lightning variant
- Long 205K-token context window for large codebases and multi-step agent traces
- Built for real-world office and tool-use tasks, not just static benchmarks
Best for
- Autonomous and semi-autonomous coding agents running across large codebases
- Cost-sensitive, long-horizon agent workflows where token volume is high
- Web-research and browsing agents (strong BrowseComp performance)
- Self-hosted or private deployment where open weights are a requirement
- Office and document-generation tasks (Word, Excel, PowerPoint workflows)
- Multilingual programming across Python, Go, C++, Rust, and TypeScript
How to access
| Provider | Model ID |
|---|---|
| MiniMax ↗ | MiniMax-M2.5 |
| OpenRouter ↗ | minimax/minimax-m2.5 |
| Hugging Face ↗ | MiniMaxAI/MiniMax-M2.5 |
MiniMax M-Series — every version
The full lineage of the MiniMax M-Series line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| MiniMax M3current | 2026-06-01 | 1M | MiniMax Community |
| MiniMax M2.7 / M2.7-highspeed | 2026-03-18 | — | Open weights |
| MiniMax M2.5 / M2.5-Lightning | 2026-02-12 | — | Open weights |
| MiniMax M2.1 | 2025-12-23 | — | Open weights |
| MiniMax M2 | 2025-10-27 | — | MIT |
FAQ
What is the difference between MiniMax M2.5 and M2.5-Lightning?
They share the same weights and capabilities and differ only in serving speed and price. The standard M2.5 runs around 50 tokens per second at $0.15/$1.20 per million input/output tokens, while M2.5-Lightning roughly doubles throughput to about 100 tokens per second at $0.30/$2.40 per million tokens.
Is MiniMax M2.5 open source?
The weights are open and published on Hugging Face under a modified MIT license. It is open-weight: free to download and self-host, but the license requires that commercial users prominently display the 'MiniMax M2.5' name in their user interface.
How many parameters does MiniMax M2.5 have?
It is a Mixture-of-Experts model with about 229B total parameters but only roughly 10B activated per token, which is what makes its inference unusually cheap for a model of its quality.
Is MiniMax M2.5 still current?
No. M2.5 launched in February 2026 and has since been succeeded by later M-series releases (such as M2.7). On MiniMax's official API it is now listed among legacy models, though it remains downloadable from Hugging Face and available through third-party providers.