Overview
MiniMax M2 is an open-weight large language model from the Chinese AI lab MiniMax, released on 27 October 2025 as the first model in its MiniMax M-Series line aimed squarely at coding and agentic workflows. It is a Mixture-of-Experts (MoE) model with 230 billion total parameters but only 10 billion activated per token, a design that gives it the broad capability of a large model while keeping latency and serving cost close to that of a small one. MiniMax positions M2 as running at roughly 8% of the price of Claude Sonnet and about twice the speed, at around 100 tokens per second.
M2 is built for end-to-end developer and agent workflows rather than single-turn chat. It is designed to plan and stably execute long chains of tool calls, coordinating a shell, a browser, a Python code interpreter, and various Model Context Protocol (MCP) tools, and it works inside popular coding agents including Claude Code, Cursor, Cline, Kilo Code, and Droid. MiniMax reports it lands in the top five globally on Artificial Analysis's composite intelligence index, with an AA Intelligence score of 61. It is an 'interleaved thinking' model, exposing its reasoning in <think> tags between actions.
MiniMax M2 is text-only. Its weights are published on Hugging Face under a modified MIT license, so it can be downloaded and self-hosted, with day-0 deployment support documented for SGLang, vLLM, MLX-LM, and Transformers. A hosted API is available on the MiniMax Open Platform. M2 is now the first entry in a fast-moving line that has since been followed by M2.1, M2.5, M2.7, and M3 — on the platform it is labelled a legacy model, but it remains a strong, low-cost open-weight option for agentic coding.
| Released | 2025-10-27 |
|---|---|
| License | MIT (modified) |
| Weights | Open weights |
| Parameters | 230B total / 10B active (MoE) |
| Context | 204K |
| Max output | Not publicly disclosed |
| Architecture | Sparse Mixture-of-Experts (MoE) transformer with 230 billion total parameters and only 10 billion activated per token, which keeps inference latency, cost, and throughput close to a 10B dense model while retaining a large capacity. It is an "interleaved thinking" model that wraps its chain-of-thought in <think>...</think> tags between tool calls, and is tuned for long-chain tool-calling across a shell, a browser, a Python interpreter, and MCP tools. Recommended inference settings are temperature 1.0, top_p 0.95, top_k 40. |
| Knowledge cutoff | Not publicly disclosed |
| Modalities | Text |
| Status | Generally available |
Benchmarks
- SWE-bench Verified69.4%
- Multi-SWE-Bench36.2%
- Terminal-Bench46.3%
- ArtifactsBench66.8%
- τ²-Bench77.2%
- BrowseComp44%
- GAIA (text only)75.7%
- AA Intelligence (composite)61index
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.30 / 1M tokens per 1M tokens |
|---|---|
| Output | $1.20 / 1M tokens per 1M tokens |
Official MiniMax Open Platform pay-as-you-go price for MiniMax-M2 (¥2.1 in / ¥8.4 out per 1M tokens), now listed as a legacy model. MiniMax describes this as roughly 8% of Claude Sonnet's price; M2 was free for a limited time at launch. OpenRouter lists about $0.255 in / $1.00 out.
Strengths
- Open weights under a (modified) MIT license — free to download, self-host, and use commercially
- Very low serving cost and high throughput: only 10B of 230B parameters activate per token, giving small-model latency at large-model quality
- Strong agentic coding: 69.4 on SWE-bench Verified and 77.2 on τ²-Bench for tool use
- Purpose-built for long-chain tool calling across shell, browser, Python, and MCP tools
- Works out of the box in Claude Code, Cursor, Cline, Kilo Code, and Droid
- Priced at roughly 8% of Claude Sonnet (about $0.30 in / $1.20 out per 1M tokens) at around twice the speed
- Day-0 deployment support for SGLang, vLLM, MLX-LM, and Transformers
Best for
- Autonomous and semi-autonomous coding agents that do multi-file edits and run-fix loops
- Software engineering tasks: bug fixing and repo-level changes (SWE-bench-style)
- Agentic tool-use and function-calling workflows over shell, browser, Python, and MCP tools
- Drop-in low-cost backend for coding assistants like Cursor, Cline, and Claude Code
- Self-hosted deployments that need an openly licensed agentic model
- Cost-sensitive batched sampling and high-throughput interactive agents
How to access
| Provider | Model ID |
|---|---|
| MiniMax ↗ | MiniMax-M2 |
| OpenRouter ↗ | minimax/minimax-m2 |
MiniMax M-Series — every version
The full lineage of the MiniMax M-Series line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| MiniMax M3current | 2026-06-01 | 1M | MiniMax Community |
| MiniMax M2.7 / M2.7-highspeed | 2026-03-18 | — | Open weights |
| MiniMax M2.5 / M2.5-Lightning | 2026-02-12 | — | Open weights |
| MiniMax M2.1 | 2025-12-23 | — | Open weights |
| MiniMax M2 | 2025-10-27 | — | MIT |
FAQ
What is MiniMax M2 and what is it for?
MiniMax M2 is an open-weight large language model released by MiniMax on 27 October 2025, built specifically for coding and agentic tool-use workflows. It is a Mixture-of-Experts model with 230B total parameters and 10B activated per token, and it is designed to plan and execute long chains of tool calls across a shell, browser, Python interpreter, and MCP tools.
Is MiniMax M2 open source and free to use?
The weights are published on Hugging Face under a modified MIT license, so you can download, self-host, and use the model commercially. MiniMax also offers a hosted API on its Open Platform; at launch the API was free for a limited time, and it is now priced at about $0.30 per 1M input tokens and $1.20 per 1M output tokens.
How does MiniMax M2 perform on coding benchmarks?
MiniMax reports 69.4 on SWE-bench Verified, 46.3 on Terminal-Bench, 66.8 on ArtifactsBench, and 77.2 on τ²-Bench for tool use, along with a composite Artificial Analysis Intelligence score of 61 — placing it among the top open-weight models for agentic coding.
What makes MiniMax M2 cheap and fast?
Although it has 230B total parameters, only 10B activate per token thanks to its Mixture-of-Experts design. That keeps inference latency, cost, and throughput close to a 10B dense model, letting MiniMax run it at roughly 8% of Claude Sonnet's price and about twice the speed, at around 100 tokens per second.