MiniMax M2

Name: MiniMax M2
Author: MiniMax

Open-weight 230B/10B-active MoE built for coding and agentic tool use, at roughly 8% of Claude Sonnet's price.

Overview

MiniMax M2 is an open-weight large language model from the Chinese AI lab MiniMax, released on 27 October 2025 as the first model in its MiniMax M-Series line aimed squarely at coding and agentic workflows. It is a Mixture-of-Experts (MoE) model with 230 billion total parameters but only 10 billion activated per token, a design that gives it the broad capability of a large model while keeping latency and serving cost close to that of a small one. MiniMax positions M2 as running at roughly 8% of the price of Claude Sonnet and about twice the speed, at around 100 tokens per second.

M2 is built for end-to-end developer and agent workflows rather than single-turn chat. It is designed to plan and stably execute long chains of tool calls, coordinating a shell, a browser, a Python code interpreter, and various Model Context Protocol (MCP) tools, and it works inside popular coding agents including Claude Code, Cursor, Cline, Kilo Code, and Droid. MiniMax reports it lands in the top five globally on Artificial Analysis's composite intelligence index, with an AA Intelligence score of 61. It is an 'interleaved thinking' model, exposing its reasoning in <think> tags between actions.

MiniMax M2 is text-only. Its weights are published on Hugging Face under a modified MIT license, so it can be downloaded and self-hosted, with day-0 deployment support documented for SGLang, vLLM, MLX-LM, and Transformers. A hosted API is available on the MiniMax Open Platform. M2 is now the first entry in a fast-moving line that has since been followed by M2.1, M2.5, M2.7, and M3 — on the platform it is labelled a legacy model, but it remains a strong, low-cost open-weight option for agentic coding.

Released	2025-10-27
License	MIT (modified)
Weights	Open weights
Parameters	230B total / 10B active (MoE)
Context	204K
Max output	Not publicly disclosed
Architecture	Sparse Mixture-of-Experts (MoE) transformer with 230 billion total parameters and only 10 billion activated per token, which keeps inference latency, cost, and throughput close to a 10B dense model while retaining a large capacity. It is an "interleaved thinking" model that wraps its chain-of-thought in <think>...</think> tags between tool calls, and is tuned for long-chain tool-calling across a shell, a browser, a Python interpreter, and MCP tools. Recommended inference settings are temperature 1.0, top_p 0.95, top_k 40.
Knowledge cutoff	Not publicly disclosed
Modalities	Text
Status	Generally available

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.30 / 1M tokens per 1M tokens
Output	$1.20 / 1M tokens per 1M tokens

Official MiniMax Open Platform pay-as-you-go price for MiniMax-M2 (¥2.1 in / ¥8.4 out per 1M tokens), now listed as a legacy model. MiniMax describes this as roughly 8% of Claude Sonnet's price; M2 was free for a limited time at launch. OpenRouter lists about $0.255 in / $1.00 out.

Pricing source ↗

Strengths

Open weights under a (modified) MIT license — free to download, self-host, and use commercially
Very low serving cost and high throughput: only 10B of 230B parameters activate per token, giving small-model latency at large-model quality
Strong agentic coding: 69.4 on SWE-bench Verified and 77.2 on τ²-Bench for tool use
Purpose-built for long-chain tool calling across shell, browser, Python, and MCP tools
Works out of the box in Claude Code, Cursor, Cline, Kilo Code, and Droid
Priced at roughly 8% of Claude Sonnet (about $0.30 in / $1.20 out per 1M tokens) at around twice the speed
Day-0 deployment support for SGLang, vLLM, MLX-LM, and Transformers

Best for

Autonomous and semi-autonomous coding agents that do multi-file edits and run-fix loops
Software engineering tasks: bug fixing and repo-level changes (SWE-bench-style)
Agentic tool-use and function-calling workflows over shell, browser, Python, and MCP tools
Drop-in low-cost backend for coding assistants like Cursor, Cline, and Claude Code
Self-hosted deployments that need an openly licensed agentic model
Cost-sensitive batched sampling and high-throughput interactive agents

How to access

Provider	Model ID
MiniMax ↗	`MiniMax-M2`
OpenRouter ↗	`minimax/minimax-m2`

MiniMax M-Series — every version

The full lineage of the MiniMax M-Series line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
MiniMax M3current	2026-06-01	1M	MiniMax Community
MiniMax M2.7 / M2.7-highspeed	2026-03-18	—	Open weights
MiniMax M2.5 / M2.5-Lightning	2026-02-12	—	Open weights
MiniMax M2.1	2025-12-23	—	Open weights
MiniMax M2	2025-10-27	—	MIT

FAQ

What is MiniMax M2 and what is it for?

MiniMax M2 is an open-weight large language model released by MiniMax on 27 October 2025, built specifically for coding and agentic tool-use workflows. It is a Mixture-of-Experts model with 230B total parameters and 10B activated per token, and it is designed to plan and execute long chains of tool calls across a shell, browser, Python interpreter, and MCP tools.

Is MiniMax M2 open source and free to use?

The weights are published on Hugging Face under a modified MIT license, so you can download, self-host, and use the model commercially. MiniMax also offers a hosted API on its Open Platform; at launch the API was free for a limited time, and it is now priced at about $0.30 per 1M input tokens and $1.20 per 1M output tokens.

How does MiniMax M2 perform on coding benchmarks?

MiniMax reports 69.4 on SWE-bench Verified, 46.3 on Terminal-Bench, 66.8 on ArtifactsBench, and 77.2 on τ²-Bench for tool use, along with a composite Artificial Analysis Intelligence score of 61 — placing it among the top open-weight models for agentic coding.

What makes MiniMax M2 cheap and fast?

Although it has 230B total parameters, only 10B activate per token thanks to its Mixture-of-Experts design. That keeps inference latency, cost, and throughput close to a 10B dense model, letting MiniMax run it at roughly 8% of Claude Sonnet's price and about twice the speed, at around 100 tokens per second.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// MiniMax M-Series — every version

// FAQ