Overview
MiniMax M2.7 is an open-weight large language model from MiniMax, released on March 18, 2026, as the next step in the company's M-Series line. It uses a Mixture-of-Experts design with roughly 230 billion total parameters but only about 10 billion active per token, which keeps inference cheap while still reaching the top tier on coding and agentic tasks. MiniMax frames M2.7 as its first model to actively participate in its own evolution, building and refining agent harnesses rather than just answering prompts.
The model is built for production agentic work: live debugging, root-cause analysis, multi-step tool use, and end-to-end document generation across formats. It has a 205K-token context window and can produce up to 131K tokens in a single response, which suits long codebases and multi-file refactors. M2.7 is text-only — it does not accept image, audio, or video input (MiniMax's separate M3 line covers multimodal use cases).
M2.7 ships in two variants that return the same results: the standard MiniMax-M2.7 and the M2.7-highspeed build, which is tuned for latency-sensitive workloads and runs at around 100 tokens per second for a higher token price. Weights for the base model are published on Hugging Face under a non-commercial license (commercial use requires written authorization from MiniMax), so teams can self-host or use the hosted API on MiniMax's platform and third-party gateways.
| Released | 2026-03-18 |
|---|---|
| License | MiniMax Non-Commercial License (open weights; commercial use needs separate authorization) |
| Weights | Open weights |
| Parameters | 230B total / 10B active (MoE) |
| Context | 205K |
| Max output | 131K |
| Architecture | Mixture-of-Experts (MoE) transformer with roughly 230B total parameters and about 10B active per token. The M2.7 series ships in two variants that return identical outputs: the standard MiniMax-M2.7 and the latency-optimized M2.7-highspeed, which trades a higher price for output speeds near 100 tokens/second. |
| Knowledge cutoff | Not publicly disclosed |
| Modalities | Text |
| Status | Available |
Benchmarks
- SWE-Pro56.22%
- VIBE-Pro55.6%
- Terminal Bench 257%
- Toolathon46.3%
- SWE Multilingual76.5%
- Multi-SWE-Bench52.7%
- NL2Repo39.8%
- MM Claw62.7%
- Artificial Analysis Intelligence Index38index
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.30 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.06 / 1M tokens (cache read; cache write $0.375 / 1M) per 1M tokens |
| Output | $1.20 / 1M tokens per 1M tokens |
Official MiniMax pay-as-you-go pricing for standard MiniMax-M2.7. The M2.7-highspeed variant costs $0.60 / 1M input and $2.40 / 1M output, with the same cache rates.
Strengths
- Open weights published on Hugging Face, so the base model can be self-hosted or quantized for local agents
- Sparse MoE design (230B total, ~10B active) delivers strong coding/agent scores at very low per-token cost
- Frontier-level results on agentic coding benchmarks like SWE-Pro (56.22%) and Terminal Bench 2 (57.0%)
- Large 205K-token context with up to 131K-token output for long codebases and full document generation
- Two interchangeable variants — standard for cost, highspeed (~100 tok/s) for latency-sensitive apps
- Aggressive prompt-cache pricing ($0.06 / 1M cached read) makes repeated agent loops cheap
Best for
- Autonomous software-engineering agents: debugging, refactoring, and multi-file code changes
- Building and running agent harnesses with multi-step tool calling
- Generating long-form business documents (Word/Excel/PowerPoint-style output)
- Self-hosted, cost-sensitive coding assistants where open weights matter
- Long-context tasks like reasoning over large repositories or document sets
- Latency-sensitive interactive agents using the M2.7-highspeed variant
How to access
| Provider | Model ID |
|---|---|
| MiniMax ↗ | MiniMax-M2.7 |
| OpenRouter ↗ | minimax/minimax-m2.7 |
| NVIDIA NIM ↗ | minimaxai/minimax-m2.7 |
MiniMax M-Series — every version
The full lineage of the MiniMax M-Series line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| MiniMax M3current | 2026-06-01 | 1M | MiniMax Community |
| MiniMax M2.7 / M2.7-highspeed | 2026-03-18 | — | Open weights |
| MiniMax M2.5 / M2.5-Lightning | 2026-02-12 | — | Open weights |
| MiniMax M2.1 | 2025-12-23 | — | Open weights |
| MiniMax M2 | 2025-10-27 | — | MIT |
FAQ
Is MiniMax M2.7 open source?
The weights are open and published on Hugging Face, but the license is non-commercial: you can download and run M2.7 for research and personal use, while commercial use requires separate written authorization from MiniMax. That makes it open-weight rather than a permissive open-source release.
What is the difference between MiniMax M2.7 and M2.7-highspeed?
Both variants return identical outputs and the same intelligence level. M2.7-highspeed is tuned for latency-sensitive applications, running at roughly 100 tokens per second, but it costs more — $0.60 / 1M input and $2.40 / 1M output versus $0.30 / $1.20 for the standard model.
Does MiniMax M2.7 support images or other media?
No. M2.7 is a text-only model — it accepts and produces text but cannot process image, audio, or video input. MiniMax's separate M3 line is the one positioned for multimodal use cases.
How much does MiniMax M2.7 cost to use?
Official MiniMax pay-as-you-go pricing for standard M2.7 is $0.30 per million input tokens and $1.20 per million output tokens, with cached input reads at just $0.06 per million tokens. The model has a 205K-token context window and can output up to 131K tokens per response.