Overview
MiniMax-M1 is, per MiniMax, the world's first open-weight, large-scale hybrid-attention reasoning model. It is built on the MiniMax-Text-01 base and uses a Mixture-of-Experts design with 456 billion total parameters, of which 45.9 billion are activated per token across 32 experts. Its defining trait is a hybrid attention stack: a softmax-attention transformer block follows every seven 'lightning' (linear) attention blocks, which gives near-linear cost as the sequence grows. MiniMax reports that at a 100K-token generation length M1 uses roughly 25% of the FLOPs that DeepSeek-R1 would.
M1 ships in two variants that differ only in their reasoning (thinking) budget: MiniMax-M1-40k and MiniMax-M1-80k. Both natively support a 1-million-token context window — eight times that of DeepSeek-R1 and on par with closed models like Gemini 2.5 Pro. The 80k variant generally scores a little higher on reasoning and coding, while the 40k variant is cheaper to run and actually leads on some long-context and agent benchmarks. Both were trained with reinforcement learning using MiniMax's CISPO algorithm, which clips importance-sampling weights rather than token updates; the full RL run took 512 H800 GPUs about three weeks at a reported rental cost of $534,700.
The weights are released under the permissive Apache 2.0 license on Hugging Face and GitHub, and MiniMax recommends vLLM (0.9.2+) or Transformers for deployment. M1 is text-only. It is positioned for long-context understanding, software-engineering tasks, and agentic tool use, where MiniMax reports it tops other open-weight models and is competitive with leading proprietary systems.
| Released | 2025-06-16 |
|---|---|
| License | Apache 2.0 |
| Weights | Open weights |
| Parameters | 456B total / 45.9B active (MoE, 32 experts) |
| Context | 1M |
| Max output | 80k tokens (M1-80k); 40k tokens (M1-40k) |
| Architecture | Hybrid Mixture-of-Experts (32 experts) with lightning (linear) attention, built on MiniMax-Text-01: one softmax-attention transformer block follows every seven lightning-attention blocks. Trained with large-scale RL using the CISPO algorithm. |
| Knowledge cutoff | June 2024 |
| Modalities | Text |
| Status | Generally available |
Benchmarks
- AIME 202486%
- AIME 2024 (M1-40k)83.3%
- AIME 202576.9%
- LiveCodeBench65%
- SWE-bench Verified56%
- SWE-bench Verified (M1-40k)55.6%
- MMLU-Pro81.1%
- GPQA Diamond70%
- TAU-bench (airline, M1-80k)62%
- TAU-bench (retail, M1-40k)67.8%
- OpenAI-MRCR (1M, M1-40k)58.6%
- LongBench-v261.5%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.40 / 1M tokens (0-200k context); $1.30 / 1M tokens (200k-1M context) per 1M tokens |
|---|---|
| Output | $2.20 / 1M tokens per 1M tokens |
Tiered input pricing by context length from MiniMax's official announcement; OpenRouter lists $0.40 in / $2.20 out. Free unlimited use is offered on the MiniMax app and web.
Strengths
- Native 1M-token context — among the largest of any open-weight model, matching closed frontier models
- Efficient long-form reasoning: lightning attention cuts FLOPs to ~25% of DeepSeek-R1 at 100K-token generation
- Strong agentic tool use — leads open-weight models on TAU-bench and beats Gemini 2.5 Pro on parts of it
- Apache 2.0 license with fully open weights — free commercial use and self-hosting
- Two thinking budgets (40k/80k) let you trade reasoning depth against cost
- Competitive coding and math (AIME 2024 86.0%, SWE-bench Verified 56.0% on the 80k variant)
Best for
- Long-document and whole-codebase analysis that needs hundreds of thousands of tokens of context
- Agentic tool-use and function-calling workflows
- Software engineering: bug fixing and repo-level tasks (SWE-bench-style)
- Math and competition-style reasoning
- Self-hosted reasoning deployments where an open Apache-2.0 license is required
How to access
| Provider | Model ID |
|---|---|
| MiniMax ↗ | MiniMax-M1 |
| OpenRouter ↗ | minimax/minimax-m1 |
FAQ
What is the difference between MiniMax-M1-40k and MiniMax-M1-80k?
They are the same 456B-parameter model trained with two different reasoning (thinking) budgets: 40,000 tokens versus 80,000 tokens. The 80k variant generally scores slightly higher on reasoning and coding benchmarks, while the cheaper 40k variant leads on some long-context and agentic tool-use tasks. Both share the same 1M-token context window.
Is MiniMax-M1 open source and free to use?
The weights are released under the Apache 2.0 license on Hugging Face and GitHub, so you can download, self-host, and use them commercially for free. MiniMax also offers a hosted API (tiered pricing) and free unlimited use through its own app and website.
How large is the context window?
MiniMax-M1 natively supports a 1-million-token context window — about eight times DeepSeek-R1's and comparable to closed frontier models. Its lightning (linear) attention makes processing very long inputs far cheaper than standard softmax attention.
Does MiniMax-M1 support images or audio?
No. MiniMax-M1 is a text-only reasoning model. It is designed for long-context text understanding, coding, math, and agentic tool use rather than multimodal input.