AI/TLDR

MiniMax-M1 (M1-40k / M1-80k)

The first open-weight, large-scale hybrid-attention reasoning model — 456B-param MoE with a native 1M-token context.

Overview

MiniMax-M1 is, per MiniMax, the world's first open-weight, large-scale hybrid-attention reasoning model. It is built on the MiniMax-Text-01 base and uses a Mixture-of-Experts design with 456 billion total parameters, of which 45.9 billion are activated per token across 32 experts. Its defining trait is a hybrid attention stack: a softmax-attention transformer block follows every seven 'lightning' (linear) attention blocks, which gives near-linear cost as the sequence grows. MiniMax reports that at a 100K-token generation length M1 uses roughly 25% of the FLOPs that DeepSeek-R1 would.

M1 ships in two variants that differ only in their reasoning (thinking) budget: MiniMax-M1-40k and MiniMax-M1-80k. Both natively support a 1-million-token context window — eight times that of DeepSeek-R1 and on par with closed models like Gemini 2.5 Pro. The 80k variant generally scores a little higher on reasoning and coding, while the 40k variant is cheaper to run and actually leads on some long-context and agent benchmarks. Both were trained with reinforcement learning using MiniMax's CISPO algorithm, which clips importance-sampling weights rather than token updates; the full RL run took 512 H800 GPUs about three weeks at a reported rental cost of $534,700.

The weights are released under the permissive Apache 2.0 license on Hugging Face and GitHub, and MiniMax recommends vLLM (0.9.2+) or Transformers for deployment. M1 is text-only. It is positioned for long-context understanding, software-engineering tasks, and agentic tool use, where MiniMax reports it tops other open-weight models and is competitive with leading proprietary systems.

Released2025-06-16
LicenseApache 2.0
WeightsOpen weights
Parameters456B total / 45.9B active (MoE, 32 experts)
Context1M
Max output80k tokens (M1-80k); 40k tokens (M1-40k)
ArchitectureHybrid Mixture-of-Experts (32 experts) with lightning (linear) attention, built on MiniMax-Text-01: one softmax-attention transformer block follows every seven lightning-attention blocks. Trained with large-scale RL using the CISPO algorithm.
Knowledge cutoffJune 2024
ModalitiesText
StatusGenerally available

Benchmarks

  1. AIME 202486%
  2. AIME 2024 (M1-40k)83.3%
  3. AIME 202576.9%
  4. LiveCodeBench65%
  5. SWE-bench Verified56%
  6. SWE-bench Verified (M1-40k)55.6%
  7. MMLU-Pro81.1%
  8. GPQA Diamond70%
  9. TAU-bench (airline, M1-80k)62%
  10. TAU-bench (retail, M1-40k)67.8%
  11. OpenAI-MRCR (1M, M1-40k)58.6%
  12. LongBench-v261.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.40 / 1M tokens (0-200k context); $1.30 / 1M tokens (200k-1M context) per 1M tokens
Output$2.20 / 1M tokens per 1M tokens

Tiered input pricing by context length from MiniMax's official announcement; OpenRouter lists $0.40 in / $2.20 out. Free unlimited use is offered on the MiniMax app and web.

Pricing source ↗

Strengths

  • Native 1M-token context — among the largest of any open-weight model, matching closed frontier models
  • Efficient long-form reasoning: lightning attention cuts FLOPs to ~25% of DeepSeek-R1 at 100K-token generation
  • Strong agentic tool use — leads open-weight models on TAU-bench and beats Gemini 2.5 Pro on parts of it
  • Apache 2.0 license with fully open weights — free commercial use and self-hosting
  • Two thinking budgets (40k/80k) let you trade reasoning depth against cost
  • Competitive coding and math (AIME 2024 86.0%, SWE-bench Verified 56.0% on the 80k variant)

Best for

  • Long-document and whole-codebase analysis that needs hundreds of thousands of tokens of context
  • Agentic tool-use and function-calling workflows
  • Software engineering: bug fixing and repo-level tasks (SWE-bench-style)
  • Math and competition-style reasoning
  • Self-hosted reasoning deployments where an open Apache-2.0 license is required

How to access

ProviderModel ID
MiniMax ↗MiniMax-M1
OpenRouter ↗minimax/minimax-m1

FAQ

What is the difference between MiniMax-M1-40k and MiniMax-M1-80k?

They are the same 456B-parameter model trained with two different reasoning (thinking) budgets: 40,000 tokens versus 80,000 tokens. The 80k variant generally scores slightly higher on reasoning and coding benchmarks, while the cheaper 40k variant leads on some long-context and agentic tool-use tasks. Both share the same 1M-token context window.

Is MiniMax-M1 open source and free to use?

The weights are released under the Apache 2.0 license on Hugging Face and GitHub, so you can download, self-host, and use them commercially for free. MiniMax also offers a hosted API (tiered pricing) and free unlimited use through its own app and website.

How large is the context window?

MiniMax-M1 natively supports a 1-million-token context window — about eight times DeepSeek-R1's and comparable to closed frontier models. Its lightning (linear) attention makes processing very long inputs far cheaper than standard softmax attention.

Does MiniMax-M1 support images or audio?

No. MiniMax-M1 is a text-only reasoning model. It is designed for long-context text understanding, coding, math, and agentic tool use rather than multimodal input.