MiniMax M2.7

Name: MiniMax M2.7
Author: MiniMax

A self-evolving open-weight agent model with Opus-level coding at a fraction of the cost.

Overview

MiniMax M2.7 is an open-weight large language model from MiniMax, released on March 18, 2026, as the next step in the company's M-Series line. It uses a Mixture-of-Experts design with roughly 230 billion total parameters but only about 10 billion active per token, which keeps inference cheap while still reaching the top tier on coding and agentic tasks. MiniMax frames M2.7 as its first model to actively participate in its own evolution, building and refining agent harnesses rather than just answering prompts.

The model is built for production agentic work: live debugging, root-cause analysis, multi-step tool use, and end-to-end document generation across formats. It has a 205K-token context window and can produce up to 131K tokens in a single response, which suits long codebases and multi-file refactors. M2.7 is text-only — it does not accept image, audio, or video input (MiniMax's separate M3 line covers multimodal use cases).

M2.7 ships in two variants that return the same results: the standard MiniMax-M2.7 and the M2.7-highspeed build, which is tuned for latency-sensitive workloads and runs at around 100 tokens per second for a higher token price. Weights for the base model are published on Hugging Face under a non-commercial license (commercial use requires written authorization from MiniMax), so teams can self-host or use the hosted API on MiniMax's platform and third-party gateways.

Released	2026-03-18
License	MiniMax Non-Commercial License (open weights; commercial use needs separate authorization)
Weights	Open weights
Parameters	230B total / 10B active (MoE)
Context	205K
Max output	131K
Architecture	Mixture-of-Experts (MoE) transformer with roughly 230B total parameters and about 10B active per token. The M2.7 series ships in two variants that return identical outputs: the standard MiniMax-M2.7 and the latency-optimized M2.7-highspeed, which trades a higher price for output speeds near 100 tokens/second.
Knowledge cutoff	Not publicly disclosed
Modalities	Text
Status	Available

Benchmarks

SWE-Pro56.22%
VIBE-Pro55.6%
Terminal Bench 257%
Toolathon46.3%
SWE Multilingual76.5%
Multi-SWE-Bench52.7%
NL2Repo39.8%
MM Claw62.7%
Artificial Analysis Intelligence Index38index

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.30 / 1M tokens per 1M tokens
Cached input	$0.06 / 1M tokens (cache read; cache write $0.375 / 1M) per 1M tokens
Output	$1.20 / 1M tokens per 1M tokens

Official MiniMax pay-as-you-go pricing for standard MiniMax-M2.7. The M2.7-highspeed variant costs $0.60 / 1M input and $2.40 / 1M output, with the same cache rates.

Pricing source ↗

Strengths

Open weights published on Hugging Face, so the base model can be self-hosted or quantized for local agents
Sparse MoE design (230B total, ~10B active) delivers strong coding/agent scores at very low per-token cost
Frontier-level results on agentic coding benchmarks like SWE-Pro (56.22%) and Terminal Bench 2 (57.0%)
Large 205K-token context with up to 131K-token output for long codebases and full document generation
Two interchangeable variants — standard for cost, highspeed (~100 tok/s) for latency-sensitive apps
Aggressive prompt-cache pricing ($0.06 / 1M cached read) makes repeated agent loops cheap

Best for

Autonomous software-engineering agents: debugging, refactoring, and multi-file code changes
Building and running agent harnesses with multi-step tool calling
Generating long-form business documents (Word/Excel/PowerPoint-style output)
Self-hosted, cost-sensitive coding assistants where open weights matter
Long-context tasks like reasoning over large repositories or document sets
Latency-sensitive interactive agents using the M2.7-highspeed variant

How to access

Provider	Model ID
MiniMax ↗	`MiniMax-M2.7`
OpenRouter ↗	`minimax/minimax-m2.7`
NVIDIA NIM ↗	`minimaxai/minimax-m2.7`

MiniMax M-Series — every version

The full lineage of the MiniMax M-Series line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
MiniMax M3current	2026-06-01	1M	MiniMax Community
MiniMax M2.7 / M2.7-highspeed	2026-03-18	—	Open weights
MiniMax M2.5 / M2.5-Lightning	2026-02-12	—	Open weights
MiniMax M2.1	2025-12-23	—	Open weights
MiniMax M2	2025-10-27	—	MIT

FAQ

Is MiniMax M2.7 open source?

The weights are open and published on Hugging Face, but the license is non-commercial: you can download and run M2.7 for research and personal use, while commercial use requires separate written authorization from MiniMax. That makes it open-weight rather than a permissive open-source release.

What is the difference between MiniMax M2.7 and M2.7-highspeed?

Both variants return identical outputs and the same intelligence level. M2.7-highspeed is tuned for latency-sensitive applications, running at roughly 100 tokens per second, but it costs more — $0.60 / 1M input and $2.40 / 1M output versus $0.30 / $1.20 for the standard model.

Does MiniMax M2.7 support images or other media?

No. M2.7 is a text-only model — it accepts and produces text but cannot process image, audio, or video input. MiniMax's separate M3 line is the one positioned for multimodal use cases.

How much does MiniMax M2.7 cost to use?

Official MiniMax pay-as-you-go pricing for standard M2.7 is $0.30 per million input tokens and $1.20 per million output tokens, with cached input reads at just $0.06 per million tokens. The model has a 205K-token context window and can output up to 131K tokens per response.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// MiniMax M-Series — every version

// FAQ