Overview
Devstral 2 is Mistral AI's December 2025 coding-agent family, released on December 9, 2025 in two open-weights sizes: Devstral 2, a 123-billion-parameter dense model, and Devstral Small 2, a 24-billion-parameter model that runs on consumer hardware. Both are built for agentic software engineering — exploring codebases, editing across many files, and powering tool-using coding agents — and ship alongside the open-source Mistral Vibe CLI.
On SWE-Bench Verified, Devstral 2 scores 72.2% and Devstral Small 2 scores 68.0%, which Mistral positions among the strongest open-weights results, with Devstral Small 2 landing near models several times its size. Both models use a 256K-token context window and a dense Transformer architecture (the Hugging Face config reports model_type 'ministral3' with rope-scaling, with no expert-routing fields), so every token uses all parameters rather than a Mixture-of-Experts subset.
The two models differ on licensing. Devstral 2 (123B) ships under a Modified MIT license that withholds rights from companies whose global consolidated monthly revenue exceeds $20 million, while Devstral Small 2 (24B) is permissively licensed under Apache 2.0. Devstral Small 2 is fine-tuned from Mistral-Small-3.1-24B-Base-2503 and is distributed in FP8 with support for vLLM, SGLang, Transformers, Ollama, LM Studio, and llama.cpp.
| Released | 2025-12-09 |
|---|---|
| License | Devstral 2: Modified MIT; Devstral Small 2: Apache 2.0 |
| Weights | Open weights |
| Parameters | Devstral 2: 123B; Devstral Small 2: 24B |
| Context | 256K |
| Max output | 256K |
| Architecture | Dense decoder-only Transformer (model_type ministral3, with rope-scaling); not Mixture-of-Experts |
| Knowledge cutoff | 2025 |
| Modalities | Text |
| Status | Generally available |
Benchmarks



Devstral 2 and Devstral Small 2 vs named competitors on SWE-Bench Verified, SWE-Bench Multilingual, and Terminal-Bench 2 (competitor figures are publicly reported values).
| Benchmark | Devstral 2 | Devstral Small 2 | GLM 4.6 | Qwen 3 Coder Plus | MiniMax M2 | Kimi K2 Thinking | DeepSeek v3.2 | GPT 5.1 Codex High | GPT 5.1 Codex Max | Gemini 3 Pro | Claude Sonnet 4.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Size (B parameters) | 123 B params | 24 B params | 355 B params | 480 B params | 230 B params | 1000 B params | 671 B params | — | — | — | — |
| SWE-Bench Verified | 72.2% | 68% | 68% | 69.6% | 69.4% | 71.3% | 73.1% | 73.7% | 77.9% | 76.2% | 77.2% |
| SWE-Bench Multilingual | 61.3% | 55.7% | — | 54.7% | 56.5% | 61.1% | 70.2% | — | — | — | 68% |
| Terminal-Bench 2 | 32.6% | 22.5% | 24.6% | 25.4% | 30% | 35.7% | 46.4% | 52.8% | 60.4% | 54.2% | 42.8% |
This model's scores
- SWE-Bench Verified (Devstral 2, 123B)72.2%
- SWE-Bench Multilingual (Devstral 2, 123B)61.3%
- Terminal Bench 2 (Devstral 2, 123B)32.6%
- SWE-Bench Verified (Devstral Small 2, 24B)68%
- SWE-Bench Multilingual (Devstral Small 2, 24B)55.7%
- Terminal Bench 2 (Devstral Small 2, 24B)22.5%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.40 / 1M tokens (Devstral 2); $0.10 / 1M tokens (Devstral Small 2) per 1M tokens |
|---|---|
| Output | $2.00 / 1M tokens (Devstral 2); $0.30 / 1M tokens (Devstral Small 2) per 1M tokens |
Mistral API list prices; a free trial period was offered at launch.
Strengths
- Frontier open-weights coding scores: 72.2% (Devstral 2) and 68.0% (Devstral Small 2) on SWE-Bench Verified
- Devstral Small 2 (24B) runs locally on consumer hardware while staying near far larger models
- Large 256K-token context for whole-repository and multi-file agentic edits
- Open weights on Hugging Face — Apache 2.0 for Small 2, Modified MIT for the 123B model
- Low API pricing and a free trial period via the Mistral API, plus the open-source Mistral Vibe CLI
Best for
- Agentic software engineering: exploring codebases and editing multiple files
- Terminal-native coding agents via the Mistral Vibe CLI or OpenHands-style scaffolds
- Self-hosted, privacy-sensitive coding assistants (Devstral Small 2 on a single workstation)
- Cost-efficient code generation and refactoring through the Mistral API
How to access
| Provider | Model ID |
|---|---|
| Mistral AI ↗ | devstral-2-25-12 |
| Mistral AI ↗ | devstral-small-2-25-12 |
| OpenRouter ↗ | mistralai/devstral-2512 |
Devstral — every version
The full lineage of the Devstral line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Devstral 2 / Devstral Small 2current | 2025-12-09 | 256K | Devstral 2: Modified MIT; Devstral Small 2: Apache 2.0 |
| Devstral Medium & Small 1.1 (25.07) | 2025-07-10 | 128K | Devstral Small 1.1: Apache 2.0; Devstral Medium: proprietary (API only) |
| Devstral Small (25.05) | 2025-05-21 | 128K | Apache-2.0 |
FAQ
What is the difference between Devstral 2 and Devstral Small 2?
Devstral 2 is the 123-billion-parameter flagship that scores 72.2% on SWE-Bench Verified and ships under a Modified MIT license. Devstral Small 2 is a 24-billion-parameter model scoring 68.0% on SWE-Bench Verified, runs on consumer hardware, and is released under the permissive Apache 2.0 license. Both share a 256K context window and a dense Transformer architecture.
Is Devstral 2 open weights, and what does the Modified MIT license allow?
Yes, both models have open weights on Hugging Face. Devstral Small 2 is Apache 2.0. The 123B Devstral 2 uses a Modified MIT license that withholds rights from any company whose global consolidated monthly revenue exceeds $20 million for the preceding month, who must obtain a separate commercial license from Mistral.
How much does Devstral 2 cost on the Mistral API?
Per Mistral's pricing page, Devstral 2 is $0.40 per million input tokens and $2.00 per million output tokens, while Devstral Small 2 is $0.10 per million input and $0.30 per million output. Mistral offered a free trial period at launch.
Can Devstral Small 2 run locally?
Yes. Devstral Small 2 has 24B parameters, ships in FP8, and is supported by vLLM, SGLang, Transformers, Ollama, LM Studio, and llama.cpp, making it deployable on a single high-memory workstation for private, agentic coding.