AI/TLDR

Devstral 2 / Devstral Small 2

Mistral's frontier open-weights coding-agent family — a 123B dense Devstral 2 and a laptop-friendly 24B Devstral Small 2.

Overview

Devstral 2 is Mistral AI's December 2025 coding-agent family, released on December 9, 2025 in two open-weights sizes: Devstral 2, a 123-billion-parameter dense model, and Devstral Small 2, a 24-billion-parameter model that runs on consumer hardware. Both are built for agentic software engineering — exploring codebases, editing across many files, and powering tool-using coding agents — and ship alongside the open-source Mistral Vibe CLI.

On SWE-Bench Verified, Devstral 2 scores 72.2% and Devstral Small 2 scores 68.0%, which Mistral positions among the strongest open-weights results, with Devstral Small 2 landing near models several times its size. Both models use a 256K-token context window and a dense Transformer architecture (the Hugging Face config reports model_type 'ministral3' with rope-scaling, with no expert-routing fields), so every token uses all parameters rather than a Mixture-of-Experts subset.

The two models differ on licensing. Devstral 2 (123B) ships under a Modified MIT license that withholds rights from companies whose global consolidated monthly revenue exceeds $20 million, while Devstral Small 2 (24B) is permissively licensed under Apache 2.0. Devstral Small 2 is fine-tuned from Mistral-Small-3.1-24B-Base-2503 and is distributed in FP8 with support for vLLM, SGLang, Transformers, Ollama, LM Studio, and llama.cpp.

Released2025-12-09
LicenseDevstral 2: Modified MIT; Devstral Small 2: Apache 2.0
WeightsOpen weights
ParametersDevstral 2: 123B; Devstral Small 2: 24B
Context256K
Max output256K
ArchitectureDense decoder-only Transformer (model_type ministral3, with rope-scaling); not Mixture-of-Experts
Knowledge cutoff2025
ModalitiesText
StatusGenerally available

Benchmarks

Bar chart titled 'SWE-Bench Verified: Open-weight vs Proprietary models' comparing Devstral Small 2 (68.0) and Devstral 2 (72.2) against DeepSWE (42.2), CWM (53.9), GPT-OSS-120B (62.4), GLM 4.6 (68.0), Minimax M2 (69.4), Qwen 3 coder plus (69.6), Kimi K2 thinking (71.3), Deepseek V3.2 (73.1), Grok Code Fast 1 (70.8), Gemini 3 Pro (76.2), GPT 5.1 Codex Max (77.9), and Claude 4.5 Sonnet (77.2) on SWE-Bench Verified.
SWE-Bench Verified scores: Devstral 2 / Devstral Small 2 vs open-weight and proprietary models. — Mistral AI
Scatter plot of SWE-Bench Verified Regular Performance (%) versus Model Size (B parameters), showing Devstral 2 and Devstral Small 2 in the Pareto-efficient region compared with MiniMax M2, GLM 4.6, Qwen3 coder plus, Qwen 3 coder flash, CWM, DeepSeek v3.2, and Kimi K2 thinking.
SWE-Bench Verified performance plotted against model size (efficiency frontier). — Mistral AI
Stacked Win/Tie/Lose bar chart of human evaluations: Devstral 2 vs DeepSeek V3.2 (42.8% win, 28.6% tie, 28.6% lose) and Devstral 2 vs Sonnet 4.5 (21.4% win, 25.5% tie, 53.1% lose). Evaluations judged by humans conducted by a third party (Surge).
Human-evaluation win/tie/lose rates for Devstral 2 vs DeepSeek V3.2 and Claude Sonnet 4.5. — Mistral AI

Devstral 2 and Devstral Small 2 vs named competitors on SWE-Bench Verified, SWE-Bench Multilingual, and Terminal-Bench 2 (competitor figures are publicly reported values).

BenchmarkDevstral 2Devstral Small 2GLM 4.6Qwen 3 Coder PlusMiniMax M2Kimi K2 ThinkingDeepSeek v3.2GPT 5.1 Codex HighGPT 5.1 Codex MaxGemini 3 ProClaude Sonnet 4.5
Size (B parameters)123 B params24 B params355 B params480 B params230 B params1000 B params671 B params
SWE-Bench Verified72.2%68%68%69.6%69.4%71.3%73.1%73.7%77.9%76.2%77.2%
SWE-Bench Multilingual61.3%55.7%54.7%56.5%61.1%70.2%68%
Terminal-Bench 232.6%22.5%24.6%25.4%30%35.7%46.4%52.8%60.4%54.2%42.8%

Comparison source ↗

This model's scores

  1. SWE-Bench Verified (Devstral 2, 123B)72.2%
  2. SWE-Bench Multilingual (Devstral 2, 123B)61.3%
  3. Terminal Bench 2 (Devstral 2, 123B)32.6%
  4. SWE-Bench Verified (Devstral Small 2, 24B)68%
  5. SWE-Bench Multilingual (Devstral Small 2, 24B)55.7%
  6. Terminal Bench 2 (Devstral Small 2, 24B)22.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.40 / 1M tokens (Devstral 2); $0.10 / 1M tokens (Devstral Small 2) per 1M tokens
Output$2.00 / 1M tokens (Devstral 2); $0.30 / 1M tokens (Devstral Small 2) per 1M tokens

Mistral API list prices; a free trial period was offered at launch.

Pricing source ↗

Strengths

  • Frontier open-weights coding scores: 72.2% (Devstral 2) and 68.0% (Devstral Small 2) on SWE-Bench Verified
  • Devstral Small 2 (24B) runs locally on consumer hardware while staying near far larger models
  • Large 256K-token context for whole-repository and multi-file agentic edits
  • Open weights on Hugging Face — Apache 2.0 for Small 2, Modified MIT for the 123B model
  • Low API pricing and a free trial period via the Mistral API, plus the open-source Mistral Vibe CLI

Best for

  • Agentic software engineering: exploring codebases and editing multiple files
  • Terminal-native coding agents via the Mistral Vibe CLI or OpenHands-style scaffolds
  • Self-hosted, privacy-sensitive coding assistants (Devstral Small 2 on a single workstation)
  • Cost-efficient code generation and refactoring through the Mistral API

How to access

ProviderModel ID
Mistral AI ↗devstral-2-25-12
Mistral AI ↗devstral-small-2-25-12
OpenRouter ↗mistralai/devstral-2512

Devstral — every version

The full lineage of the Devstral line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Devstral 2 / Devstral Small 2current2025-12-09256KDevstral 2: Modified MIT; Devstral Small 2: Apache 2.0
Devstral Medium & Small 1.1 (25.07)2025-07-10128KDevstral Small 1.1: Apache 2.0; Devstral Medium: proprietary (API only)
Devstral Small (25.05)2025-05-21128KApache-2.0

FAQ

What is the difference between Devstral 2 and Devstral Small 2?

Devstral 2 is the 123-billion-parameter flagship that scores 72.2% on SWE-Bench Verified and ships under a Modified MIT license. Devstral Small 2 is a 24-billion-parameter model scoring 68.0% on SWE-Bench Verified, runs on consumer hardware, and is released under the permissive Apache 2.0 license. Both share a 256K context window and a dense Transformer architecture.

Is Devstral 2 open weights, and what does the Modified MIT license allow?

Yes, both models have open weights on Hugging Face. Devstral Small 2 is Apache 2.0. The 123B Devstral 2 uses a Modified MIT license that withholds rights from any company whose global consolidated monthly revenue exceeds $20 million for the preceding month, who must obtain a separate commercial license from Mistral.

How much does Devstral 2 cost on the Mistral API?

Per Mistral's pricing page, Devstral 2 is $0.40 per million input tokens and $2.00 per million output tokens, while Devstral Small 2 is $0.10 per million input and $0.30 per million output. Mistral offered a free trial period at launch.

Can Devstral Small 2 run locally?

Yes. Devstral Small 2 has 24B parameters, ships in FP8, and is supported by vLLM, SGLang, Transformers, Ollama, LM Studio, and llama.cpp, making it deployable on a single high-memory workstation for private, agentic coding.