AI/TLDR

Mistral Small 4

Mistral's open-weight 119B MoE that fuses chat, reasoning, and coding into one model.

Overview

Mistral Small 4 is Mistral AI's March 2026 update to its Small line and the current flagship of that series. Released on 16 March 2026 under the permissive Apache 2.0 license, it is the first Mistral model to fold the company's separate specialist families into one set of weights: instruct-style chat, the reasoning behaviour previously shipped as Magistral, the vision understanding of Pixtral, and the agentic coding of Devstral. Instead of switching models, you switch a single reasoning_effort flag.

Under the hood, Mistral Small 4 is a Mixture-of-Experts model with 128 experts and 4 active per token, totalling 119B parameters but activating only about 6.5B per token. It handles text and image input, returns text, and supports a 256k-token context window. A per-request reasoning_effort parameter lets developers trade latency for depth: "none" returns fast answers comparable to Mistral Small 3.2, while "high" produces verbose step-by-step reasoning.

The open weights ship on Hugging Face as mistralai/Mistral-Small-4-119B-2603 and run on vLLM, llama.cpp, SGLang, Transformers, and NVIDIA NIM. The hosted version is served through Mistral's La Plateforme API as model ID mistral-small-2603 at $0.15 per million input tokens and $0.60 per million output tokens, and you can try it interactively in Le Chat.

Released2026-03-16
LicenseApache 2.0
WeightsOpen weights
Parameters119B total / 6.5B active (MoE)
Context256K
ArchitectureMixture-of-Experts (MoE) with 128 experts and 4 active per token; 119B total parameters, ~6.5B active per token. Accepts text and image input and returns text. Exposes a per-request reasoning_effort control ("none" for fast Small-3.2-style responses, "high" for step-by-step reasoning).
Knowledge cutoffNot disclosed
ModalitiesText, Vision
StatusAvailable

Benchmarks

Grouped bar chart titled 'Performance comparison across internal models' comparing Mistral Small 4 (Instruct + Reasoning) against Mistral Small 3.2, Mistral Medium 3.1 and Mistral Large 3 on text benchmarks (GPQA Diamond, MMLU Pro, AllenAI IFBench, Arena Hard) and a vision benchmark (MMMU-Pro).
Mistral Small 4 vs other Mistral models on text and vision benchmarks. — Mistral AI
Grouped bar chart titled 'Performance comparison across internal models' comparing Mistral Small 4 - High against Magistral Medium 1.2 and Magistral Small 1.2 on LCR, AIME25, Collie and LiveCodeBench.
Mistral Small 4 - High vs Magistral 1.2 models on reasoning and coding benchmarks. — Mistral AI

Performance comparison across internal models: Mistral Small 4 - High vs Magistral 1.2 models.

BenchmarkMistral Small 4 - HighMagistral Medium 1.2Magistral Small 1.2
LCR71.2 score73 score27 score
AIME2583.8 score84.4 score80.2 score
Collie62.9 score61.3 score60.3 score
LiveCodeBench63.6 score66.1 score60.7 score

Comparison source ↗

This model's scores

  1. GPQA Diamond71.2%
  2. AA LCR (long-context reasoning)0.72%
  3. AIME 2025 (reasoning mode)93%
  4. LiveCodeBench64%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.15 / 1M tokens per 1M tokens
Output$0.60 / 1M tokens per 1M tokens

Hosted on Mistral's La Plateforme as model ID mistral-small-2603. Open weights are free to self-host under Apache 2.0.

Pricing source ↗

Strengths

  • Apache 2.0 open weights — free to self-host, fine-tune, and use commercially
  • One model for chat, reasoning, vision, and agentic coding instead of separate specialist checkpoints
  • Toggleable reasoning_effort (none/high) trades latency for depth on a per-request basis
  • Sparse 119B MoE activates only ~6.5B parameters per token, keeping inference efficient
  • Long 256k-token context window for big documents and codebases
  • Concise outputs — competitive scores while emitting far fewer tokens than rivals (e.g. 0.72 AA LCR at ~1.6K characters)

Best for

  • Self-hosted assistants and agents where an open, commercially-usable license matters
  • Agentic coding workflows that need tool calling, structured output, and concise code
  • Multimodal document and image understanding over long contexts
  • Cost-sensitive reasoning tasks where you toggle deep thinking only when needed
  • Fine-tuning a single base for chat, reasoning, and coding without maintaining multiple models

How to access

ProviderModel ID
Mistral AI (La Plateforme) ↗mistral-small-2603
OpenRouter ↗mistralai/mistral-small-2603

Mistral Small — every version

The full lineage of the Mistral Small line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Mistral Small 4current2026-03-16Apache-2.0
Mistral Small 3.22025-06-20Apache-2.0
Mistral Small 3.12025-03-17Open weights
Mistral Small 32025-01-30Apache-2.0
Mistral Small (24.09)2024-09-17Open weights

FAQ

Is Mistral Small 4 open source?

The weights are released under the Apache 2.0 license, so you can download, self-host, fine-tune, and use Mistral Small 4 commercially for free. They ship on Hugging Face as mistralai/Mistral-Small-4-119B-2603.

How big is Mistral Small 4?

It is a Mixture-of-Experts model with 119B total parameters and 128 experts, but only 4 experts (about 6.5B parameters) are active per token, which keeps inference efficient relative to its total size.

What is the reasoning_effort parameter?

Mistral Small 4 takes a per-request reasoning_effort setting. "none" returns fast answers comparable to Mistral Small 3.2, while "high" produces step-by-step reasoning. This lets one model cover both quick chat and deeper problem-solving.

How much does Mistral Small 4 cost via the API?

On Mistral's hosted API (model ID mistral-small-2603) it is priced at $0.15 per million input tokens and $0.60 per million output tokens. Self-hosting the open weights is free under Apache 2.0.