Mistral Small 4

Name: Mistral Small 4
Author: Mistral AI

Mistral's open-weight 119B MoE that fuses chat, reasoning, and coding into one model.

Overview

Mistral Small 4 is Mistral AI's March 2026 update to its Small line and the current flagship of that series. Released on 16 March 2026 under the permissive Apache 2.0 license, it is the first Mistral model to fold the company's separate specialist families into one set of weights: instruct-style chat, the reasoning behaviour previously shipped as Magistral, the vision understanding of Pixtral, and the agentic coding of Devstral. Instead of switching models, you switch a single reasoning_effort flag.

Under the hood, Mistral Small 4 is a Mixture-of-Experts model with 128 experts and 4 active per token, totalling 119B parameters but activating only about 6.5B per token. It handles text and image input, returns text, and supports a 256k-token context window. A per-request reasoning_effort parameter lets developers trade latency for depth: "none" returns fast answers comparable to Mistral Small 3.2, while "high" produces verbose step-by-step reasoning.

The open weights ship on Hugging Face as mistralai/Mistral-Small-4-119B-2603 and run on vLLM, llama.cpp, SGLang, Transformers, and NVIDIA NIM. The hosted version is served through Mistral's La Plateforme API as model ID mistral-small-2603 at $0.15 per million input tokens and $0.60 per million output tokens, and you can try it interactively in Le Chat.

Released	2026-03-16
License	Apache 2.0
Weights	Open weights
Parameters	119B total / 6.5B active (MoE)
Context	256K
Architecture	Mixture-of-Experts (MoE) with 128 experts and 4 active per token; 119B total parameters, ~6.5B active per token. Accepts text and image input and returns text. Exposes a per-request reasoning_effort control ("none" for fast Small-3.2-style responses, "high" for step-by-step reasoning).
Knowledge cutoff	Not disclosed
Modalities	Text, Vision
Status	Available

Benchmarks

Grouped bar chart titled 'Performance comparison across internal models' comparing Mistral Small 4 (Instruct + Reasoning) against Mistral Small 3.2, Mistral Medium 3.1 and Mistral Large 3 on text benchmarks (GPQA Diamond, MMLU Pro, AllenAI IFBench, Arena Hard) and a vision benchmark (MMMU-Pro). — Mistral Small 4 vs other Mistral models on text and vision benchmarks. — Mistral AI

Grouped bar chart titled 'Performance comparison across internal models' comparing Mistral Small 4 - High against Magistral Medium 1.2 and Magistral Small 1.2 on LCR, AIME25, Collie and LiveCodeBench. — Mistral Small 4 - High vs Magistral 1.2 models on reasoning and coding benchmarks. — Mistral AI

Performance comparison across internal models: Mistral Small 4 - High vs Magistral 1.2 models.

Benchmark	Mistral Small 4 - High	Magistral Medium 1.2	Magistral Small 1.2
LCR	71.2 score	73 score	27 score
AIME25	83.8 score	84.4 score	80.2 score
Collie	62.9 score	61.3 score	60.3 score
LiveCodeBench	63.6 score	66.1 score	60.7 score

Comparison source ↗

This model's scores

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.15 / 1M tokens per 1M tokens
Output	$0.60 / 1M tokens per 1M tokens

Hosted on Mistral's La Plateforme as model ID mistral-small-2603. Open weights are free to self-host under Apache 2.0.

Pricing source ↗

Strengths

Apache 2.0 open weights — free to self-host, fine-tune, and use commercially
One model for chat, reasoning, vision, and agentic coding instead of separate specialist checkpoints
Toggleable reasoning_effort (none/high) trades latency for depth on a per-request basis
Sparse 119B MoE activates only ~6.5B parameters per token, keeping inference efficient
Long 256k-token context window for big documents and codebases
Concise outputs — competitive scores while emitting far fewer tokens than rivals (e.g. 0.72 AA LCR at ~1.6K characters)

Best for

Self-hosted assistants and agents where an open, commercially-usable license matters
Agentic coding workflows that need tool calling, structured output, and concise code
Multimodal document and image understanding over long contexts
Cost-sensitive reasoning tasks where you toggle deep thinking only when needed
Fine-tuning a single base for chat, reasoning, and coding without maintaining multiple models

How to access

Provider	Model ID
Mistral AI (La Plateforme) ↗	`mistral-small-2603`
OpenRouter ↗	`mistralai/mistral-small-2603`

Mistral Small — every version

The full lineage of the Mistral Small line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Mistral Small 4current	2026-03-16	—	Apache-2.0
Mistral Small 3.2	2025-06-20	—	Apache-2.0
Mistral Small 3.1	2025-03-17	—	Open weights
Mistral Small 3	2025-01-30	—	Apache-2.0
Mistral Small (24.09)	2024-09-17	—	Open weights

FAQ

Is Mistral Small 4 open source?

The weights are released under the Apache 2.0 license, so you can download, self-host, fine-tune, and use Mistral Small 4 commercially for free. They ship on Hugging Face as mistralai/Mistral-Small-4-119B-2603.

How big is Mistral Small 4?

It is a Mixture-of-Experts model with 119B total parameters and 128 experts, but only 4 experts (about 6.5B parameters) are active per token, which keeps inference efficient relative to its total size.

What is the reasoning_effort parameter?

Mistral Small 4 takes a per-request reasoning_effort setting. "none" returns fast answers comparable to Mistral Small 3.2, while "high" produces step-by-step reasoning. This lets one model cover both quick chat and deeper problem-solving.

How much does Mistral Small 4 cost via the API?

On Mistral's hosted API (model ID mistral-small-2603) it is priced at $0.15 per million input tokens and $0.60 per million output tokens. Self-hosting the open weights is free under Apache 2.0.

// Overview

// Benchmarks

This model's scores

// Pricing

// Strengths

// Best for

// How to access

// Mistral Small — every version

// FAQ