Mistral Small 3

Name: Mistral Small 3
Author: Mistral AI

Latency-optimized 24B open-weight model under Apache 2.0, built for fast local inference.

Overview

Mistral Small 3 is a 24-billion-parameter dense language model that Mistral AI released on 30 January 2025 under the permissive Apache 2.0 license. Its instruction-tuned checkpoint carries the API identifier mistral-small-2501. Mistral positioned it as a latency-optimized, locally deployable model that is competitive with much larger systems such as Llama 3.3 70B and Qwen 32B while being more than three times faster on the same hardware.

The model targets the practical sweet spot for running a capable LLM on a single consumer GPU. When quantized, Mistral Small 3 fits on an RTX 4090 or a 32GB MacBook, and the Ollama build is roughly a 14GB download. Mistral pitched it as an open, transparent replacement for closed lightweight models like GPT-4o-mini, with native function calling, JSON output and strong system-prompt adherence aimed at agentic and conversational workloads. It is text-only, with a 32k context window and an October 2023 knowledge cutoff.

Mistral Small 3 was the first model in the line where Mistral shifted away from its proprietary research license back to fully open Apache 2.0 weights. It has since been superseded on Mistral's platform by Small 3.1, Small 3.2 and Small 4, but because the weights were released openly they remain available for download and self-hosting via Hugging Face, Ollama and Kaggle.

Released	2025-01-30
License	Apache-2.0
Weights	Open weights
Parameters	24B
Context	32k tokens
Max output	Not separately published by Mistral; bounded by the 32k context window
Architecture	Dense decoder-only transformer (24B parameters), not a mixture-of-experts. Mistral designed it with far fewer layers than competing models to cut the time per forward pass, which is the source of its low latency. It uses the Tekken tokenizer with a 131k-token vocabulary. Two checkpoints were released: a base pretrained model and an instruction-tuned model (mistral-small-2501).
Knowledge cutoff	October 2023
Modalities	Text
Status	Superseded. Released 2025-01-30 as the instruction-tuned id mistral-small-2501; later replaced on Mistral's API by Small 3.1 (March 2025), Small 3.2 (June 2025) and Small 4 (2026). The Apache 2.0 weights remain freely downloadable.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.10 / 1M tokens per 1M tokens
Output	$0.30 / 1M tokens per 1M tokens

Launch pricing for mistral-small-2501 on Mistral's la Plateforme (30 Jan 2025) — half the previous Mistral Small rate of $0.20/$0.60. Self-hosting the Apache 2.0 weights is free. Later pricing may differ.

Pricing source ↗

Strengths

Low latency for its capability tier — roughly 150 tokens/s and 3x faster than Llama 3.3 70B on the same hardware
Apache 2.0 license with both base and instruction-tuned weights freely available for commercial use and fine-tuning
Runs locally on a single consumer GPU (RTX 4090) or a 32GB MacBook when quantized
Native function calling and JSON output, suited to agentic and tool-use pipelines
Strong multilingual coverage across dozens of languages including French, German, Spanish, Chinese, Japanese and Korean
Reliable system-prompt adherence for controllable assistants

Best for

Fast-response conversational assistants and chatbots
Low-latency function calling inside agent workflows
Fine-tuning into domain or subject-matter experts (e.g. legal, medical, technical support)
Fully local or on-device inference for privacy-sensitive data
Fraud detection, customer triaging and sentiment workflows that need quick throughput
An open, self-hostable replacement for proprietary lightweight models

How to access

Provider	Model ID
Mistral AI (la Plateforme) ↗	`mistral-small-2501`
Hugging Face ↗	`mistralai/Mistral-Small-24B-Instruct-2501`

Mistral Small — every version

The full lineage of the Mistral Small line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Mistral Small 4current	2026-03-16	—	Apache-2.0
Mistral Small 3.2	2025-06-20	—	Apache-2.0
Mistral Small 3.1	2025-03-17	—	Open weights
Mistral Small 3	2025-01-30	—	Apache-2.0
Mistral Small (24.09)	2024-09-17	—	Open weights

FAQ

What is Mistral Small 3?

Mistral Small 3 is a 24-billion-parameter dense, text-only language model from Mistral AI, released on 30 January 2025 under the Apache 2.0 license. Its instruction-tuned checkpoint is identified as mistral-small-2501. It is built for low-latency, locally deployable general-purpose use.

Is Mistral Small 3 open source?

Yes. Both the base pretrained model and the instruction-tuned model were released under the Apache 2.0 license, which allows free commercial use, modification and fine-tuning. The weights are available on Hugging Face, Ollama and Kaggle.

How big is Mistral Small 3 and can it run locally?

It has 24B parameters in a dense architecture with a 32k context window. When quantized it fits on a single RTX 4090 GPU or a 32GB MacBook, and the Ollama build is about a 14GB download, making fully local inference practical.

How does Mistral Small 3 perform versus larger models?

Mistral reports it is competitive with Llama 3.3 70B and Qwen 32B while running more than 3x faster on the same hardware. It scores about 81% on MMLU, 84.8% on HumanEval and 70.6% on MATH, despite being roughly a third of the size of those rivals.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Mistral Small — every version

// FAQ