AI/TLDR

Mistral Small 3

Latency-optimized 24B open-weight model under Apache 2.0, built for fast local inference.

Overview

Mistral Small 3 is a 24-billion-parameter dense language model that Mistral AI released on 30 January 2025 under the permissive Apache 2.0 license. Its instruction-tuned checkpoint carries the API identifier mistral-small-2501. Mistral positioned it as a latency-optimized, locally deployable model that is competitive with much larger systems such as Llama 3.3 70B and Qwen 32B while being more than three times faster on the same hardware.

The model targets the practical sweet spot for running a capable LLM on a single consumer GPU. When quantized, Mistral Small 3 fits on an RTX 4090 or a 32GB MacBook, and the Ollama build is roughly a 14GB download. Mistral pitched it as an open, transparent replacement for closed lightweight models like GPT-4o-mini, with native function calling, JSON output and strong system-prompt adherence aimed at agentic and conversational workloads. It is text-only, with a 32k context window and an October 2023 knowledge cutoff.

Mistral Small 3 was the first model in the line where Mistral shifted away from its proprietary research license back to fully open Apache 2.0 weights. It has since been superseded on Mistral's platform by Small 3.1, Small 3.2 and Small 4, but because the weights were released openly they remain available for download and self-hosting via Hugging Face, Ollama and Kaggle.

Released2025-01-30
LicenseApache-2.0
WeightsOpen weights
Parameters24B
Context32k tokens
Max outputNot separately published by Mistral; bounded by the 32k context window
ArchitectureDense decoder-only transformer (24B parameters), not a mixture-of-experts. Mistral designed it with far fewer layers than competing models to cut the time per forward pass, which is the source of its low latency. It uses the Tekken tokenizer with a 131k-token vocabulary. Two checkpoints were released: a base pretrained model and an instruction-tuned model (mistral-small-2501).
Knowledge cutoffOctober 2023
ModalitiesText
StatusSuperseded. Released 2025-01-30 as the instruction-tuned id mistral-small-2501; later replaced on Mistral's API by Small 3.1 (March 2025), Small 3.2 (June 2025) and Small 4 (2026). The Apache 2.0 weights remain freely downloadable.

Benchmarks

  1. MMLU (accuracy)81%
  2. MMLU Pro (5-shot CoT)66.3%
  3. GPQA Main (5-shot CoT)45.3%
  4. HumanEval (Pass@1)84.8%
  5. MATH70.6%
  6. IFEval (instruction following)82.9%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.10 / 1M tokens per 1M tokens
Output$0.30 / 1M tokens per 1M tokens

Launch pricing for mistral-small-2501 on Mistral's la Plateforme (30 Jan 2025) — half the previous Mistral Small rate of $0.20/$0.60. Self-hosting the Apache 2.0 weights is free. Later pricing may differ.

Pricing source ↗

Strengths

  • Low latency for its capability tier — roughly 150 tokens/s and 3x faster than Llama 3.3 70B on the same hardware
  • Apache 2.0 license with both base and instruction-tuned weights freely available for commercial use and fine-tuning
  • Runs locally on a single consumer GPU (RTX 4090) or a 32GB MacBook when quantized
  • Native function calling and JSON output, suited to agentic and tool-use pipelines
  • Strong multilingual coverage across dozens of languages including French, German, Spanish, Chinese, Japanese and Korean
  • Reliable system-prompt adherence for controllable assistants

Best for

  • Fast-response conversational assistants and chatbots
  • Low-latency function calling inside agent workflows
  • Fine-tuning into domain or subject-matter experts (e.g. legal, medical, technical support)
  • Fully local or on-device inference for privacy-sensitive data
  • Fraud detection, customer triaging and sentiment workflows that need quick throughput
  • An open, self-hostable replacement for proprietary lightweight models

How to access

ProviderModel ID
Mistral AI (la Plateforme) ↗mistral-small-2501
Hugging Face ↗mistralai/Mistral-Small-24B-Instruct-2501

Mistral Small — every version

The full lineage of the Mistral Small line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Mistral Small 4current2026-03-16Apache-2.0
Mistral Small 3.22025-06-20Apache-2.0
Mistral Small 3.12025-03-17Open weights
Mistral Small 32025-01-30Apache-2.0
Mistral Small (24.09)2024-09-17Open weights

FAQ

What is Mistral Small 3?

Mistral Small 3 is a 24-billion-parameter dense, text-only language model from Mistral AI, released on 30 January 2025 under the Apache 2.0 license. Its instruction-tuned checkpoint is identified as mistral-small-2501. It is built for low-latency, locally deployable general-purpose use.

Is Mistral Small 3 open source?

Yes. Both the base pretrained model and the instruction-tuned model were released under the Apache 2.0 license, which allows free commercial use, modification and fine-tuning. The weights are available on Hugging Face, Ollama and Kaggle.

How big is Mistral Small 3 and can it run locally?

It has 24B parameters in a dense architecture with a 32k context window. When quantized it fits on a single RTX 4090 GPU or a 32GB MacBook, and the Ollama build is about a 14GB download, making fully local inference practical.

How does Mistral Small 3 perform versus larger models?

Mistral reports it is competitive with Llama 3.3 70B and Qwen 32B while running more than 3x faster on the same hardware. It scores about 81% on MMLU, 84.8% on HumanEval and 70.6% on MATH, despite being roughly a third of the size of those rivals.