Ministral 3B / 8B (24.10)

Name: Ministral 3B / 8B (24.10)
Author: Mistral AI

Mistral AI's first edge-optimized small models, built for on-device and privacy-first inference.

Overview

Ministral 3B and Ministral 8B, collectively nicknamed "les Ministraux," are two small language models that Mistral AI released on October 9, 2024 (and announced publicly on October 16). They were Mistral's first models purpose-built for the sub-10-billion-parameter edge tier — small enough to run on phones, laptops, tablets and IoT hardware — and were positioned for local, privacy-first uses such as on-device translation, offline smart assistants, local analytics and autonomous robotics.

Both models ship with a 128,000-token context window (vLLM was initially capped at 32k because the interleaved sliding-window attention kernels were not yet implemented there), native function calling, and training on a large proportion of multilingual and code data across ten languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Russian and Korean. The Ministral-8B-Instruct-2410 weights are openly downloadable on Hugging Face under the Mistral Research License, while Ministral 3B was offered only through the API under a commercial license.

The Ministraux were retired relatively quickly: Mistral marked both ministral-3b-2410 and ministral-8b-2410 as deprecated on December 2, 2025 and removed them from la Plateforme at the end of December 2025, pointing users to the newer Ministral 3 (25.12) generation. The open 8B weights, however, live on for self-hosted research and remain a popular small base for fine-tuning.

Released	2024-10-09
License	Ministral 8B: Mistral Research License (open weights for research; commercial use requires a separate Mistral commercial license). Ministral 3B: Mistral Commercial License only — weights were not released openly.
Weights	Open weights
Parameters	Ministral 3B: ~3B. Ministral 8B: ~8B (8,019,808,256 parameters).
Context	128K tokens
Max output	Not separately published by Mistral; output shares the 128K-token window.
Architecture	Dense decoder-only Transformer. The 8B has 36 layers, 32 attention heads, 8 key/value heads (grouped-query attention), 4096 embedding / 12288 hidden dimension, 128 head dimension, and a 131,072-token V3-Tekken vocabulary. Both models use an interleaved sliding-window attention pattern for faster, more memory-efficient long-context inference.
Knowledge cutoff	Not officially disclosed by Mistral AI.
Modalities	text input, text output
Status	Deprecated on Mistral's la Plateforme as of December 2, 2025 and retired from the API December 31, 2025; superseded by the Ministral 3 (25.12) family. The open Ministral-8B-Instruct-2410 weights remain available on Hugging Face for research use.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.04 / M tokens (3B); $0.10 / M tokens (8B) per 1M tokens (input and output priced identically)
Output	$0.04 / M tokens (3B); $0.10 / M tokens (8B) per 1M tokens (input and output priced identically)

Launch pricing on Mistral's la Plateforme; both models were retired from the API at the end of December 2025.

Pricing source ↗

Strengths

Strong accuracy for its size — Ministral 3B scored 60.9 on MMLU and Ministral 8B 65.0, beating Gemma 2 2B, Llama 3.2 3B and edging Llama 3.1 8B on knowledge benchmarks
128k-token context with interleaved sliding-window attention for memory-efficient long-context inference
Native function calling, making the 8B a capable agent backbone for input parsing, task routing and API calls
Designed to run locally on edge hardware for privacy-first, offline-capable inference
Open Ministral-8B-Instruct-2410 weights on Hugging Face for research and fine-tuning
Very low API pricing during its lifetime — $0.04/M (3B) and $0.10/M (8B)

Best for

On-device and edge inference on phones, laptops and IoT devices
Privacy-first, offline-capable assistants and on-device translation
Local analytics and lightweight text processing without cloud round-trips
Agentic workflows using function calling for task routing and API orchestration
Cost-sensitive, high-volume text generation where a small model suffices
A small open base (8B) for research and domain fine-tuning

How to access

Provider	Model ID
Mistral AI (la Plateforme) ↗	`ministral-3b-2410`
Mistral AI (la Plateforme) ↗	`ministral-8b-2410`

Ministral — every version

The full lineage of the Ministral line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Ministral 3 (3B / 8B / 14B)current	2025-12-02	—	Apache-2.0
Ministral 3B / 8B (24.10)	2024-10-09	—	Open weights

FAQ

Are the Ministral 3B / 8B (24.10) models still available?

No, not on Mistral's API. Both ministral-3b-2410 and ministral-8b-2410 were deprecated on December 2, 2025 and retired from la Plateforme at the end of December 2025, replaced by the Ministral 3 (25.12) family. However, the open Ministral-8B-Instruct-2410 weights remain downloadable on Hugging Face for research and self-hosting.

What is the difference between Ministral 3B and Ministral 8B?

They differ in size and openness. Ministral 3B has about 3 billion parameters and was API-only under a commercial license; Ministral 8B has about 8 billion parameters, scores higher on benchmarks (MMLU 65.0 vs 60.9), and its instruct weights were released openly under the Mistral Research License. Both share the same 128k context window and function-calling support.

Were the Ministral weights open source?

Partly. The Ministral-8B-Instruct-2410 weights are openly available on Hugging Face, but under the Mistral Research License — free for research, while commercial use requires a separate Mistral license. Ministral 3B's weights were never released openly; it was offered only via the API under a commercial license.

What were the Ministral models designed for?

Edge and on-device inference. Mistral built them for the sub-10B tier so they could run locally on phones, laptops and IoT hardware, targeting privacy-first use cases like on-device translation, offline smart assistants, local analytics and autonomous robotics, plus agentic workflows via native function calling.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Ministral — every version

// FAQ