AI/TLDR

Ministral 3B / 8B (24.10)

Mistral AI's first edge-optimized small models, built for on-device and privacy-first inference.

Overview

Ministral 3B and Ministral 8B, collectively nicknamed "les Ministraux," are two small language models that Mistral AI released on October 9, 2024 and announced publicly on October 16. They were Mistral's first models purpose-built for the sub-10-billion-parameter edge tier, small enough to run on phones, laptops, tablets and IoT hardware, and were positioned for local, privacy-first uses such as on-device translation, offline smart assistants, local analytics and autonomous robotics.

Both models ship with a 128,000-token context window and native function calling, and both were trained on a large proportion of multilingual and code data across ten languages: English, French, German, Spanish, Italian, Portuguese, Chinese, Japanese, Russian and Korean. Serving through vLLM was initially capped at 32k tokens because the interleaved sliding-window attention kernels were not yet implemented there. The Ministral-8B-Instruct-2410 weights are openly downloadable on Hugging Face under the Mistral Research License, while Ministral 3B was offered only through the API under a commercial license.

The Ministraux were retired relatively quickly: Mistral marked both ministral-3b-2410 and ministral-8b-2410 as deprecated on December 2, 2025 and removed them from la Plateforme at the end of December 2025, pointing users to the newer Ministral 3 (25.12) generation. The open 8B weights, however, live on for self-hosted research and remain a popular small base for fine-tuning.

Released: 2024-10-09
License: Ministral 8B: Mistral Research License (open weights for research; commercial use requires a separate Mistral commercial license). Ministral 3B: Mistral Commercial License only; weights were not released openly.
Weights: Open weights (Ministral 8B Instruct only; Ministral 3B was API-only)
Parameters: Ministral 3B: ~3B. Ministral 8B: ~8B (8,019,808,256 parameters).
Context: 128K tokens
Max output: Not separately published by Mistral; output shares the 128K-token window.
Architecture: Dense decoder-only Transformer. The 8B has 36 layers, 32 attention heads, 8 key/value heads (grouped-query attention), an embedding dimension of 4096, a feed-forward hidden dimension of 12288, a head dimension of 128, and a 131,072-token V3-Tekken vocabulary. Both models use an interleaved sliding-window attention pattern for faster, more memory-efficient long-context inference.
Knowledge cutoff: Not officially disclosed by Mistral AI.
Modalities: text input, text output
Status: Deprecated on Mistral's la Plateforme as of December 2, 2025 and retired from the API December 31, 2025; superseded by the Ministral 3 (25.12) family. The open Ministral-8B-Instruct-2410 weights remain available on Hugging Face for research use.
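
The architecture figures listed for the 8B can be cross-checked against the stated parameter count. The sketch below is not an official breakdown. It assumes a Llama-style layout that this page does not spell out: a gated three-matrix feed-forward block, two normalization weight vectors per layer plus one final norm, no bias terms, and separate input and output embedding matrices. The function name `count_params` is illustrative. Under those assumptions the total comes out to exactly 8,019,808,256.

```python
# Sketch: rebuild the 8B parameter count from the listed dimensions.
# Assumptions (not stated on this page): Llama-style blocks with a gated
# three-matrix MLP, two norm weight vectors per layer plus a final norm,
# no bias terms, and untied input/output embeddings.

def count_params(layers=36, dim=4096, hidden=12288, heads=32,
                 kv_heads=8, head_dim=128, vocab=131_072):
    attn = (dim * heads * head_dim            # query projection
            + 2 * dim * kv_heads * head_dim   # key and value projections (GQA)
            + heads * head_dim * dim)         # output projection
    mlp = 3 * dim * hidden                    # gate, up and down projections
    norms = 2 * dim                           # pre-attention and pre-MLP norms
    per_layer = attn + mlp + norms
    embeddings = 2 * vocab * dim              # input embedding + output head
    final_norm = dim
    return layers * per_layer + embeddings + final_norm

print(f"{count_params():,}")  # 8,019,808,256
```

Most of the weight sits in the feed-forward blocks (about 151M parameters per layer), while grouped-query attention keeps the key and value projections at a quarter of the size of the query projection.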

Benchmarks

  1. MMLU (Ministral 3B, base): 60.9%
  2. MMLU (Ministral 8B, base): 65.0%
  3. HumanEval pass@1 (Ministral 8B Instruct): 76.8%
  4. Math maj@1 (Ministral 8B Instruct): 54.5%
  5. MBPP pass@1 (Ministral 8B Instruct): 70.0%
  6. Arena Hard (Ministral 8B Instruct): 70.9%
  7. Wild Bench (Ministral 8B Instruct): 41.3%

Scores are on a 0–100 scale; higher is better.

Pricing

Input: $0.04 per 1M tokens (3B); $0.10 per 1M tokens (8B)
Output: $0.04 per 1M tokens (3B); $0.10 per 1M tokens (8B), identical to the input price

Launch pricing on Mistral's la Plateforme; both models were retired from the API at the end of December 2025.
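
Because input and output tokens were billed at the same rate, estimating a bill reduces to one multiplication. The following is a minimal sketch using the launch prices listed above; the helper name `estimate_cost` and the example workload are hypothetical, and the models are no longer served on the API, so this is useful only for historical comparison.

```python
from decimal import Decimal

# Launch prices listed above, in USD per 1M tokens.
PRICE_PER_MILLION = {
    "ministral-3b-2410": Decimal("0.04"),
    "ministral-8b-2410": Decimal("0.10"),
}

def estimate_cost(model, input_tokens, output_tokens):
    """Return the USD cost of a workload; input and output share one rate."""
    rate = PRICE_PER_MILLION[model]
    return rate * (input_tokens + output_tokens) / Decimal(1_000_000)

# Hypothetical workload: 50M input tokens and 10M output tokens.
for model in PRICE_PER_MILLION:
    print(model, estimate_cost(model, 50_000_000, 10_000_000))
```

`Decimal` is used instead of floats so that small per-token prices do not pick up binary rounding error when summed over large volumes.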


Strengths

  • Strong accuracy for their size: on MMLU, Ministral 3B scored 60.9, ahead of Gemma 2 2B and Llama 3.2 3B, and Ministral 8B scored 65.0, narrowly ahead of Llama 3.1 8B
  • 128k-token context with interleaved sliding-window attention for memory-efficient long-context inference
  • Native function calling, making the 8B a capable agent backbone for input parsing, task routing and API calls
  • Designed to run locally on edge hardware for privacy-first, offline-capable inference
  • Open Ministral-8B-Instruct-2410 weights on Hugging Face for research and fine-tuning
  • Very low API pricing during their lifetime: $0.04 per 1M tokens (3B) and $0.10 per 1M tokens (8B)
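
To give a sense of why memory-efficient attention matters at long context, the sketch below estimates the key/value cache size for the 8B from the layer, KV-head and head-dimension figures listed under Architecture. It is a rough upper bound under two assumptions not taken from Mistral: 16-bit cache entries, and every layer caching the full sequence. Layers that use a sliding window would store less, but the window size is not given on this page, so it is not modelled. The function name `kv_cache_bytes` is illustrative.

```python
# Rough upper bound on KV-cache size for the 8B at full context.
# Assumptions (not from Mistral): 16-bit cache entries, and every layer
# caching the whole sequence (no sliding-window savings modelled).

def kv_cache_bytes(tokens, layers=36, kv_heads=8, head_dim=128, bytes_per_value=2):
    # Factor of 2 covers one key vector and one value vector per KV head.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return tokens * per_token

per_token = kv_cache_bytes(1)
full = kv_cache_bytes(128_000)
print(f"{per_token} bytes per token, {full / 2**30:.1f} GiB at 128,000 tokens")
```

With 8 key/value heads instead of 32, grouped-query attention already cuts this figure to a quarter of what full multi-head attention would need; sliding-window layers reduce it further.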

Best for

  • On-device and edge inference on phones, laptops and IoT devices
  • Privacy-first, offline-capable assistants and on-device translation
  • Local analytics and lightweight text processing without cloud round-trips
  • Agentic workflows using function calling for task routing and API orchestration
  • Cost-sensitive, high-volume text generation where a small model suffices
  • A small open base (8B) for research and domain fine-tuning
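
The task-routing use case can be sketched without any network call. The example below shows the local half of a function-calling loop: tools are described to the model in a JSON-schema style, and a tool call emitted by the model is routed to a matching Python function. The tool names, the `dispatch` helper and the exact shape of the tool-call dictionary (a function name plus a JSON-encoded argument string) are assumptions for illustration, not a verified description of any particular client library.

```python
import json

# Hypothetical local tools; names and behaviour are illustrative only.
def get_weather(city: str) -> str:
    return f"(stub) weather for {city}"

def convert_units(value: float, factor: float) -> float:
    return value * factor

TOOLS = {"get_weather": get_weather, "convert_units": convert_units}

# JSON-schema style description that would accompany the prompt
# (only one tool is described here to keep the sketch short).
TOOL_SPECS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
]

def dispatch(tool_call: dict):
    """Route one model-emitted tool call to the matching local function."""
    name = tool_call["function"]["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    args = json.loads(tool_call["function"]["arguments"])
    return TOOLS[name](**args)

# Example tool call, shaped the way a model response is assumed to look.
call = {"function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}}
print(dispatch(call))
```

Rejecting unknown tool names before decoding arguments keeps a small model's occasional malformed call from reaching arbitrary code.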

How to access

Provider | Model ID
Mistral AI (la Plateforme) | ministral-3b-2410
Mistral AI (la Plateforme) | ministral-8b-2410

Ministral — every version

The full lineage of the Ministral line, newest first.

Version | Released | Context | License
Ministral 3 (3B / 8B / 14B) (current) | 2025-12-02 | not listed | Apache-2.0
Ministral 3B / 8B (24.10) | 2024-10-09 | not listed | Open weights

FAQ

Are the Ministral 3B / 8B (24.10) models still available?

No, not on Mistral's API. Both ministral-3b-2410 and ministral-8b-2410 were deprecated on December 2, 2025 and retired from la Plateforme at the end of December 2025, replaced by the Ministral 3 (25.12) family. However, the open Ministral-8B-Instruct-2410 weights remain downloadable on Hugging Face for research and self-hosting.

What is the difference between Ministral 3B and Ministral 8B?

They differ in size and openness. Ministral 3B has about 3 billion parameters and was API-only under a commercial license; Ministral 8B has about 8 billion parameters, scores higher on benchmarks (MMLU 65.0 vs 60.9), and its instruct weights were released openly under the Mistral Research License. Both share the same 128k context window and function-calling support.

Were the Ministral weights open source?

Partly. The Ministral-8B-Instruct-2410 weights are openly available on Hugging Face, but under the Mistral Research License — free for research, while commercial use requires a separate Mistral license. Ministral 3B's weights were never released openly; it was offered only via the API under a commercial license.

What were the Ministral models designed for?

Edge and on-device inference. Mistral built them for the sub-10B tier so they could run locally on phones, laptops and IoT hardware, targeting privacy-first use cases like on-device translation, offline smart assistants, local analytics and autonomous robotics, plus agentic workflows via native function calling.