Overview
Mistral 7B is Mistral AI's first model, released on 2023-09-27 under the Apache 2.0 license with fully downloadable weights. It is a 7.3-billion-parameter dense transformer that, in the launch paper (arXiv 2310.06825), outperformed Meta's Llama 2 13B on every benchmark the authors tested and matched or beat the much larger Llama 1 34B on reasoning, mathematics, and code. That result — a 7B model beating a 13B one — is what put Mistral AI on the map.
Architecturally, Mistral 7B introduced two efficiency tricks that later became standard across the field: grouped-query attention (GQA), which shrinks the KV cache and speeds up decoding, and sliding-window attention (SWA) with a 4,096-token window, which lets each layer attend locally while information still propagates across stacked layers. The original v0.1 release shipped with an 8K context; the later v0.2 and v0.3 weight updates raised the effective context to 32K, and v0.3 extended the vocabulary to 32,768 tokens and added function-calling support.
Mistral 7B is now a legacy model. On Mistral's hosted API (la Plateforme) the open-mistral-7b endpoint was deprecated on 2024-11-30 and retired on 2025-03-30, with Ministral 8B named as its successor. The open weights remain freely available on Hugging Face and run locally through Ollama, llama.cpp, vLLM, and LM Studio, so the model lives on as a lightweight, edge-friendly base for fine-tuning even though Mistral no longer serves it directly.
| Released | 2023-09-27 |
|---|---|
| License | Apache-2.0 |
| Weights | Open weights |
| Parameters | 7.3B |
| Context | 8K (v0.1); 32K (v0.2 / v0.3) |
| Architecture | Dense transformer (32 layers, dim 4096, 32 heads / 8 KV heads) with grouped-query attention (GQA) and sliding-window attention (window 4096) |
| Knowledge cutoff | Not officially disclosed |
| Modalities | Text |
| Status | Deprecated — API model open-mistral-7b deprecated 2024-11-30, retired 2025-03-30; weights still downloadable on Hugging Face |
Benchmarks
- MMLU (5-shot)60.1%
- HellaSwag (0-shot)81.3%
- WinoGrande (0-shot)75.3%
- PIQA (0-shot)83%
- ARC-Challenge (0-shot)55.5%
- TriviaQA (5-shot)69.9%
- HumanEval (0-shot)30.5%
- MBPP (3-shot)47.5%
- GSM8K (8-shot, maj@8)52.2%
- MATH (4-shot, maj@4)13.1%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Strengths
- Small enough to run on a single consumer GPU (or quantized on a laptop) while outscoring the 2x-larger Llama 2 13B across the launch-paper benchmark suite
- Permissive Apache 2.0 license with no usage restrictions — free to fine-tune, redistribute, and deploy commercially
- Efficient inference from grouped-query attention plus sliding-window attention, lowering memory and latency versus a vanilla 7B transformer
- Strong math and code for its size: 52.2% GSM8K and 30.5% HumanEval in the paper, approaching Code-Llama 7B on code without sacrificing general performance
- Huge ecosystem support (Hugging Face, Ollama, vLLM, llama.cpp, LM Studio) and a vast library of community fine-tunes built on it
Best for
- A lightweight, locally-deployable base model for fine-tuning on domain-specific tasks
- On-device and edge assistants where a small, fast, permissively-licensed model is required
- Cost-sensitive chat, summarization, and classification workloads that don't need a frontier model
- A teaching and research baseline for studying GQA and sliding-window attention
- Self-hosted inference for privacy-sensitive deployments that cannot send data to a hosted API
How to access
| Provider | Model ID |
|---|---|
| Hugging Face (open weights) ↗ | mistralai/Mistral-7B-Instruct-v0.3 |
| Ollama (local) ↗ | mistral:7b |
| OpenRouter ↗ | mistralai/mistral-7b-instruct-v0.3 |
Mistral 7B / Nemo (open dense) — every version
The full lineage of the Mistral 7B / Nemo (open dense) line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Mistral NeMo (12B)current | 2024-07-18 | — | Apache-2.0 |
| Mistral 7B | 2023-09-27 | — | Apache-2.0 |
FAQ
Is Mistral 7B still available?
The open weights are still freely available under Apache 2.0 on Hugging Face and run locally via Ollama, vLLM, llama.cpp, and LM Studio. However, Mistral's hosted API endpoint open-mistral-7b was deprecated on 2024-11-30 and retired on 2025-03-30; Mistral points API users to Ministral 8B as the successor.
How big is Mistral 7B and what license does it use?
It has 7.3 billion parameters and is released under the permissive Apache 2.0 license, which allows unrestricted commercial use, fine-tuning, and redistribution.
What is its context window?
The original v0.1 release supported an 8K-token context. The later v0.2 and v0.3 weight updates extended the effective context to 32K tokens, and v0.3 also grew the vocabulary to 32,768 tokens and added function calling.
How does Mistral 7B compare to Llama 2 13B?
In the launch paper (arXiv 2310.06825, Table 2), Mistral 7B outscored Llama 2 13B on every benchmark tested despite being roughly half the size — for example 60.1% vs 55.6% on MMLU and 52.2% vs 34.3% on GSM8K — thanks in part to grouped-query attention and sliding-window attention.