Mistral 7B

Name: Mistral 7B
Author: Mistral AI

Mistral AI's debut open model: a 7.3B Apache-2.0 transformer that beat Llama 2 13B

Overview

Mistral 7B is Mistral AI's first model, released on 2023-09-27 under the Apache 2.0 license with fully downloadable weights. It is a 7.3-billion-parameter dense transformer that, in the launch paper (arXiv 2310.06825), outperformed Meta's Llama 2 13B on every benchmark the authors tested and matched or beat the much larger Llama 1 34B on reasoning, mathematics, and code. That result, a model with roughly half the parameters beating a 13B one, is what put Mistral AI on the map.

Architecturally, Mistral 7B helped popularize two efficiency techniques that are now common in open models. Grouped-query attention (GQA) lets several query heads share each key/value head, which shrinks the KV cache and speeds up decoding. Sliding-window attention (SWA), here with a 4,096-token window, restricts each layer to recent tokens while information still propagates further back through the stacked layers. The original v0.1 release shipped with an 8K context; the later v0.2 and v0.3 weight updates raised the context to 32K, and v0.3 extended the vocabulary to 32,768 tokens and added function-calling support.
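The sliding-window idea described above can be sketched in a few lines of plain Python. This is an illustration, not Mistral's implementation: the function names are hypothetical, and the boundary convention (a token sees itself plus the previous window - 1 tokens) is an assumption that real implementations may shift by one.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Assumed convention: causal (j <= i) and at most `window` positions,
    counting the current token itself.
    """
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def max_reach(num_layers: int, window: int) -> int:
    """Upper bound on how far back information can travel across stacked layers.

    Each layer can pass information back by up to `window` positions, so the
    bound grows linearly with depth.
    """
    return num_layers * window


# Tiny example: 6 tokens, window of 3. "x" marks an allowed query/key pair.
for row in sliding_window_mask(seq_len=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))

# With the 32 layers and 4,096-token window listed on this page:
print(max_reach(num_layers=32, window=4096))  # 131072
```

Each row of the printed mask holds at most three marks, which is why per-layer attention cost stops growing with sequence length, while the depth-times-window bound shows how far back context can still propagate in principle.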

Mistral 7B is now a legacy model. On Mistral's hosted API (la Plateforme) the open-mistral-7b endpoint was deprecated on 2024-11-30 and retired on 2025-03-30, with Ministral 8B named as its successor. The open weights remain freely available on Hugging Face and run locally through Ollama, llama.cpp, vLLM, and LM Studio, so the model lives on as a lightweight, edge-friendly base for fine-tuning even though Mistral no longer serves it directly.

Released	2023-09-27
License	Apache-2.0
Weights	Open weights
Parameters	7.3B
Context	8K (v0.1); 32K (v0.2 / v0.3)
Architecture	Dense transformer (32 layers, dim 4096, 32 heads / 8 KV heads) with grouped-query attention (GQA) and sliding-window attention (window 4096)
Knowledge cutoff	Not officially disclosed
Modalities	Text
Status	Deprecated — API model open-mistral-7b deprecated 2024-11-30, retired 2025-03-30; weights still downloadable on Hugging Face
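To make the architecture row concrete, the following back-of-the-envelope sketch estimates KV-cache size from the figures in the table (32 layers, 32 query heads, 8 KV heads, dim 4096). The assumptions are mine: 2 bytes per value (fp16/bf16), a head dimension of dim / heads = 128, batch size 1, and a cache that stores every token rather than a rolling window. The function name is hypothetical.

```python
def kv_cache_bytes(
    tokens: int,
    layers: int = 32,
    kv_heads: int = 8,
    head_dim: int = 4096 // 32,  # model dim divided by query heads = 128
    bytes_per_value: int = 2,    # assumes fp16 / bf16 storage
) -> int:
    """Approximate KV-cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values
    in every layer.
    """
    return 2 * layers * kv_heads * head_dim * bytes_per_value * tokens


GIB = 2**30

gqa = kv_cache_bytes(8192, kv_heads=8)   # grouped-query attention: 8 KV heads
mha = kv_cache_bytes(8192, kv_heads=32)  # hypothetical full multi-head baseline

print(f"per token (GQA): {kv_cache_bytes(1) // 1024} KiB")
print(f"8K tokens, GQA:  {gqa / GIB:.1f} GiB")
print(f"8K tokens, MHA:  {mha / GIB:.1f} GiB")
print(f"reduction:       {mha // gqa}x")
```

Under these assumptions, sharing each key/value head across four query heads cuts the cache by a factor of four, which is the memory saving the GQA entry refers to.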

Benchmarks

Scores are percentages on a 0–100 scale; higher is better. The figures below are the ones this page cites from the launch paper (arXiv 2310.06825).

Benchmark	Mistral 7B
MMLU	60.1
GSM8K	52.2
HumanEval	30.5

Strengths

Small enough to run on a single consumer GPU (or quantized on a laptop) while outscoring Llama 2 13B, a model with nearly twice the parameters, across the launch-paper benchmark suite
Permissive Apache 2.0 license with no field-of-use restrictions: free to fine-tune, redistribute, and deploy commercially, subject to the license's attribution and notice terms
Efficient inference from grouped-query attention plus sliding-window attention, lowering memory and latency versus a vanilla 7B transformer
Strong math and code for its size: 52.2% on GSM8K and 30.5% on HumanEval in the paper, approaching Code Llama 7B on code without sacrificing general performance
Huge ecosystem support (Hugging Face, Ollama, vLLM, llama.cpp, LM Studio) and a vast library of community fine-tunes built on it
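The "single consumer GPU" claim in the list above follows from simple arithmetic on the 7.3B parameter count. The sketch below is a rough lower bound on weight memory only; it ignores the KV cache, activations, and the per-block scale metadata that real quantization formats add, and the helper name is hypothetical.

```python
PARAMS = 7.3e9  # parameter count stated on this page


def weight_gigabytes(params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bits_per_param / 8 / 1e9


for label, bits in [("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>9}: {weight_gigabytes(PARAMS, bits):.2f} GB")
```

At 16 bits the weights need about 14.6 GB, which fits on a 16 GB or 24 GB card, and at 4 bits roughly 3.65 GB, which is what makes laptop-class hardware plausible.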

Best for

A lightweight, locally-deployable base model for fine-tuning on domain-specific tasks
On-device and edge assistants where a small, fast, permissively-licensed model is required
Cost-sensitive chat, summarization, and classification workloads that don't need a frontier model
A teaching and research baseline for studying GQA and sliding-window attention
Self-hosted inference for privacy-sensitive deployments that cannot send data to a hosted API

How to access

Provider	Model ID
Hugging Face (open weights)	`mistralai/Mistral-7B-Instruct-v0.3`
Ollama (local)	`mistral:7b`
OpenRouter	`mistralai/mistral-7b-instruct-v0.3`
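As one way to use the Ollama model ID from the table, the sketch below calls Ollama's local REST endpoint using only the Python standard library. It assumes Ollama is installed, the model has already been pulled (for example with `ollama pull mistral:7b`), and the server is listening on its default address; the helper names `build_request` and `generate` are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "mistral:7b") -> urllib.request.Request:
    """Build a non-streaming generate request for a locally served model."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "mistral:7b", timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text from the JSON reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `print(generate("Explain sliding-window attention in one sentence."))` should print the model's reply. Setting `"stream": False` trades incremental output for a single JSON response that is simpler to parse.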

Mistral 7B / NeMo (open dense): every version

The full lineage of the Mistral 7B / NeMo (open dense) line, newest first.

Version	Released	Context	License
Mistral NeMo (12B) (current)	2024-07-18	—	Apache-2.0
Mistral 7B	2023-09-27	8K (v0.1); 32K (v0.2 / v0.3)	Apache-2.0

FAQ

Is Mistral 7B still available?

The open weights are still freely available under Apache 2.0 on Hugging Face and run locally via Ollama, vLLM, llama.cpp, and LM Studio. However, Mistral's hosted API endpoint open-mistral-7b was deprecated on 2024-11-30 and retired on 2025-03-30; Mistral points API users to Ministral 8B as the successor.

How big is Mistral 7B and what license does it use?

It has 7.3 billion parameters and is released under the permissive Apache 2.0 license, which allows commercial use, fine-tuning, and redistribution, provided the license text and attribution notices are preserved.

What is its context window?

The original v0.1 release supported an 8K-token context. The later v0.2 and v0.3 weight updates extended the effective context to 32K tokens, and v0.3 also grew the vocabulary to 32,768 tokens and added function calling.

How does Mistral 7B compare to Llama 2 13B?

In the launch paper (arXiv 2310.06825, Table 2), Mistral 7B outscored Llama 2 13B on every benchmark tested despite being roughly half the size: for example, 60.1% vs 55.6% on MMLU and 52.2% vs 34.3% on GSM8K. Grouped-query attention and sliding-window attention keep its inference cost low on top of that.
