Overview
Mixtral 8x22B is an open-weight large language model from Mistral AI, released as a base model via magnet link on April 10, 2024 and formally announced on April 17, 2024 under the title "Cheaper, Better, Faster, Stronger." It is a sparse Mixture-of-Experts (SMoE) model with 141 billion total parameters, of which only about 39 billion are active for any given token. That sparse routing is the whole point of the design: it gives Mixtral 8x22B the knowledge capacity of a very large model while running at roughly the speed and cost of a much smaller dense one.
Mistral released Mixtral 8x22B under the Apache 2.0 license — one of the most permissive open-source licenses — so the weights for both the base (Mixtral-8x22B-v0.1) and instruction-tuned (Mixtral-8x22B-Instruct-v0.1) checkpoints can be downloaded, fine-tuned, and self-hosted with no usage restrictions. The model has a 64K-token context window, is natively capable of function calling, and is fluent in English, French, Italian, German, and Spanish.
At launch Mixtral 8x22B was positioned as one of the strongest open-weight models available, with particular strength in reasoning, mathematics, and coding. Mistral retired it from its hosted la Plateforme API on March 30, 2025 in favor of newer models such as Mistral Small 3.2, but the Apache 2.0 weights remain freely available on Hugging Face and through third-party hosts, so it stays a usable open model for self-hosting and research.
| Released | 2024-04-10 |
|---|---|
| License | Apache 2.0 |
| Weights | Open weights |
| Parameters | 141B total, 39B active per token |
| Context | 64K |
| Max output | 4K (la Plateforme; ~8K on some providers) |
| Architecture | Sparse Mixture-of-Experts (SMoE) transformer — 8 experts, 2 routed per token; ~39B of 141B parameters active per forward pass; BF16 weights |
| Knowledge cutoff | Not publicly disclosed by Mistral AI |
| Modalities | Text |
| Status | Retired from Mistral's la Plateforme API (March 30, 2025); open weights remain available on Hugging Face under Apache 2.0 |
Benchmarks
- MMLU77.7%
- GSM8K (maj@8, instruct)90.8%
- Math (maj@4, instruct)44.6%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $2.00 per 1M tokens per 1M tokens (Mixtral 8x22B Instruct, third-party host) |
|---|---|
| Output | $6.00 per 1M tokens per 1M tokens (Mixtral 8x22B Instruct, third-party host) |
Mistral's own la Plateforme retired the hosted endpoint on 2025-03-30; rate shown is from OpenRouter for the open-weight instruct model. Self-hosting the Apache 2.0 weights has no per-token cost.
Strengths
- Cost/performance efficiency: sparse MoE activates only ~39B of 141B parameters per token, so it runs faster and cheaper than a dense 70B+ model while retaining large-model knowledge
- Fully open under Apache 2.0 — base and instruct weights are downloadable, fine-tunable, and self-hostable with no commercial restrictions
- Strong math and coding for its era (GSM8K maj@8 90.8%, Math maj@4 44.6% on the instruct version per Mistral)
- Native function calling, useful for tool-augmented and agentic application development
- Multilingual across English, French, Italian, German, and Spanish
- 64K-token context window for working over long documents
Best for
- Self-hosted open-weight LLM deployments where data must stay on-premises or in a controlled environment
- Fine-tuning a permissively licensed base model for domain-specific or multilingual tasks
- Math and code-assistance workloads that benefit from the model's reasoning strengths
- Tool-using / function-calling applications and lightweight agents
- Research and benchmarking of sparse Mixture-of-Experts architectures
- Cost-sensitive inference where a dense 70B model would be too slow or expensive
How to access
| Provider | Model ID |
|---|---|
| OpenRouter ↗ | mistralai/mixtral-8x22b-instruct |
| Hugging Face (weights) ↗ | mistralai/Mixtral-8x22B-Instruct-v0.1 |
Mixtral — every version
The full lineage of the Mixtral line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Mixtral 8x22Bcurrent | 2024-04-10 | — | Apache-2.0 |
| Mixtral 8x7B | 2023-12-09 | — | Apache-2.0 |
FAQ
Is Mixtral 8x22B open source?
Yes. Both the base (Mixtral-8x22B-v0.1) and instruction-tuned weights are released under the Apache 2.0 license, so you can download, fine-tune, self-host, and use the model commercially with no usage restrictions.
How many parameters does Mixtral 8x22B have?
It has 141 billion total parameters but is a sparse Mixture-of-Experts model, so only about 39 billion are active for any given token. That is what lets it run faster and cheaper than a dense model of comparable knowledge capacity.
What is the context window of Mixtral 8x22B?
Mixtral 8x22B supports a 64K-token context window, per Mistral's documentation.
Can I still use Mixtral 8x22B?
Mistral retired the hosted endpoint on its own la Plateforme API on March 30, 2025, but the Apache 2.0 weights remain freely available on Hugging Face and through third-party hosts like OpenRouter, so you can still run it via self-hosting or those providers.