Mixtral 8x22B

Name: Mixtral 8x22B
Author: Mistral AI

Mistral AI's open-weight 141B sparse Mixture-of-Experts model — only 39B active, Apache 2.0.

Overview

Mixtral 8x22B is an open-weight large language model from Mistral AI, released as a base model via magnet link on April 10, 2024 and formally announced on April 17, 2024 under the title "Cheaper, Better, Faster, Stronger." It is a sparse Mixture-of-Experts (SMoE) model with 141 billion total parameters, of which only about 39 billion are active for any given token. That sparse routing is the whole point of the design: it gives Mixtral 8x22B the knowledge capacity of a very large model while running at roughly the speed and cost of a much smaller dense one.

Mistral released Mixtral 8x22B under the Apache 2.0 license — one of the most permissive open-source licenses — so the weights for both the base (Mixtral-8x22B-v0.1) and instruction-tuned (Mixtral-8x22B-Instruct-v0.1) checkpoints can be downloaded, fine-tuned, and self-hosted with no usage restrictions. The model has a 64K-token context window, is natively capable of function calling, and is fluent in English, French, Italian, German, and Spanish.

At launch Mixtral 8x22B was positioned as one of the strongest open-weight models available, with particular strength in reasoning, mathematics, and coding. Mistral retired it from its hosted la Plateforme API on March 30, 2025 in favor of newer models such as Mistral Small 3.2, but the Apache 2.0 weights remain freely available on Hugging Face and through third-party hosts, so it stays a usable open model for self-hosting and research.

Released	2024-04-10
License	Apache 2.0
Weights	Open weights
Parameters	141B total, 39B active per token
Context	64K
Max output	4K (la Plateforme; ~8K on some providers)
Architecture	Sparse Mixture-of-Experts (SMoE) transformer — 8 experts, 2 routed per token; ~39B of 141B parameters active per forward pass; BF16 weights
Knowledge cutoff	Not publicly disclosed by Mistral AI
Modalities	Text
Status	Retired from Mistral's la Plateforme API (March 30, 2025); open weights remain available on Hugging Face under Apache 2.0

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$2.00 per 1M tokens per 1M tokens (Mixtral 8x22B Instruct, third-party host)
Output	$6.00 per 1M tokens per 1M tokens (Mixtral 8x22B Instruct, third-party host)

Mistral's own la Plateforme retired the hosted endpoint on 2025-03-30; rate shown is from OpenRouter for the open-weight instruct model. Self-hosting the Apache 2.0 weights has no per-token cost.

Pricing source ↗

Strengths

Cost/performance efficiency: sparse MoE activates only ~39B of 141B parameters per token, so it runs faster and cheaper than a dense 70B+ model while retaining large-model knowledge
Fully open under Apache 2.0 — base and instruct weights are downloadable, fine-tunable, and self-hostable with no commercial restrictions
Strong math and coding for its era (GSM8K maj@8 90.8%, Math maj@4 44.6% on the instruct version per Mistral)
Native function calling, useful for tool-augmented and agentic application development
Multilingual across English, French, Italian, German, and Spanish
64K-token context window for working over long documents

Best for

Self-hosted open-weight LLM deployments where data must stay on-premises or in a controlled environment
Fine-tuning a permissively licensed base model for domain-specific or multilingual tasks
Math and code-assistance workloads that benefit from the model's reasoning strengths
Tool-using / function-calling applications and lightweight agents
Research and benchmarking of sparse Mixture-of-Experts architectures
Cost-sensitive inference where a dense 70B model would be too slow or expensive

How to access

Provider	Model ID
OpenRouter ↗	`mistralai/mixtral-8x22b-instruct`
Hugging Face (weights) ↗	`mistralai/Mixtral-8x22B-Instruct-v0.1`

Mixtral — every version

The full lineage of the Mixtral line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Mixtral 8x22Bcurrent	2024-04-10	—	Apache-2.0
Mixtral 8x7B	2023-12-09	—	Apache-2.0

FAQ

Is Mixtral 8x22B open source?

Yes. Both the base (Mixtral-8x22B-v0.1) and instruction-tuned weights are released under the Apache 2.0 license, so you can download, fine-tune, self-host, and use the model commercially with no usage restrictions.

How many parameters does Mixtral 8x22B have?

It has 141 billion total parameters but is a sparse Mixture-of-Experts model, so only about 39 billion are active for any given token. That is what lets it run faster and cheaper than a dense model of comparable knowledge capacity.

What is the context window of Mixtral 8x22B?

Mixtral 8x22B supports a 64K-token context window, per Mistral's documentation.

Can I still use Mixtral 8x22B?

Mistral retired the hosted endpoint on its own la Plateforme API on March 30, 2025, but the Apache 2.0 weights remain freely available on Hugging Face and through third-party hosts like OpenRouter, so you can still run it via self-hosting or those providers.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Mixtral — every version

// FAQ