AI/TLDR

Mixtral 8x22B

Mistral AI's open-weight 141B sparse Mixture-of-Experts model — only 39B active, Apache 2.0.

Overview

Mixtral 8x22B is an open-weight large language model from Mistral AI, released as a base model via magnet link on April 10, 2024 and formally announced on April 17, 2024 under the title "Cheaper, Better, Faster, Stronger." It is a sparse Mixture-of-Experts (SMoE) model with 141 billion total parameters, of which only about 39 billion are active for any given token. That sparse routing is the whole point of the design: it gives Mixtral 8x22B the knowledge capacity of a very large model while running at roughly the speed and cost of a much smaller dense one.

Mistral released Mixtral 8x22B under the Apache 2.0 license — one of the most permissive open-source licenses — so the weights for both the base (Mixtral-8x22B-v0.1) and instruction-tuned (Mixtral-8x22B-Instruct-v0.1) checkpoints can be downloaded, fine-tuned, and self-hosted with no usage restrictions. The model has a 64K-token context window, is natively capable of function calling, and is fluent in English, French, Italian, German, and Spanish.

At launch Mixtral 8x22B was positioned as one of the strongest open-weight models available, with particular strength in reasoning, mathematics, and coding. Mistral retired it from its hosted la Plateforme API on March 30, 2025 in favor of newer models such as Mistral Small 3.2, but the Apache 2.0 weights remain freely available on Hugging Face and through third-party hosts, so it stays a usable open model for self-hosting and research.

Released2024-04-10
LicenseApache 2.0
WeightsOpen weights
Parameters141B total, 39B active per token
Context64K
Max output4K (la Plateforme; ~8K on some providers)
ArchitectureSparse Mixture-of-Experts (SMoE) transformer — 8 experts, 2 routed per token; ~39B of 141B parameters active per forward pass; BF16 weights
Knowledge cutoffNot publicly disclosed by Mistral AI
ModalitiesText
StatusRetired from Mistral's la Plateforme API (March 30, 2025); open weights remain available on Hugging Face under Apache 2.0

Benchmarks

  1. MMLU77.7%
  2. GSM8K (maj@8, instruct)90.8%
  3. Math (maj@4, instruct)44.6%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$2.00 per 1M tokens per 1M tokens (Mixtral 8x22B Instruct, third-party host)
Output$6.00 per 1M tokens per 1M tokens (Mixtral 8x22B Instruct, third-party host)

Mistral's own la Plateforme retired the hosted endpoint on 2025-03-30; rate shown is from OpenRouter for the open-weight instruct model. Self-hosting the Apache 2.0 weights has no per-token cost.

Pricing source ↗

Strengths

  • Cost/performance efficiency: sparse MoE activates only ~39B of 141B parameters per token, so it runs faster and cheaper than a dense 70B+ model while retaining large-model knowledge
  • Fully open under Apache 2.0 — base and instruct weights are downloadable, fine-tunable, and self-hostable with no commercial restrictions
  • Strong math and coding for its era (GSM8K maj@8 90.8%, Math maj@4 44.6% on the instruct version per Mistral)
  • Native function calling, useful for tool-augmented and agentic application development
  • Multilingual across English, French, Italian, German, and Spanish
  • 64K-token context window for working over long documents

Best for

  • Self-hosted open-weight LLM deployments where data must stay on-premises or in a controlled environment
  • Fine-tuning a permissively licensed base model for domain-specific or multilingual tasks
  • Math and code-assistance workloads that benefit from the model's reasoning strengths
  • Tool-using / function-calling applications and lightweight agents
  • Research and benchmarking of sparse Mixture-of-Experts architectures
  • Cost-sensitive inference where a dense 70B model would be too slow or expensive

How to access

ProviderModel ID
OpenRouter ↗mistralai/mixtral-8x22b-instruct
Hugging Face (weights) ↗mistralai/Mixtral-8x22B-Instruct-v0.1

Mixtral — every version

The full lineage of the Mixtral line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Mixtral 8x22Bcurrent2024-04-10Apache-2.0
Mixtral 8x7B2023-12-09Apache-2.0

FAQ

Is Mixtral 8x22B open source?

Yes. Both the base (Mixtral-8x22B-v0.1) and instruction-tuned weights are released under the Apache 2.0 license, so you can download, fine-tune, self-host, and use the model commercially with no usage restrictions.

How many parameters does Mixtral 8x22B have?

It has 141 billion total parameters but is a sparse Mixture-of-Experts model, so only about 39 billion are active for any given token. That is what lets it run faster and cheaper than a dense model of comparable knowledge capacity.

What is the context window of Mixtral 8x22B?

Mixtral 8x22B supports a 64K-token context window, per Mistral's documentation.

Can I still use Mixtral 8x22B?

Mistral retired the hosted endpoint on its own la Plateforme API on March 30, 2025, but the Apache 2.0 weights remain freely available on Hugging Face and through third-party hosts like OpenRouter, so you can still run it via self-hosting or those providers.