AI/TLDR

Mixtral 8x7B

Mistral AI's breakthrough open sparse mixture-of-experts model that matched GPT-3.5 with only 12.9B active parameters.

Overview

Mixtral 8x7B is an open-weight large language model released by Mistral AI on December 9, 2023. It is the company's first mixture-of-experts (MoE) model: a decoder-only transformer whose feed-forward blocks are replaced by 8 separate "expert" networks, with a router picking 2 experts for every token. That sparse design gives the model 46.7B total parameters but only 12.9B active per token, so it runs at roughly the speed and cost of a ~13B dense model while drawing on a far larger pool of knowledge.

At launch Mixtral 8x7B outperformed or matched Llama 2 70B and OpenAI's GPT-3.5 across most standard benchmarks, while delivering about 6x faster inference than Llama 2 70B. It handles a 32k-token context and is natively multilingual across English, French, Italian, German, and Spanish, with strong code-generation ability. An instruction-tuned variant, Mixtral 8x7B Instruct, scored 8.30 on MT-Bench, making it the best open-weight chat model at the time of release.

Mistral shipped Mixtral 8x7B under the permissive Apache 2.0 license, allowing free commercial use, which made it a popular base for fine-tuning and local deployment. Mistral retired the hosted open-mixtral-8x7b endpoint on March 30, 2025 in favor of newer Mistral Small models, but the weights remain freely downloadable on Hugging Face and the model is still served by several third-party inference providers.

Released2023-12-09
LicenseApache 2.0
WeightsOpen weights
Parameters46.7B total parameters, 12.9B active per token (8 experts, 2 routed per token)
Context32K tokens
Max outputNot officially specified; bounded by the 32K-token context window
ArchitectureDecoder-only transformer with a sparse mixture-of-experts (SMoE) feed-forward layer. Each layer holds 8 expert blocks; a router network selects 2 experts per token, so only 12.9B of the 46.7B parameters are active per token. Built on the Mistral 7B architecture (grouped-query attention, sliding-window attention) with a 32k-token context.
Knowledge cutoffNot officially published by Mistral AI
ModalitiesText
StatusRetired from Mistral's hosted API on 2025-03-30 (replaced by Mistral Small 3.2). The open weights remain freely downloadable under Apache 2.0 and the model is still served by third-party providers.

Benchmarks

  1. MMLU (5-shot)70.6%
  2. HellaSwag (10-shot)84.4%
  3. WinoGrande (5-shot)77.2%
  4. ARC Challenge (25-shot)59.7%
  5. GSM8K (maj@8)74.4%
  6. MATH (4-shot, maj@4)28.4%
  7. HumanEval (0-shot)40.2%
  8. MBPP (3-shot)60.7%
  9. MT-Bench (Instruct variant)8.3score

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.70 / 1M tokens per 1M tokens
Output$0.70 / 1M tokens per 1M tokens

Mistral's official la Plateforme price for the open-mixtral-8x7b endpoint. This hosted endpoint was retired on 2025-03-30; third-party providers (Together AI, Fireworks, OpenRouter) offer their own rates.

Pricing source ↗

Strengths

  • Sparse MoE design: ~13B-class inference cost and speed with the knowledge capacity of a much larger model
  • Apache 2.0 license permits unrestricted commercial use, fine-tuning, and self-hosting
  • Strong code and math performance for its active-parameter budget (beat Llama 2 70B on GSM8K and HumanEval)
  • Native multilingual support across English, French, Italian, German, and Spanish
  • 32k-token context window, large for a late-2023 open model
  • Open weights widely supported by llama.cpp, vLLM, Hugging Face, and major inference providers

Best for

  • Self-hosted chat assistants and RAG backends where Apache 2.0 licensing matters
  • Cost-efficient code generation and completion
  • Multilingual content generation and summarization in EU languages
  • Fine-tuning a permissively licensed base model for domain-specific tasks
  • Local and on-prem inference where data cannot leave the environment
  • Benchmark/baseline for evaluating newer open MoE models

How to access

ProviderModel ID
Mistral AI (la Plateforme) ↗open-mixtral-8x7b
Hugging Face (open weights) ↗mistralai/Mixtral-8x7B-Instruct-v0.1

Mixtral — every version

The full lineage of the Mixtral line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Mixtral 8x22Bcurrent2024-04-10Apache-2.0
Mixtral 8x7B2023-12-09Apache-2.0

FAQ

How many parameters does Mixtral 8x7B have?

It has 46.7B total parameters but, because of its sparse mixture-of-experts design, only 12.9B are active per token. Each layer holds 8 expert networks and a router picks 2 per token, giving the model the knowledge capacity of a large model at roughly the inference cost of a ~13B dense model.

Is Mixtral 8x7B free and open source?

Yes. Mixtral 8x7B is released under the Apache 2.0 license, which permits free commercial use, fine-tuning, and self-hosting. The weights are freely downloadable from Hugging Face.

Is Mixtral 8x7B still available?

Mistral retired the hosted open-mixtral-8x7b API endpoint on March 30, 2025, recommending newer Mistral Small models instead. The open weights remain available under Apache 2.0, and several third-party providers (Together AI, Fireworks, OpenRouter) still serve the model.

How does Mixtral 8x7B compare to GPT-3.5 and Llama 2 70B?

At release it outperformed or matched both on most standard benchmarks. It scored 70.6% on MMLU and beat Llama 2 70B on math (74.4% GSM8K) and code (40.2% HumanEval), while running about 6x faster than Llama 2 70B thanks to its sparse MoE architecture.