Mixtral 8x7B

Name: Mixtral 8x7B
Author: Mistral AI

Mistral AI's breakthrough open sparse mixture-of-experts model that matched GPT-3.5 with only 12.9B active parameters.

Overview

Mixtral 8x7B is an open-weight large language model released by Mistral AI on December 9, 2023. It is the company's first mixture-of-experts (MoE) model: a decoder-only transformer whose feed-forward blocks are replaced by 8 separate "expert" networks, with a router picking 2 experts for every token. That sparse design gives the model 46.7B total parameters but only 12.9B active per token, so it runs at roughly the speed and cost of a ~13B dense model while drawing on a far larger pool of knowledge.

At launch Mixtral 8x7B outperformed or matched Llama 2 70B and OpenAI's GPT-3.5 across most standard benchmarks, while delivering about 6x faster inference than Llama 2 70B. It handles a 32k-token context and is natively multilingual across English, French, Italian, German, and Spanish, with strong code-generation ability. An instruction-tuned variant, Mixtral 8x7B Instruct, scored 8.30 on MT-Bench, making it the best open-weight chat model at the time of release.

Mistral shipped Mixtral 8x7B under the permissive Apache 2.0 license, allowing free commercial use, which made it a popular base for fine-tuning and local deployment. Mistral retired the hosted open-mixtral-8x7b endpoint on March 30, 2025 in favor of newer Mistral Small models, but the weights remain freely downloadable on Hugging Face and the model is still served by several third-party inference providers.

Released	2023-12-09
License	Apache 2.0
Weights	Open weights
Parameters	46.7B total parameters, 12.9B active per token (8 experts, 2 routed per token)
Context	32K tokens
Max output	Not officially specified; bounded by the 32K-token context window
Architecture	Decoder-only transformer with a sparse mixture-of-experts (SMoE) feed-forward layer. Each layer holds 8 expert blocks; a router network selects 2 experts per token, so only 12.9B of the 46.7B parameters are active per token. Built on the Mistral 7B architecture (grouped-query attention, sliding-window attention) with a 32k-token context.
Knowledge cutoff	Not officially published by Mistral AI
Modalities	Text
Status	Retired from Mistral's hosted API on 2025-03-30 (replaced by Mistral Small 3.2). The open weights remain freely downloadable under Apache 2.0 and the model is still served by third-party providers.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.70 / 1M tokens per 1M tokens
Output	$0.70 / 1M tokens per 1M tokens

Mistral's official la Plateforme price for the open-mixtral-8x7b endpoint. This hosted endpoint was retired on 2025-03-30; third-party providers (Together AI, Fireworks, OpenRouter) offer their own rates.

Pricing source ↗

Strengths

Sparse MoE design: ~13B-class inference cost and speed with the knowledge capacity of a much larger model
Apache 2.0 license permits unrestricted commercial use, fine-tuning, and self-hosting
Strong code and math performance for its active-parameter budget (beat Llama 2 70B on GSM8K and HumanEval)
Native multilingual support across English, French, Italian, German, and Spanish
32k-token context window, large for a late-2023 open model
Open weights widely supported by llama.cpp, vLLM, Hugging Face, and major inference providers

Best for

Self-hosted chat assistants and RAG backends where Apache 2.0 licensing matters
Cost-efficient code generation and completion
Multilingual content generation and summarization in EU languages
Fine-tuning a permissively licensed base model for domain-specific tasks
Local and on-prem inference where data cannot leave the environment
Benchmark/baseline for evaluating newer open MoE models

How to access

Provider	Model ID
Mistral AI (la Plateforme) ↗	`open-mixtral-8x7b`
Hugging Face (open weights) ↗	`mistralai/Mixtral-8x7B-Instruct-v0.1`

Mixtral — every version

The full lineage of the Mixtral line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Mixtral 8x22Bcurrent	2024-04-10	—	Apache-2.0
Mixtral 8x7B	2023-12-09	—	Apache-2.0

FAQ

How many parameters does Mixtral 8x7B have?

It has 46.7B total parameters but, because of its sparse mixture-of-experts design, only 12.9B are active per token. Each layer holds 8 expert networks and a router picks 2 per token, giving the model the knowledge capacity of a large model at roughly the inference cost of a ~13B dense model.

Is Mixtral 8x7B free and open source?

Yes. Mixtral 8x7B is released under the Apache 2.0 license, which permits free commercial use, fine-tuning, and self-hosting. The weights are freely downloadable from Hugging Face.

Is Mixtral 8x7B still available?

Mistral retired the hosted open-mixtral-8x7b API endpoint on March 30, 2025, recommending newer Mistral Small models instead. The open weights remain available under Apache 2.0, and several third-party providers (Together AI, Fireworks, OpenRouter) still serve the model.

How does Mixtral 8x7B compare to GPT-3.5 and Llama 2 70B?

At release it outperformed or matched both on most standard benchmarks. It scored 70.6% on MMLU and beat Llama 2 70B on math (74.4% GSM8K) and code (40.2% HumanEval), while running about 6x faster than Llama 2 70B thanks to its sparse MoE architecture.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Mixtral — every version

// FAQ