Overview
Llama 4 Maverick is Meta's flagship model in the Llama 4 herd, announced April 5, 2025. It is a natively multimodal, mixture-of-experts (MoE) model with 17 billion active parameters drawn from 400 billion total parameters across 128 experts, using alternating dense and MoE layers with early-fusion multimodality so text and vision are processed in a single backbone.
It accepts multilingual text and image input and produces multilingual text and code output, with a 1M-token context window. Knowledge cutoff is August 2024. Meta positions it as the best multimodal model in its class, reporting that it beats GPT-4o and Gemini 2.0 Flash on a range of benchmarks while reaching comparable reasoning and coding results to DeepSeek v3.
The model is released under the Llama 4 Community License Agreement and is open-weight, downloadable from Hugging Face and hosted by providers including Together AI and Fireworks AI.
| Released | 2025-04-05 |
|---|---|
| License | Llama 4 Community License Agreement |
| Weights | Open weights |
| Parameters | 400B total · 17B active (128 experts) |
| Context | 1M |
| Architecture | Mixture-of-Experts |
| Knowledge cutoff | 2024-08 |
| Modalities | Text, Vision |
| Status | Available |
Benchmarks
- MMLU Pro80.5%
- GPQA Diamond69.8%
- MMMU73.4%
- MMMU Pro59.6%
- MathVista73.7%
- ChartQA90%
- DocVQA (test)94.4%
- LiveCodeBench43.4%
- MGSM92.3%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Strengths
- Native multimodality (text + image) via early fusion in a single model
- Strong multimodal benchmark results — 73.4 MMMU, 90.0 ChartQA, 94.4 DocVQA
- Efficient MoE design: only 17B of 400B parameters active per token
- Long 1M-token context window for large-document and multi-image tasks
- Broad multilingual support across 12 languages
Best for
- Multimodal assistants combining image understanding with chat
- Long-document and multi-image analysis at up to 1M-token context
- Multilingual text generation and coding
- Self-hosted, open-weight deployments needing frontier multimodal quality
How to access
| Provider | Model ID |
|---|---|
| Together AI ↗ | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 |
| Fireworks AI ↗ | accounts/fireworks/models/llama4-maverick-instruct-basic |
Llama 4 — every version
The full lineage of the Llama 4 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Llama 4 Maverickcurrent | 2025-04-05 | 1M | Llama 4 Community |
| Llama 4 Scout | 2025-04-05 | — | Open weights |
| Llama 4 Behemoth | 2025-04 | — | Open weights |
FAQ
Is Llama 4 Maverick open-weight?
Yes. Meta releases the weights under the Llama 4 Community License Agreement, and they can be downloaded from Hugging Face (meta-llama/Llama-4-Maverick-17B-128E-Instruct). It is also hosted by third-party providers such as Together AI and Fireworks AI. The license is a custom community license rather than a standard OSI-approved open-source license.
How many parameters does Llama 4 Maverick have?
Maverick is a mixture-of-experts model with 400 billion total parameters spread across 128 experts, but only 17 billion parameters are active per token. This sparse design gives it large-model quality while keeping inference cost closer to a 17B dense model. It is the higher-expert sibling of Llama 4 Scout (16 experts, 109B total).
What is the context window of Llama 4 Maverick?
Llama 4 Maverick supports a 1 million-token context window per its Hugging Face model card. Its sibling Llama 4 Scout supports an even larger 10 million-token window. Both accept multilingual text and image input and output multilingual text and code, with a knowledge cutoff of August 2024.