AI/TLDR

Llama 4 Maverick

Meta's flagship natively-multimodal MoE model with 400B total / 17B active parameters

Overview

Llama 4 Maverick is Meta's flagship model in the Llama 4 herd, announced April 5, 2025. It is a natively multimodal, mixture-of-experts (MoE) model with 17 billion active parameters drawn from 400 billion total parameters across 128 experts, using alternating dense and MoE layers with early-fusion multimodality so text and vision are processed in a single backbone.

It accepts multilingual text and image input and produces multilingual text and code output, with a 1M-token context window. Knowledge cutoff is August 2024. Meta positions it as the best multimodal model in its class, reporting that it beats GPT-4o and Gemini 2.0 Flash on a range of benchmarks while reaching comparable reasoning and coding results to DeepSeek v3.

The model is released under the Llama 4 Community License Agreement and is open-weight, downloadable from Hugging Face and hosted by providers including Together AI and Fireworks AI.

Released2025-04-05
LicenseLlama 4 Community License Agreement
WeightsOpen weights
Parameters400B total · 17B active (128 experts)
Context1M
ArchitectureMixture-of-Experts
Knowledge cutoff2024-08
ModalitiesText, Vision
StatusAvailable

Benchmarks

  1. MMLU Pro80.5%
  2. GPQA Diamond69.8%
  3. MMMU73.4%
  4. MMMU Pro59.6%
  5. MathVista73.7%
  6. ChartQA90%
  7. DocVQA (test)94.4%
  8. LiveCodeBench43.4%
  9. MGSM92.3%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

  • Native multimodality (text + image) via early fusion in a single model
  • Strong multimodal benchmark results — 73.4 MMMU, 90.0 ChartQA, 94.4 DocVQA
  • Efficient MoE design: only 17B of 400B parameters active per token
  • Long 1M-token context window for large-document and multi-image tasks
  • Broad multilingual support across 12 languages

Best for

  • Multimodal assistants combining image understanding with chat
  • Long-document and multi-image analysis at up to 1M-token context
  • Multilingual text generation and coding
  • Self-hosted, open-weight deployments needing frontier multimodal quality

How to access

ProviderModel ID
Together AI ↗meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
Fireworks AI ↗accounts/fireworks/models/llama4-maverick-instruct-basic

Llama 4 — every version

The full lineage of the Llama 4 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Llama 4 Maverickcurrent2025-04-051MLlama 4 Community
Llama 4 Scout2025-04-05Open weights
Llama 4 Behemoth2025-04Open weights

FAQ

Is Llama 4 Maverick open-weight?

Yes. Meta releases the weights under the Llama 4 Community License Agreement, and they can be downloaded from Hugging Face (meta-llama/Llama-4-Maverick-17B-128E-Instruct). It is also hosted by third-party providers such as Together AI and Fireworks AI. The license is a custom community license rather than a standard OSI-approved open-source license.

How many parameters does Llama 4 Maverick have?

Maverick is a mixture-of-experts model with 400 billion total parameters spread across 128 experts, but only 17 billion parameters are active per token. This sparse design gives it large-model quality while keeping inference cost closer to a 17B dense model. It is the higher-expert sibling of Llama 4 Scout (16 experts, 109B total).

What is the context window of Llama 4 Maverick?

Llama 4 Maverick supports a 1 million-token context window per its Hugging Face model card. Its sibling Llama 4 Scout supports an even larger 10 million-token window. Both accept multilingual text and image input and output multilingual text and code, with a knowledge cutoff of August 2024.