AI/TLDR

Llama 3.1

Meta's first open frontier-scale LLM — an 8B, 70B, and 405B family with a 128K context window and eight-language support.

Overview

Llama 3.1 is Meta's family of open-weight large language models, released on July 23, 2024, in three sizes: 8B, 70B, and 405B parameters. Its headline release, Llama 3.1 405B, was the first openly available model to rival top proprietary systems of its time — Meta benchmarked it as competitive with GPT-4, GPT-4o, and Claude 3.5 Sonnet on general knowledge, math, tool use, and multilingual tasks. All three sizes are dense decoder-only transformers, pretrained on more than 15 trillion tokens, with the 405B trained on a cluster of over 16,000 NVIDIA H100 GPUs.

Compared with the original Llama 3 (8B and 70B, April 2024), Llama 3.1 expands the context window to 128K tokens across all sizes and adds official multilingual support for eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The models use Grouped-Query Attention for efficient inference, and the instruction-tuned variants were aligned with supervised fine-tuning and RLHF. The knowledge cutoff is December 2023. Notably, Meta stuck with a dense architecture rather than mixture-of-experts to keep training stable at the 405B scale.

Llama 3.1 is distributed under the Llama 3.1 Community License, a custom commercial license that — for the first time in the Llama line — explicitly permits using model outputs to improve other models. The weights are downloadable from Meta and Hugging Face, and the models are hosted by AWS Bedrock, Azure AI, Google Cloud Vertex AI, Together AI, Fireworks AI, and others. Llama 3.1 has since been superseded: Llama 3.3 70B (December 2024) matched the 405B on many tasks at a fraction of the size, and Llama 4 (April 2025) moved the line to a mixture-of-experts design.

Released2024-07-23
LicenseLlama 3.1 Community License (custom commercial)
WeightsOpen weights
Parameters8B, 70B, and 405B (dense)
Context128K
Max outputProvider-dependent (e.g. ~4K on some hosted endpoints); the 128K window is shared between input and output
ArchitectureStandard decoder-only (dense) transformer with Grouped-Query Attention (GQA). Meta deliberately chose a dense architecture over mixture-of-experts for training stability at scale. Pretrained on over 15 trillion tokens; the 405B was trained on more than 16,000 NVIDIA H100 GPUs.
Knowledge cutoffDecember 2023
ModalitiesText
StatusAvailable via cloud providers; superseded by Llama 3.3 and Llama 4. Some hosts (e.g. Oracle OCI) have marked the 405B deprecated.

Benchmarks

  1. MMLU (5-shot)87.3%
  2. MMLU-Pro (CoT)73.3%
  3. GPQA (0-shot)50.7%
  4. HumanEval (0-shot)89%
  5. MATH (CoT)73.8%
  6. IFEval88.6%
  7. BFCL (tool use)88.5%
  8. Multilingual MGSM (CoT)91.6%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$2.75 / 1M tokens per 1M tokens
Output$6.50 / 1M tokens per 1M tokens

Llama 3.1 is open-weight with no single official price. These are the 405B Instruct median figures across hosted API providers; the 8B and 70B variants are substantially cheaper, and self-hosting incurs only compute cost.

Pricing source ↗

Strengths

  • First open-weight model to reach frontier-class performance — the 405B scored 87.3% on MMLU, matching GPT-4-class proprietary systems
  • Fully downloadable weights under a permissive commercial license, enabling self-hosting, fine-tuning, and on-prem deployment
  • 128K-token context window across all three sizes (8B, 70B, 405B)
  • Strong coding (89.0% HumanEval) and math (73.8% MATH) for the 405B Instruct model
  • Built-in support for eight languages and strong tool-use / function-calling scores
  • Broad ecosystem support — hosted by every major cloud and inference provider

Best for

  • Self-hosted and on-prem deployments where data cannot leave the org's infrastructure
  • Fine-tuning a strong open base model on domain-specific data
  • Generating synthetic data and distilling smaller models (explicitly permitted by the license)
  • Multilingual chat assistants and translation across the eight supported languages
  • Long-document summarization and retrieval-augmented generation using the 128K window
  • Coding assistants and tool-using agents built on open weights

How to access

ProviderModel ID
AWS Bedrock ↗meta.llama3-1-405b-instruct-v1:0
Together AI / Fireworks (and other hosts) ↗meta-llama/Meta-Llama-3.1-405B-Instruct
Hugging Face (download weights) ↗meta-llama/Llama-3.1-405B-Instruct

Llama 3 — every version

The full lineage of the Llama 3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Llama 3.3 70Bcurrent2024-12-06Open weights
Llama 3.22024-09-25Open weights
Llama 3.12024-07-23Open weights
Llama 32024-04-18Open weights

FAQ

What is Llama 3.1?

Llama 3.1 is Meta's family of open-weight large language models, released July 23, 2024, in 8B, 70B, and 405B parameter sizes. All three are dense decoder-only transformers with a 128K-token context window and support for eight languages. The 405B was the first openly available model to rival GPT-4-class proprietary systems.

Is Llama 3.1 open source?

The weights are downloadable under the Llama 3.1 Community License, a custom commercial license. It is widely called 'open-weight' rather than strictly open-source, since the license has some restrictions, but it does permit commercial use, fine-tuning, and — for the first time in the Llama line — using model outputs to improve other models.

How much does Llama 3.1 cost?

Llama 3.1 is open-weight, so there is no single official price; you can self-host it for only compute cost. Across hosted API providers, the 405B Instruct model runs around $2.75 per 1M input tokens and $6.50 per 1M output tokens (median, per Artificial Analysis). The 8B and 70B variants are significantly cheaper.

What benchmarks did Llama 3.1 405B achieve?

Per Meta's official model card, the 405B Instruct model scored 87.3% on MMLU (5-shot), 50.7% on GPQA, 89.0% on HumanEval, 73.8% on MATH, 88.6% on IFEval, and 91.6% on multilingual MGSM — competitive with GPT-4, GPT-4o, and Claude 3.5 Sonnet at the time of release.