Llama 3.1

Name: Llama 3.1
Author: Meta

Meta's first open frontier-scale LLM — an 8B, 70B, and 405B family with a 128K context window and eight-language support.

Overview

Llama 3.1 is Meta's family of open-weight large language models, released on July 23, 2024, in three sizes: 8B, 70B, and 405B parameters. Its headline release, Llama 3.1 405B, was the first openly available model to rival top proprietary systems of its time — Meta benchmarked it as competitive with GPT-4, GPT-4o, and Claude 3.5 Sonnet on general knowledge, math, tool use, and multilingual tasks. All three sizes are dense decoder-only transformers, pretrained on more than 15 trillion tokens, with the 405B trained on a cluster of over 16,000 NVIDIA H100 GPUs.

Compared with the original Llama 3 (8B and 70B, April 2024), Llama 3.1 expands the context window to 128K tokens across all sizes and adds official multilingual support for eight languages: English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai. The models use Grouped-Query Attention for efficient inference, and the instruction-tuned variants were aligned with supervised fine-tuning and RLHF. The knowledge cutoff is December 2023. Notably, Meta stuck with a dense architecture rather than mixture-of-experts to keep training stable at the 405B scale.

Llama 3.1 is distributed under the Llama 3.1 Community License, a custom commercial license that — for the first time in the Llama line — explicitly permits using model outputs to improve other models. The weights are downloadable from Meta and Hugging Face, and the models are hosted by AWS Bedrock, Azure AI, Google Cloud Vertex AI, Together AI, Fireworks AI, and others. Llama 3.1 has since been superseded: Llama 3.3 70B (December 2024) matched the 405B on many tasks at a fraction of the size, and Llama 4 (April 2025) moved the line to a mixture-of-experts design.

Released	2024-07-23
License	Llama 3.1 Community License (custom commercial)
Weights	Open weights
Parameters	8B, 70B, and 405B (dense)
Context	128K
Max output	Provider-dependent (e.g. ~4K on some hosted endpoints); the 128K window is shared between input and output
Architecture	Standard decoder-only (dense) transformer with Grouped-Query Attention (GQA). Meta deliberately chose a dense architecture over mixture-of-experts for training stability at scale. Pretrained on over 15 trillion tokens; the 405B was trained on more than 16,000 NVIDIA H100 GPUs.
Knowledge cutoff	December 2023
Modalities	Text
Status	Available via cloud providers; superseded by Llama 3.3 and Llama 4. Some hosts (e.g. Oracle OCI) have marked the 405B deprecated.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$2.75 / 1M tokens per 1M tokens
Output	$6.50 / 1M tokens per 1M tokens

Llama 3.1 is open-weight with no single official price. These are the 405B Instruct median figures across hosted API providers; the 8B and 70B variants are substantially cheaper, and self-hosting incurs only compute cost.

Pricing source ↗

Strengths

First open-weight model to reach frontier-class performance — the 405B scored 87.3% on MMLU, matching GPT-4-class proprietary systems
Fully downloadable weights under a permissive commercial license, enabling self-hosting, fine-tuning, and on-prem deployment
128K-token context window across all three sizes (8B, 70B, 405B)
Strong coding (89.0% HumanEval) and math (73.8% MATH) for the 405B Instruct model
Built-in support for eight languages and strong tool-use / function-calling scores
Broad ecosystem support — hosted by every major cloud and inference provider

Best for

Self-hosted and on-prem deployments where data cannot leave the org's infrastructure
Fine-tuning a strong open base model on domain-specific data
Generating synthetic data and distilling smaller models (explicitly permitted by the license)
Multilingual chat assistants and translation across the eight supported languages
Long-document summarization and retrieval-augmented generation using the 128K window
Coding assistants and tool-using agents built on open weights

How to access

Provider	Model ID
AWS Bedrock ↗	`meta.llama3-1-405b-instruct-v1:0`
Together AI / Fireworks (and other hosts) ↗	`meta-llama/Meta-Llama-3.1-405B-Instruct`
Hugging Face (download weights) ↗	`meta-llama/Llama-3.1-405B-Instruct`

Llama 3 — every version

The full lineage of the Llama 3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Llama 3.3 70Bcurrent	2024-12-06	—	Open weights
Llama 3.2	2024-09-25	—	Open weights
Llama 3.1	2024-07-23	—	Open weights
Llama 3	2024-04-18	—	Open weights

FAQ

What is Llama 3.1?

Llama 3.1 is Meta's family of open-weight large language models, released July 23, 2024, in 8B, 70B, and 405B parameter sizes. All three are dense decoder-only transformers with a 128K-token context window and support for eight languages. The 405B was the first openly available model to rival GPT-4-class proprietary systems.

Is Llama 3.1 open source?

The weights are downloadable under the Llama 3.1 Community License, a custom commercial license. It is widely called 'open-weight' rather than strictly open-source, since the license has some restrictions, but it does permit commercial use, fine-tuning, and — for the first time in the Llama line — using model outputs to improve other models.

How much does Llama 3.1 cost?

Llama 3.1 is open-weight, so there is no single official price; you can self-host it for only compute cost. Across hosted API providers, the 405B Instruct model runs around $2.75 per 1M input tokens and $6.50 per 1M output tokens (median, per Artificial Analysis). The 8B and 70B variants are significantly cheaper.

What benchmarks did Llama 3.1 405B achieve?

Per Meta's official model card, the 405B Instruct model scored 87.3% on MMLU (5-shot), 50.7% on GPQA, 89.0% on HumanEval, 73.8% on MATH, 88.6% on IFEval, and 91.6% on multilingual MGSM — competitive with GPT-4, GPT-4o, and Claude 3.5 Sonnet at the time of release.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Llama 3 — every version

// FAQ