Overview
Llama 3 is Meta's third-generation open-weight large language model family, launched on April 18, 2024 in two dense sizes — Llama 3 8B and Llama 3 70B — each available as a pretrained base model and an instruction-tuned (Instruct) chat model. At release it was Meta's most capable openly available model and set a new bar for the small/mid open-weight tier, outperforming peers like Mistral 7B, Gemma and the earlier Llama 2 70B on standard reasoning and coding benchmarks.
Both Llama 3 sizes are text-only and use a decoder-only transformer with Grouped-Query Attention and a larger 128K-token tokenizer, trained on more than 15 trillion tokens of publicly available text. The context window is 8,192 tokens — modest by later standards but a doubling of Llama 2's 4K. Meta published the weights under the permissive Meta Llama 3 Community License, letting developers run, fine-tune and self-host the models, which is why Llama 3 became a default base for countless fine-tunes and on-prem deployments.
Llama 3 was a stepping stone: Meta explicitly framed it as an early release and shipped the much larger, longer-context Llama 3.1 (including the frontier-scale 405B) just three months later on July 23, 2024, followed by Llama 3.2 and Llama 3.3. For new projects the later Llama 3.x models are the recommended choice, but the original Llama 3 8B and 70B weights remain freely downloadable and historically important.
| Released | 2024-04-18 |
|---|---|
| License | Meta Llama 3 Community License (open weights; custom, not OSI-approved — free for most commercial use, with extra terms for products exceeding 700M monthly active users) |
| Weights | Open weights |
| Parameters | Two dense sizes: 8B and 70B parameters |
| Context | 8,192 tokens (8K) |
| Architecture | Decoder-only (auto-regressive) transformer using Grouped-Query Attention (GQA) on both the 8B and 70B sizes, with a 128K-token tokenizer vocabulary. Pretrained on over 15 trillion tokens from publicly available sources (about 7x the Llama 2 corpus and 4x the code); instruct models tuned with SFT, rejection sampling, PPO and DPO. |
| Knowledge cutoff | March 2023 (8B); December 2023 (70B) |
| Modalities | Text |
| Status | Superseded. Llama 3 (the original 8B and 70B from April 2024) was replaced by Llama 3.1 on 2024-07-23 and later point releases (3.2, 3.3). The weights remain freely downloadable on Hugging Face, but most hosted API providers have retired the original endpoints in favor of the newer Llama 3.x models. |
Benchmarks
- MMLU (5-shot) — 70B Instruct82%
- HumanEval (0-shot) — 70B Instruct81.7%
- GSM-8K (8-shot, CoT) — 70B Instruct93%
- MATH (4-shot, CoT) — 70B Instruct50.4%
- GPQA (0-shot) — 70B Instruct39.5%
- MMLU (5-shot) — 8B Instruct68.4%
- HumanEval (0-shot) — 8B Instruct62.2%
- GSM-8K (8-shot, CoT) — 8B Instruct79.6%
- MATH (4-shot, CoT) — 8B Instruct30%
- GPQA (0-shot) — 8B Instruct34.2%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.65 / 1M tokens (Llama 3 70B Instruct, cross-provider median) / 1M tokens |
|---|---|
| Output | $2.75 / 1M tokens (Llama 3 70B Instruct, cross-provider median) / 1M tokens |
Llama 3 has open weights and no first-party Meta API, so price depends on the hosting provider; these are cross-provider median rates for the original Llama 3 70B Instruct. The smaller 8B was typically served around $0.10–0.20 / 1M tokens. Many providers have since retired the original Llama 3 endpoints in favor of Llama 3.1/3.3.
Strengths
- Strong reasoning and coding for its size — Llama 3 70B Instruct scores 82.0 on MMLU and 81.7 on HumanEval, competitive with much larger closed models of its era
- Fully open weights under a permissive community license: free to download, fine-tune and self-host
- Efficient 8B model that runs on a single consumer GPU while still scoring 68.4 MMLU and 62.2 HumanEval
- Grouped-Query Attention on both sizes for faster, cheaper inference
- Huge ecosystem — became one of the most fine-tuned and widely hosted open models, supported across virtually every inference framework and cloud
Best for
- Self-hosted chat assistants and internal tools where data must stay on-premises
- Fine-tuning a base model on domain-specific data for classification, extraction or instruction following
- Cost-sensitive, high-volume text generation and summarization via the lightweight 8B model
- Code generation and assistance using the strong-for-its-size 70B model
- Research, benchmarking and as an open baseline for building on top of frontier-quality open weights
How to access
| Provider | Model ID |
|---|---|
| Hugging Face (download weights) ↗ | meta-llama/Meta-Llama-3-70B-Instruct |
| Together AI (hosted) ↗ | meta-llama/Llama-3-70b-chat-hf |
Llama 3 — every version
The full lineage of the Llama 3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Llama 3.3 70Bcurrent | 2024-12-06 | — | Open weights |
| Llama 3.2 | 2024-09-25 | — | Open weights |
| Llama 3.1 | 2024-07-23 | — | Open weights |
| Llama 3 | 2024-04-18 | — | Open weights |
FAQ
When was Llama 3 released and by whom?
Meta released Llama 3 on April 18, 2024, in two open-weight sizes: 8B and 70B parameters, each with a pretrained base and an instruction-tuned chat variant.
What is Llama 3's context window?
Both Llama 3 sizes support an 8,192-token (8K) context window — double Llama 2's 4K. The longer 128K context arrived later with Llama 3.1.
Is Llama 3 free and open source?
The weights are freely downloadable under the Meta Llama 3 Community License, which permits most commercial use. It is 'open weights' rather than strictly OSI open source: the license adds restrictions (notably extra terms for products with over 700 million monthly active users).
Should I use Llama 3 or Llama 3.1?
For new projects, Llama 3.1 (released July 23, 2024) or a later point release is recommended — they add a 128K context window, more languages, and the frontier-scale 405B model. The original Llama 3 8B and 70B weights remain available but are superseded.