Llama 3

Name: Llama 3
Author: Meta

Meta's first Llama 3 release: open-weight 8B and 70B text models trained on 15 trillion tokens, with an 8K context window.

Overview

Llama 3 is Meta's third-generation open-weight large language model family, launched on April 18, 2024 in two dense sizes — Llama 3 8B and Llama 3 70B — each available as a pretrained base model and an instruction-tuned (Instruct) chat model. At release it was Meta's most capable openly available model and set a new bar for the small/mid open-weight tier, outperforming peers like Mistral 7B, Gemma and the earlier Llama 2 70B on standard reasoning and coding benchmarks.

Both Llama 3 sizes are text-only and use a decoder-only transformer with Grouped-Query Attention and a larger 128K-token tokenizer, trained on more than 15 trillion tokens of publicly available text. The context window is 8,192 tokens — modest by later standards but a doubling of Llama 2's 4K. Meta published the weights under the permissive Meta Llama 3 Community License, letting developers run, fine-tune and self-host the models, which is why Llama 3 became a default base for countless fine-tunes and on-prem deployments.

Llama 3 was a stepping stone: Meta explicitly framed it as an early release and shipped the much larger, longer-context Llama 3.1 (including the frontier-scale 405B) just three months later on July 23, 2024, followed by Llama 3.2 and Llama 3.3. For new projects the later Llama 3.x models are the recommended choice, but the original Llama 3 8B and 70B weights remain freely downloadable and historically important.

Released	2024-04-18
License	Meta Llama 3 Community License (open weights; custom, not OSI-approved — free for most commercial use, with extra terms for products exceeding 700M monthly active users)
Weights	Open weights
Parameters	Two dense sizes: 8B and 70B parameters
Context	8,192 tokens (8K)
Architecture	Decoder-only (auto-regressive) transformer using Grouped-Query Attention (GQA) on both the 8B and 70B sizes, with a 128K-token tokenizer vocabulary. Pretrained on over 15 trillion tokens from publicly available sources (about 7x the Llama 2 corpus and 4x the code); instruct models tuned with SFT, rejection sampling, PPO and DPO.
Knowledge cutoff	March 2023 (8B); December 2023 (70B)
Modalities	Text
Status	Superseded. Llama 3 (the original 8B and 70B from April 2024) was replaced by Llama 3.1 on 2024-07-23 and later point releases (3.2, 3.3). The weights remain freely downloadable on Hugging Face, but most hosted API providers have retired the original endpoints in favor of the newer Llama 3.x models.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.65 / 1M tokens (Llama 3 70B Instruct, cross-provider median) / 1M tokens
Output	$2.75 / 1M tokens (Llama 3 70B Instruct, cross-provider median) / 1M tokens

Llama 3 has open weights and no first-party Meta API, so price depends on the hosting provider; these are cross-provider median rates for the original Llama 3 70B Instruct. The smaller 8B was typically served around $0.10–0.20 / 1M tokens. Many providers have since retired the original Llama 3 endpoints in favor of Llama 3.1/3.3.

Pricing source ↗

Strengths

Strong reasoning and coding for its size — Llama 3 70B Instruct scores 82.0 on MMLU and 81.7 on HumanEval, competitive with much larger closed models of its era
Fully open weights under a permissive community license: free to download, fine-tune and self-host
Efficient 8B model that runs on a single consumer GPU while still scoring 68.4 MMLU and 62.2 HumanEval
Grouped-Query Attention on both sizes for faster, cheaper inference
Huge ecosystem — became one of the most fine-tuned and widely hosted open models, supported across virtually every inference framework and cloud

Best for

Self-hosted chat assistants and internal tools where data must stay on-premises
Fine-tuning a base model on domain-specific data for classification, extraction or instruction following
Cost-sensitive, high-volume text generation and summarization via the lightweight 8B model
Code generation and assistance using the strong-for-its-size 70B model
Research, benchmarking and as an open baseline for building on top of frontier-quality open weights

How to access

Provider	Model ID
Hugging Face (download weights) ↗	`meta-llama/Meta-Llama-3-70B-Instruct`
Together AI (hosted) ↗	`meta-llama/Llama-3-70b-chat-hf`

Llama 3 — every version

The full lineage of the Llama 3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Llama 3.3 70Bcurrent	2024-12-06	—	Open weights
Llama 3.2	2024-09-25	—	Open weights
Llama 3.1	2024-07-23	—	Open weights
Llama 3	2024-04-18	—	Open weights

FAQ

When was Llama 3 released and by whom?

Meta released Llama 3 on April 18, 2024, in two open-weight sizes: 8B and 70B parameters, each with a pretrained base and an instruction-tuned chat variant.

What is Llama 3's context window?

Both Llama 3 sizes support an 8,192-token (8K) context window — double Llama 2's 4K. The longer 128K context arrived later with Llama 3.1.

Is Llama 3 free and open source?

The weights are freely downloadable under the Meta Llama 3 Community License, which permits most commercial use. It is 'open weights' rather than strictly OSI open source: the license adds restrictions (notably extra terms for products with over 700 million monthly active users).

Should I use Llama 3 or Llama 3.1?

For new projects, Llama 3.1 (released July 23, 2024) or a later point release is recommended — they add a 128K context window, more languages, and the frontier-scale 405B model. The original Llama 3 8B and 70B weights remain available but are superseded.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Llama 3 — every version

// FAQ