Overview
Llama 2 is Meta's second-generation Llama large language model line, released on July 18, 2023. It ships in three dense sizes — 7B, 13B and 70B parameters — each available as a pretrained base model and as a dialogue-tuned Llama 2-Chat variant. Unlike the research-only first-generation LLaMA, Llama 2 was the first Llama released with a license that permits commercial use, which is what turned it into the default open-weight foundation for thousands of downstream fine-tunes.
Each Llama 2 model was trained on 2 trillion tokens of publicly available data — about 40% more than the original LLaMA — with a context window of 4,096 tokens (double LLaMA's 2,048) and a data cutoff of September 2022. The 70B model adds Grouped-Query Attention to keep inference affordable at scale, while the smaller 7B and 13B sizes stay easy to run on modest hardware. Meta also trained a 34B model but did not release it.
Llama 2 is text-only and has since been superseded by Llama 3, Llama 3.1/3.2/3.3 and Llama 4, all of which are far stronger and add longer context and multimodality. It remains historically important as the model that mainstreamed openly licensed LLMs, and the weights are still freely downloadable from Meta and Hugging Face under the Llama 2 Community License.
| Released | 2023-07-18 |
|---|---|
| License | Llama 2 Community License Agreement |
| Weights | Open weights |
| Parameters | 7B, 13B, 70B |
| Context | 4K |
| Architecture | Auto-regressive (decoder-only) transformer trained on 2 trillion tokens. The 70B model uses Grouped-Query Attention (GQA) for cheaper inference; the 7B and 13B variants do not. Chat-tuned variants (Llama 2-Chat) are aligned with supervised fine-tuning (SFT) followed by reinforcement learning from human feedback (RLHF). |
| Knowledge cutoff | September 2022 |
| Modalities | Text |
| Status | Superseded |
Benchmarks
- MMLU (5-shot)68.9%
- GSM8K (8-shot)56.8%
- HumanEval (pass@1)29.9%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Strengths
- Openly licensed weights free for both research and commercial use — the first Llama generation to allow this
- Three sizes (7B / 13B / 70B) covering laptop-scale to server-scale deployments
- Dialogue-tuned Llama 2-Chat variants aligned with SFT + RLHF, ready for assistant-style use
- 70B uses Grouped-Query Attention for cheaper, faster inference at large scale
- Massive ecosystem of community fine-tunes, quantizations and tooling built on top of it
Best for
- Self-hosted chat assistants and chatbots using the Llama 2-Chat variants
- Fine-tuning a base model on private or domain-specific data
- On-prem and air-gapped deployments where data cannot leave the network
- Research and benchmarking as an open, reproducible baseline
- Running smaller (7B/13B) models locally on consumer GPUs via quantization
How to access
| Provider | Model ID |
|---|---|
| Hugging Face (meta-llama) ↗ | meta-llama/Llama-2-70b-hf |
| Hugging Face (meta-llama) ↗ | meta-llama/Llama-2-7b-chat-hf |
FAQ
Is Llama 2 free for commercial use?
Yes. Llama 2 is released under the Llama 2 Community License Agreement, which permits both research and commercial use subject to an acceptable use policy. It was the first Llama generation to allow commercial use — the original LLaMA was research-only.
What sizes does Llama 2 come in?
Llama 2 was released in three dense sizes: 7B, 13B and 70B parameters, each available as a pretrained base model and a dialogue-tuned Llama 2-Chat variant. Meta also trained a 34B model but chose not to release it.
What is the context window of Llama 2?
Llama 2 has a 4,096-token context window — double the 2,048 tokens of the original LLaMA. It was trained on 2 trillion tokens of publicly available data with a knowledge cutoff of September 2022.
Is Llama 2 still the latest Llama model?
No. Llama 2 (July 2023) has been superseded by Llama 3, Llama 3.1/3.2/3.3 and Llama 4, which are significantly stronger and add longer context and multimodality. Llama 2 remains available as an open-weight download but new projects should generally use a newer Llama generation.