Overview
LLaMA (sometimes called Llama 1) is the original family of foundation language models from Meta, released on February 24, 2023, and the starting point of the Llama lineage. It shipped in four dense sizes: 6.7B, 13B, 32.5B and 65.2B parameters (commonly rounded to 7B, 13B, 33B and 65B). All four are decoder-only transformers trained exclusively on publicly available data.
LLaMA's central result was efficiency: in its paper, "LLaMA: Open and Efficient Foundation Language Models," Meta showed that LLaMA-13B outperformed OpenAI's much larger GPT-3 (175B) on most benchmarks despite being roughly ten times smaller, while LLaMA-65B was competitive with the best models of the time, Chinchilla-70B and PaLM-540B. The 7B and 13B models were trained on about 1.0 trillion tokens and the 33B and 65B models on about 1.4 trillion tokens. All sizes use a 2,048-token context window.
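The efficiency argument above comes down to simple arithmetic on the quoted figures. The following is a minimal sketch that computes training tokens per parameter for each size and the GPT-3 to LLaMA-13B size ratio. The helper names (`tokens_per_parameter`, `size_ratio`) are illustrative, and the inputs are only the rounded numbers given in the text.

```python
# Figures quoted in the text: parameters (billions), training tokens (trillions).
MODELS = {
    "LLaMA-7B": (6.7, 1.0),
    "LLaMA-13B": (13.0, 1.0),
    "LLaMA-33B": (32.5, 1.4),
    "LLaMA-65B": (65.2, 1.4),
}

GPT3_PARAMS_B = 175.0  # GPT-3 size quoted in the text, in billions


def tokens_per_parameter(params_b: float, tokens_t: float) -> float:
    """Training tokens seen per model parameter."""
    return (tokens_t * 1e12) / (params_b * 1e9)


def size_ratio(larger_b: float, smaller_b: float) -> float:
    """How many times larger one model is than another, by parameter count."""
    return larger_b / smaller_b


if __name__ == "__main__":
    for name, (params_b, tokens_t) in MODELS.items():
        ratio = tokens_per_parameter(params_b, tokens_t)
        print(f"{name}: about {ratio:.0f} tokens per parameter")
    print(f"GPT-3 vs LLaMA-13B: {size_ratio(GPT3_PARAMS_B, 13.0):.1f}x larger")
```

On these rounded inputs the smallest model sees roughly 149 tokens per parameter and the largest roughly 21, and GPT-3 comes out about 13.5 times larger than LLaMA-13B, consistent with the "roughly ten times" description.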
Unlike later Llama generations, LLaMA was never an open commercial release. The weights were distributed only on a case-by-case basis under a noncommercial license for academic and research use. On March 3, 2023 the weights leaked publicly via a torrent, which seeded a wave of open derivatives (Alpaca, Vicuna, llama.cpp and many others) and effectively launched the open-LLM movement, even though Meta itself only opened the family commercially with Llama 2 in July 2023.
| Field | Value |
|---|---|
| Released | 2023-02-24 |
| License | Noncommercial research license |
| Weights | Gated: granted case by case to researchers (leaked publicly on March 3, 2023) |
| Parameters | 7B / 13B / 33B / 65B (dense) |
| Context | 2K |
| Architecture | Dense decoder-only transformer with RMSNorm, SwiGLU activations and rotary positional embeddings (RoPE). |
| Modalities | Text |
| Status | Deprecated |
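The three architectural components named in the table can be sketched in a few lines of NumPy. This is an illustrative sketch, not Meta's implementation: the function names, the `eps` value, the RoPE base of 10000 and the choice to rotate adjacent feature pairs are all assumptions made for the example, and it omits attention, batching and multiple heads.

```python
import numpy as np


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide by the root-mean-square of the features, then scale.

    Unlike LayerNorm there is no mean subtraction and no bias term.
    """
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight


def silu(x):
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + np.exp(-x))


def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward block: (SiLU(x W_gate) * (x W_up)) W_down."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def rope(x, base=10000.0):
    """Rotary positional embedding for x of shape (seq_len, head_dim).

    Each adjacent feature pair (2i, 2i+1) at position p is rotated by the
    angle p * base**(-2i / head_dim). head_dim must be even.
    """
    x = np.asarray(x, dtype=np.float64)
    seq_len, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = np.outer(np.arange(seq_len), inv_freq)    # (seq_len, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(4, 8))                        # 4 positions, 8 features
    normed = rms_norm(h, np.ones(8))
    ffn_out = swiglu_ffn(normed, rng.normal(size=(8, 16)),
                         rng.normal(size=(8, 16)), rng.normal(size=(16, 8)))
    print(ffn_out.shape, rope(h).shape)
```

Because RoPE applies a pure rotation, it preserves vector norms, and the dot product between a rotated query and a rotated key depends only on the distance between their positions, which is the property that makes it usable inside attention.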
Benchmarks
- MMLU (5-shot), LLaMA-65B: 63.4%
- MMLU (5-shot), LLaMA-13B: 46.9%
- HumanEval pass@1, LLaMA-65B: 23.7%
- MBPP pass@1, LLaMA-65B: 37.7%
- TriviaQA (0-shot), LLaMA-65B: 68.2%
- Natural Questions (0-shot), LLaMA-65B: 23.8%
Scores are percentages on a 0 to 100 scale; higher is better.
Strengths
- LLaMA-13B outperformed the much larger GPT-3 (175B) on most benchmarks with roughly 10x fewer parameters
- LLaMA-65B was competitive with Chinchilla-70B and PaLM-540B
- Trained entirely on publicly available data, with no proprietary corpora
- The 7B and 13B models are compact enough to run on a single GPU, enabling broad research access
- Seeded the open-weight ecosystem (Alpaca, Vicuna, llama.cpp) after the weights leaked
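The single-GPU point above can be checked with a back-of-the-envelope estimate of how much memory the weights alone occupy. The sketch below is a rough illustration under stated assumptions: the precisions listed (fp16, int8, int4) are common storage formats chosen for the example and are not taken from the source, and the estimate ignores activations, the attention cache and any quantization overhead, so real usage is higher.

```python
# Parameter counts quoted in the text.
PARAMS = {"7B": 6.7e9, "13B": 13.0e9, "33B": 32.5e9, "65B": 65.2e9}

# Assumed storage precisions (illustrative, not from the source).
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights only, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for size, n_params in PARAMS.items():
        row = ", ".join(
            f"{fmt}: {weight_memory_gb(n_params, nbytes):.1f} GB"
            for fmt, nbytes in BYTES_PER_PARAM.items()
        )
        print(f"LLaMA-{size} weights -> {row}")
```

Under these assumptions the 7B weights need about 13.4 GB at 16-bit precision and the 13B weights about 26 GB, which is why the two smaller models fit on one accelerator while the larger ones generally need several devices or lower-precision storage.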
Best for
- Academic and research experimentation on foundation models
- A base for instruction-tuning and fine-tuning research (e.g. Alpaca, Vicuna)
- Studying efficient training and inference at 7B-65B scale
- Local/offline text generation on consumer hardware via community runtimes
- Benchmarking and reproducibility work on openly trained LLMs
FAQ
What sizes did LLaMA (Llama 1) come in?
Four dense sizes: 6.7B, 13B, 32.5B and 65.2B parameters, usually rounded to 7B, 13B, 33B and 65B. All are decoder-only transformers with a 2,048-token context window.
How did LLaMA compare to GPT-3?
Meta's paper reported that LLaMA-13B outperformed GPT-3 (175B) on most benchmarks despite being about ten times smaller, and that LLaMA-65B was competitive with Chinchilla-70B and PaLM-540B.
Was LLaMA open-source or commercially usable?
No. The weights were released only under a noncommercial research license, granted case by case. They leaked publicly via torrent on March 3, 2023, and commercial use did not arrive until Llama 2 in July 2023.
How much data was LLaMA trained on?
The 7B and 13B models were trained on about 1.0 trillion tokens and the 33B and 65B models on about 1.4 trillion tokens, all from publicly available sources.