LLaMA

Name: LLaMA
Author: Meta

Meta's original research-only foundation models (7B-65B) that started the open Llama family.

Overview

LLaMA (sometimes called Llama 1) is the first family of foundation language models in Meta's Llama lineage, released on February 24, 2023. It shipped in four dense sizes: 6.7B, 13B, 32.5B and 65.2B parameters (commonly rounded to 7B, 13B, 33B and 65B). All four are decoder-only transformers trained only on publicly available data.

LLaMA's central result was efficiency. In the paper "LLaMA: Open and Efficient Foundation Language Models," Meta reported that LLaMA-13B outperformed OpenAI's GPT-3 (175B) on most benchmarks despite being roughly ten times smaller, and that LLaMA-65B was competitive with the best models of the time, Chinchilla-70B and PaLM-540B. The 7B and 13B models were trained on about 1.0 trillion tokens and the 33B and 65B models on about 1.4 trillion tokens. All four use a 2,048-token context window.

Unlike later Llama generations, LLaMA was never released for commercial use. Meta distributed the weights case by case under a noncommercial license for academic and research use. On March 3, 2023 the weights leaked publicly via a torrent, which seeded a wave of community projects built on them, including fine-tuned models such as Alpaca and Vicuna and inference runtimes such as llama.cpp. The leak gave the open-LLM movement much of its early momentum, even though Meta did not permit commercial use of the family until Llama 2 in July 2023.

Released	2023-02-24
License	Noncommercial research license
Weights	Open weights (research-only; access granted case by case)
Parameters	7B / 13B / 33B / 65B (dense)
Context	2K
Architecture	Dense decoder-only transformer with RMSNorm, SwiGLU activations and rotary positional embeddings (RoPE).
Modalities	Text
Status	Deprecated
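
The three architectural components named in the table can be illustrated with a minimal pure-Python sketch. This is not Meta's implementation: the function names, the `eps` and `base` defaults, and the choice to rotate consecutive element pairs in `rope` are assumptions made for illustration, and the linear projections that feed SwiGLU are omitted.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root mean square, then apply a learned gain.
    eps is an assumed default for numerical stability."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * w for v, w in zip(x, weight)]

def silu(v):
    """SiLU (swish) activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))

def swiglu(gate, up):
    """SwiGLU gating: elementwise SiLU(gate) * up.
    gate and up stand for two separate linear projections of the same input."""
    return [silu(g) * u for g, u in zip(gate, up)]

def rope(x, position, base=10000.0):
    """Rotary embedding: rotate each pair (x[i], x[i+1]) by position * base**(-i/d).
    Pairing consecutive elements is one common convention (assumed here)."""
    d = len(x)
    out = list(x)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = x[i] * c - x[i + 1] * s
        out[i + 1] = x[i] * s + x[i + 1] * c
    return out
```

Because `rope` only rotates pairs, it leaves the vector's length unchanged and encodes position purely in the angle, which is why the dot product between a rotated query and key depends on their relative offset.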

Benchmarks

Benchmark	Model	Score
MMLU (5-shot)	LLaMA-65B	63.4%
MMLU (5-shot)	LLaMA-13B	46.9%
HumanEval pass@1	LLaMA-65B	23.7%
MBPP pass@1	LLaMA-65B	37.7%
TriviaQA (0-shot)	LLaMA-65B	68.2%
Natural Questions (0-shot)	LLaMA-65B	23.8%

Scores are percentages on a 0–100 scale; higher is better.
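
For readers unfamiliar with the pass@1 metric used by HumanEval and MBPP, the sketch below shows the unbiased pass@k estimator introduced alongside HumanEval. Whether the scores listed here were computed with exactly this procedure is an assumption; the function name and arguments are illustrative.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples passes,
    given n generated samples of which c passed the unit tests."""
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k = 1 the estimator reduces to the plain pass rate c / n.
print(pass_at_k(n=10, c=3, k=1))
```

A benchmark score is the mean of this quantity over all problems in the suite.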

Strengths

LLaMA-13B outperformed the much larger GPT-3 (175B) on most benchmarks with roughly 10x fewer parameters
LLaMA-65B was competitive with Chinchilla-70B and PaLM-540B
Trained entirely on publicly available data — no proprietary corpora
The 7B and 13B models are compact enough to run on a single GPU, enabling broad research access
Seeded the open-weight ecosystem (Alpaca, Vicuna, llama.cpp) after the weights spread
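
The single-GPU point can be checked with back-of-the-envelope arithmetic. The sketch below estimates weight storage only, using the exact parameter counts from the overview. The precisions listed (fp16, int8, int4) are assumptions for illustration, and the estimate ignores the KV cache, activations and runtime overhead, so real memory use is higher.

```python
# Exact parameter counts from the overview.
PARAMS = {"7B": 6.7e9, "13B": 13.0e9, "33B": 32.5e9, "65B": 65.2e9}

# Assumed storage precisions (bytes per parameter); not taken from the source.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_gib(n_params, bytes_per_param):
    """Weight storage in GiB, excluding KV cache and activations."""
    return n_params * bytes_per_param / 2**30

for name, n in PARAMS.items():
    row = ", ".join(
        f"{precision}: {weight_gib(n, b):.1f} GiB"
        for precision, b in BYTES_PER_PARAM.items()
    )
    print(f"LLaMA-{name}  {row}")
```

Under these assumptions the 7B weights need about 12.5 GiB and the 13B weights about 24.2 GiB in fp16, which is consistent with each fitting on one high-memory GPU.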

Best for

Academic and research experimentation on foundation models
A base for instruction-tuning and fine-tuning research (e.g. Alpaca, Vicuna)
Studying efficient training and inference at 7B-65B scale
Local/offline text generation on consumer hardware via community runtimes
Benchmarking and reproducibility work on openly trained LLMs

FAQ

What sizes did LLaMA (Llama 1) come in?

Four dense sizes: 6.7B, 13B, 32.5B and 65.2B parameters, usually rounded to 7B, 13B, 33B and 65B. All are decoder-only transformers with a 2,048-token context window.

How did LLaMA compare to GPT-3?

Meta's paper reported that LLaMA-13B outperformed GPT-3 (175B) on most benchmarks despite being about ten times smaller, and that LLaMA-65B was competitive with Chinchilla-70B and PaLM-540B.

Was LLaMA open-source or commercially usable?

No. The weights were released only under a noncommercial research license, granted case by case. Commercial use did not arrive until Llama 2 in July 2023. The original weights leaked publicly via torrent on March 3, 2023.

How much data was LLaMA trained on?

The 7B and 13B models were trained on about 1.0 trillion tokens and the 33B and 65B models on about 1.4 trillion tokens, all from publicly available sources.
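
To put those token counts in perspective, the sketch below computes training tokens per parameter for each size. The comparison point of roughly 20 tokens per parameter is the commonly cited compute-optimal heuristic from the Chinchilla work; it is an outside reference added here for illustration, not a figure from this page.

```python
# (parameters, training tokens) for each size, as stated above.
MODELS = {
    "7B": (6.7e9, 1.0e12),
    "13B": (13.0e9, 1.0e12),
    "33B": (32.5e9, 1.4e12),
    "65B": (65.2e9, 1.4e12),
}

# Commonly cited compute-optimal ratio (an assumption brought in for comparison).
CHINCHILLA_TOKENS_PER_PARAM = 20

def tokens_per_param(params, tokens):
    """Training tokens seen per model parameter."""
    return tokens / params

for name, (params, tokens) in MODELS.items():
    ratio = tokens_per_param(params, tokens)
    multiple = ratio / CHINCHILLA_TOKENS_PER_PARAM
    print(f"LLaMA-{name}: {ratio:.0f} tokens/param ({multiple:.1f}x the heuristic)")
```

The smaller models sit far above the heuristic (about 149 tokens per parameter for 7B versus about 21 for 65B), which reflects the trade of extra training compute for a model that is cheaper to run at inference time.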
