AI/TLDR

Llama 3.2

Meta's edge-first refresh of Llama 3: tiny 1B/3B on-device text models plus the line's first vision models, 11B and 90B, all with a 128K context.

Overview

Llama 3.2 is Meta's September 25, 2024 update to the open-weight Llama 3 line, and it splits in two directions. The 1B and 3B models are lightweight, text-only language models built for edge and mobile devices, with day-one optimization for Qualcomm, MediaTek, and Arm processors so they can run summarization, rewriting, instruction following, and tool calling locally. The 11B and 90B models are Meta's first Llama vision models: they accept images alongside text and handle document understanding, chart and graph reading, image captioning, and visual reasoning. Every Llama 3.2 model carries a 128,000-token context window and a December 2023 knowledge cutoff.

Architecturally, the 1B and 3B are dense auto-regressive transformers with Grouped-Query Attention, pretrained on up to 9 trillion tokens. The 11B and 90B vision models reuse the Llama 3.1 8B and 70B text backbones and bolt on a separately trained cross-attention image adapter, trained on roughly 6 billion image-text pairs; the text encoder weights are kept frozen so text performance is preserved. Text generation officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, while image-plus-text tasks are English-only.

Llama 3.2 is distributed under the custom Llama 3.2 Community License Agreement (a commercial license, not an OSI open-source license). Weights are downloadable from Hugging Face under the meta-llama organization and from llama.com, and the models are widely hosted by third parties such as OpenRouter and AWS Bedrock, with the 1B and 3B available through Ollama (llama3.2 and llama3.2:1b) for local use. Within the Llama 3 line it was followed by the text-only Llama 3.3 70B in December 2024, and fully native multimodality arrived later with Llama 4.

Released2024-09-25
LicenseLlama 3.2 Community License Agreement
WeightsOpen weights
Parameters1B (1.23B), 3B (3.21B), 11B Vision (10.6B), 90B Vision (88.8B)
Context128K
Max outputNot separately specified (shares the 128K context budget)
ArchitectureDense auto-regressive transformer with Grouped-Query Attention; the 11B and 90B vision models add a cross-attention image adapter on the Llama 3.1 8B/70B text backbones
Knowledge cutoff2023-12
ModalitiesText, Vision
StatusAvailable (superseded in the Llama 3 line by Llama 3.3 70B; native multimodality moved to Llama 4)

Benchmarks

  1. MMLU (CoT) — 90B Vision86%
  2. MMMU (val, CoT) — 90B Vision60.3%
  3. ChartQA (test) — 90B Vision85.5%
  4. DocVQA (test, ANLS) — 90B Vision90.1%
  5. AI2 Diagram (test) — 90B Vision92.3%
  6. MathVista (testmini) — 90B Vision57.3%
  7. MMMU (val, CoT) — 11B Vision50.7%
  8. ChartQA (test) — 11B Vision83.4%
  9. DocVQA (test, ANLS) — 11B Vision88.4%
  10. MMLU (5-shot) — 3B text63.4%
  11. IFEval — 3B text77.4%
  12. GSM8K (CoT, 8-shot) — 3B text77.7%
  13. BFCL V2 (Tool Use) — 3B text67%
  14. MMLU (5-shot) — 1B text49.3%
  15. IFEval — 1B text59.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$1.38 / 1M tokens
Output$1.38 / 1M tokens

Llama 3.2 90B Vision — median across providers on Artificial Analysis (median input $0.45 / output $0.90; the smaller 1B/3B text models are far cheaper, often under $0.10/1M). Meta ships the weights free; these are third-party hosting prices.

Pricing source ↗

Strengths

  • First Llama vision models — the 11B and 90B accept image input for document, chart, and diagram understanding; the 90B matches GPT-4o on chart reading (ChartQA) and beats Claude 3 Opus and Gemini 1.5 Pro on scientific-diagram interpretation (AI2 Diagram 92.3)
  • Genuinely small on-device models — the 1B and 3B are tuned for phones and edge hardware (Qualcomm, MediaTek, Arm), with the 3B leading its size class on instruction following and tool use (BFCL V2 67.0)
  • Open weights under a permissive-for-most commercial license, downloadable from Hugging Face and runnable locally via Ollama
  • 128K-token context window across the whole family, large for models this small
  • Multilingual text generation across eight officially supported languages
  • Broad host support — OpenRouter, AWS Bedrock, and many partner platforms offered it on day one

Best for

  • On-device and mobile assistants: summarization, rewriting, and instruction following running locally on the 1B or 3B
  • Document and image understanding with the 11B or 90B Vision models — reading charts, graphs, tables, and scientific diagrams
  • Image captioning and visual question answering (DocVQA 90.1, ChartQA 85.5 on the 90B)
  • Edge tool-calling and lightweight agentic workflows where the 3B's BFCL tool-use accuracy matters
  • Multilingual text generation and rewriting across the eight supported languages
  • Privacy-sensitive or offline deployments that need open weights and local inference

How to access

ProviderModel ID
OpenRouter ↗meta-llama/llama-3.2-3b-instruct
OpenRouter ↗meta-llama/llama-3.2-90b-vision-instruct
Ollama ↗llama3.2
AWS Bedrock ↗meta.llama3-2-90b-instruct-v1:0

Llama 3 — every version

The full lineage of the Llama 3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Llama 3.3 70Bcurrent2024-12-06Open weights
Llama 3.22024-09-25Open weights
Llama 3.12024-07-23Open weights
Llama 32024-04-18Open weights

FAQ

What models are in the Llama 3.2 release?

Four sizes. The 1B (1.23B parameters) and 3B (3.21B) are lightweight, text-only models built for on-device and mobile use, with day-one support for Qualcomm, MediaTek, and Arm hardware. The 11B (10.6B) and 90B (88.8B) are Meta's first Llama vision models: they accept image input for document, chart, and diagram understanding. All four share a 128,000-token context window and a December 2023 knowledge cutoff.

Is Llama 3.2 open-weight and free to use?

Yes. Meta releases the weights under the Llama 3.2 Community License Agreement, downloadable from Hugging Face under the meta-llama organization and from llama.com, with the 1B and 3B also available through Ollama. It is a custom commercial license that permits commercial use but is not an OSI-approved open-source license, and it carries an acceptable-use policy plus extra terms for very large deployments.

How good are the Llama 3.2 Vision models?

Strong on document and chart understanding. Meta reports the instruction-tuned 90B Vision matches GPT-4o on chart reading (ChartQA 85.5) and beats Claude 3 Opus and Gemini 1.5 Pro on scientific-diagram interpretation (AI2 Diagram 92.3), with DocVQA 90.1 and MMMU 60.3. The 11B is a smaller alternative (ChartQA 83.4, DocVQA 88.4) competitive with Claude 3 Haiku and GPT-4o-mini on image tasks. Note the vision models take images plus text in but only output text, and image+text tasks are English-only.

How does Llama 3.2 fit with the rest of the Llama 3 line?

It came after Llama 3.1 (July 2024) and added the small edge models and the first vision models. It was then followed by the text-only Llama 3.3 70B in December 2024, the final Llama 3 release. The Llama 3.2 vision models bolt a cross-attention image adapter onto the Llama 3.1 8B and 70B text backbones; fully native, built-in multimodality only arrived later with Llama 4.