Overview
Llama 3.3 70B is Meta's open-weight, instruction-tuned large language model released on December 6, 2024, and the final release of the Llama 3 line before Llama 4. It is a text-only, dense auto-regressive transformer with 70 billion parameters and a 128,000-token context window, using Grouped-Query Attention for efficient inference. Meta pretrained it on more than 15 trillion tokens of publicly available data with a December 2023 knowledge cutoff, then aligned it with supervised fine-tuning and RLHF.
The headline of the Llama 3.3 release is efficiency: Meta positions the 70B model as delivering quality comparable to the much larger Llama 3.1 405B on many tasks, particularly instruction following, mathematics, and tool use, while costing roughly four to five times less to run. It is multilingual, officially supporting English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai, and is tuned for dialogue, coding, and agentic tool-calling use cases.
Llama 3.3 70B is distributed under the Llama 3.3 Community License Agreement, a custom commercial license rather than a standard OSI open-source license. The weights are downloadable from Hugging Face (meta-llama/Llama-3.3-70B-Instruct) and the model is hosted by many third-party providers, including Together AI, OpenRouter, and Ollama for local use.
| Released | 2024-12-06 |
|---|---|
| License | Llama 3.3 Community License Agreement |
| Weights | Open weights |
| Parameters | 70B |
| Context | 128K |
| Architecture | Dense transformer (auto-regressive, Grouped-Query Attention) |
| Knowledge cutoff | 2023-12 |
| Modalities | Text |
| Status | Available |
Benchmarks
- MMLU (CoT)86%
- MMLU Pro (CoT)68.9%
- IFEval92.1%
- HumanEval (pass@1)88.4%
- MBPP EvalPlus (pass@1)87.6%
- MATH (CoT)77%
- GPQA Diamond (CoT)50.5%
- BFCL v2 (Tool Use)77.3%
- MGSM (Multilingual)91.1%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $1.04 / 1M tokens |
|---|---|
| Output | $1.04 / 1M tokens |
Together AI hosted Llama-3.3-70B-Instruct-Turbo; many providers price it lower.
Strengths
- Near-405B quality at 70B size — matches or beats Llama 3.1 405B on instruction following (IFEval 92.1) and math (MATH 77.0) at far lower cost
- Open weights, downloadable from Hugging Face and self-hostable (runs from a single ~43GB pull via Ollama)
- Strong instruction following and tool use (BFCL v2 77.3), suited to agentic workflows
- 128K-token context window for long documents and multi-turn conversations
- Multilingual across eight officially supported languages
- Widely hosted — Together AI, OpenRouter, Ollama and others offer drop-in API access
Best for
- Cost-efficient self-hosted or API chat assistants needing frontier-class quality below 405B
- Tool-calling and agentic pipelines that rely on reliable instruction following
- Coding assistance and code generation (HumanEval 88.4, MBPP EvalPlus 87.6)
- Multilingual text generation and translation across the eight supported languages
- Long-document summarization and analysis up to 128K tokens
- On-premises or private deployments where open weights and data control matter
How to access
| Provider | Model ID |
|---|---|
| Together AI ↗ | meta-llama/Llama-3.3-70B-Instruct-Turbo |
| OpenRouter ↗ | meta-llama/llama-3.3-70b-instruct |
| Ollama ↗ | llama3.3 |
Llama 3 — every version
The full lineage of the Llama 3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
FAQ
Is Llama 3.3 70B open-weight and free to use?
Yes. Meta releases the weights under the Llama 3.3 Community License Agreement, and they can be downloaded from Hugging Face (meta-llama/Llama-3.3-70B-Instruct) or pulled via Ollama as llama3.3. The license permits commercial use but is a custom community license rather than a standard OSI-approved open-source license, and includes an acceptable-use policy plus terms for very large deployments.
How does Llama 3.3 70B compare to Llama 3.1 405B?
Meta positions Llama 3.3 70B as delivering quality comparable to the much larger Llama 3.1 405B on many tasks while costing roughly four to five times less to run. On its model card it matches or beats 405B on instruction following (IFEval 92.1) and mathematics (MATH 77.0), making it a far cheaper option for similar real-world performance.
Does Llama 3.3 70B support images or other modalities?
No. Llama 3.3 70B is text-only — it takes multilingual text in and produces multilingual text and code out, with no vision or audio input. For image understanding in the Llama 3 line you would use the Llama 3.2 11B or 90B Vision models; native multimodality across the board arrived later with Llama 4.
What is the context window and knowledge cutoff of Llama 3.3 70B?
Llama 3.3 70B has a 128,000-token context window and a knowledge cutoff of December 2023. It was pretrained on more than 15 trillion tokens of publicly available data and officially supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.