AI/TLDR

DeepSeek-V3

DeepSeek's original 671B-parameter open-weight Mixture-of-Experts model, with 37B active per token.

Overview

DeepSeek-V3 is the original flagship model of DeepSeek's V3 line, released on December 26, 2024 by the Chinese AI lab DeepSeek. It is an open-weight Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which only about 37 billion are activated per token, giving it the capacity of a very large model while keeping inference comparatively cheap. DeepSeek-V3 supports a 128K-token context window and was trained on 14.8 trillion tokens.

Architecturally, DeepSeek-V3 carried forward the Multi-head Latent Attention (MLA) and DeepSeekMoE designs validated in DeepSeek-V2, and added two notable innovations: an auxiliary-loss-free strategy for expert load balancing and a multi-token prediction training objective. According to DeepSeek's own technical report, the full training run used only about 2.788 million H800 GPU hours, an efficiency result that drew wide industry attention given the model's strength.

At launch, DeepSeek-V3 posted benchmark numbers competitive with leading closed models of the period such as GPT-4o and Claude 3.5 Sonnet, while being released under an open model license that permits commercial use (with the accompanying code under MIT). It became the backbone for DeepSeek-R1 and was later refined through V3-0324, V3.1, and V3.2 before the V4 generation arrived.

Released2024-12-26
LicenseDeepSeek Model License (commercial use permitted); code under MIT
WeightsOpen weights
Parameters671B total / 37B active per token (Mixture-of-Experts)
Context128K tokens
ArchitectureMixture-of-Experts (MoE) transformer with 61 layers, using Multi-head Latent Attention (MLA) to compress the KV cache and the DeepSeekMoE sparse-expert design. It introduced an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective. Pre-trained on 14.8 trillion tokens using roughly 2.788M H800 GPU hours.
Modalitiestext
StatusSuperseded. DeepSeek-V3 was the original December 2024 release; DeepSeek later shipped V3-0324, V3.1, V3.2, and the V4 line. Weights remain openly available.

Benchmarks

  1. MMLU (EM)88.5%
  2. MMLU-Pro (EM)75.9%
  3. GPQA-Diamond (Pass@1)59.1%
  4. MATH-500 (EM)90.2%
  5. AIME 2024 (Pass@1)39.2%
  6. HumanEval-Mul (Pass@1)82.6%
  7. LiveCodeBench (Pass@1)37.6%
  8. Codeforces (percentile)51.6%
  9. SWE-bench Verified (resolved)42%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.27 / 1M tokens (cache miss) per 1M tokens
Cached input$0.068 / 1M tokens (cache hit) per 1M tokens
Output$1.10 / 1M tokens per 1M tokens

Standard DeepSeek API pricing effective Feb 9, 2025. At launch (Dec 26, 2024) DeepSeek-V3 ran a promotional rate of $0.14/1M input (cache miss) and $0.27/1M output through Feb 8, 2025.

Pricing source ↗

Strengths

  • Open weights under a commercially permissive license, downloadable and self-hostable
  • MoE efficiency: 671B total capacity but only ~37B active params per token, lowering inference cost
  • 128K-token context window for long documents and codebases
  • Strong math and coding benchmark results for a late-2024 open model
  • Documented, unusually efficient training (~2.788M H800 GPU hours per DeepSeek's report)
  • Very low API pricing relative to closed frontier models of its era

Best for

  • General-purpose chat and instruction following at low cost
  • Code generation and assistance across many languages
  • Math and quantitative reasoning tasks
  • Long-context document analysis and summarization (up to 128K tokens)
  • Self-hosted deployment for teams needing open weights and data control
  • Foundation/backbone for fine-tuning and reasoning-model research (e.g., DeepSeek-R1)

How to access

ProviderModel ID
DeepSeek ↗deepseek-chat

DeepSeek V3 — every version

The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
DeepSeek-V3.2current2025-12-01Open weights
DeepSeek-V3.2-Speciale2025-12-01Open weights
DeepSeek-V3.2-Exp2025-09-29Open weights
DeepSeek-V3.1-Terminus2025-09-22Open weights
DeepSeek-V3.12025-08-21Open weights
DeepSeek-V3-03242025-03-24Open weights
DeepSeek-V32024-12-26Open weights
DeepSeek-V2.52024-09-05Open weights
DeepSeek-V22024-05Open weights

FAQ

When was DeepSeek-V3 released?

DeepSeek released DeepSeek-V3 on December 26, 2024. The accompanying DeepSeek-V3 Technical Report was posted to arXiv (2412.19437) the next day, December 27, 2024.

How many parameters does DeepSeek-V3 have?

DeepSeek-V3 is a Mixture-of-Experts model with 671 billion total parameters, but only about 37 billion are activated per token. This sparse design gives it the knowledge capacity of a very large model while keeping per-token inference cost much lower.

Is DeepSeek-V3 open source, and what is its license?

DeepSeek-V3 is open-weight: the model weights are publicly downloadable on Hugging Face under the DeepSeek Model License, which permits commercial use, and the accompanying code is released under the MIT License.

What context window does DeepSeek-V3 support?

DeepSeek-V3 supports a 128K-token context window, enabled by its Multi-head Latent Attention (MLA) design, which compresses the key-value cache to keep long-context inference memory-efficient.