DeepSeek-V3

Name: DeepSeek-V3
Author: DeepSeek

DeepSeek's original 671B-parameter open-weight Mixture-of-Experts model, with 37B active per token.

Overview

DeepSeek-V3 is the original flagship model of DeepSeek's V3 line, released on December 26, 2024 by the Chinese AI lab DeepSeek. It is an open-weight Mixture-of-Experts (MoE) language model with 671 billion total parameters, of which only about 37 billion are activated per token, giving it the capacity of a very large model while keeping inference comparatively cheap. DeepSeek-V3 supports a 128K-token context window and was trained on 14.8 trillion tokens.

Architecturally, DeepSeek-V3 carried forward the Multi-head Latent Attention (MLA) and DeepSeekMoE designs validated in DeepSeek-V2, and added two notable innovations: an auxiliary-loss-free strategy for expert load balancing and a multi-token prediction training objective. According to DeepSeek's own technical report, the full training run used only about 2.788 million H800 GPU hours, an efficiency result that drew wide industry attention given the model's strength.

At launch, DeepSeek-V3 posted benchmark numbers competitive with leading closed models of the period such as GPT-4o and Claude 3.5 Sonnet, while being released under an open model license that permits commercial use (with the accompanying code under MIT). It became the backbone for DeepSeek-R1 and was later refined through V3-0324, V3.1, and V3.2 before the V4 generation arrived.

Released	2024-12-26
License	DeepSeek Model License (commercial use permitted); code under MIT
Weights	Open weights
Parameters	671B total / 37B active per token (Mixture-of-Experts)
Context	128K tokens
Architecture	Mixture-of-Experts (MoE) transformer with 61 layers, using Multi-head Latent Attention (MLA) to compress the KV cache and the DeepSeekMoE sparse-expert design. It introduced an auxiliary-loss-free load-balancing strategy and a multi-token prediction training objective. Pre-trained on 14.8 trillion tokens using roughly 2.788M H800 GPU hours.
Modalities	text
Status	Superseded. DeepSeek-V3 was the original December 2024 release; DeepSeek later shipped V3-0324, V3.1, V3.2, and the V4 line. Weights remain openly available.

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.27 / 1M tokens (cache miss) per 1M tokens
Cached input	$0.068 / 1M tokens (cache hit) per 1M tokens
Output	$1.10 / 1M tokens per 1M tokens

Standard DeepSeek API pricing effective Feb 9, 2025. At launch (Dec 26, 2024) DeepSeek-V3 ran a promotional rate of $0.14/1M input (cache miss) and $0.27/1M output through Feb 8, 2025.

Pricing source ↗

Strengths

Open weights under a commercially permissive license, downloadable and self-hostable
MoE efficiency: 671B total capacity but only ~37B active params per token, lowering inference cost
128K-token context window for long documents and codebases
Strong math and coding benchmark results for a late-2024 open model
Documented, unusually efficient training (~2.788M H800 GPU hours per DeepSeek's report)
Very low API pricing relative to closed frontier models of its era

Best for

General-purpose chat and instruction following at low cost
Code generation and assistance across many languages
Math and quantitative reasoning tasks
Long-context document analysis and summarization (up to 128K tokens)
Self-hosted deployment for teams needing open weights and data control
Foundation/backbone for fine-tuning and reasoning-model research (e.g., DeepSeek-R1)

How to access

Provider	Model ID
DeepSeek ↗	`deepseek-chat`

DeepSeek V3 — every version

The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-V3.2current	2025-12-01	—	Open weights
DeepSeek-V3.2-Speciale	2025-12-01	—	Open weights
DeepSeek-V3.2-Exp	2025-09-29	—	Open weights
DeepSeek-V3.1-Terminus	2025-09-22	—	Open weights
DeepSeek-V3.1	2025-08-21	—	Open weights
DeepSeek-V3-0324	2025-03-24	—	Open weights
DeepSeek-V3	2024-12-26	—	Open weights
DeepSeek-V2.5	2024-09-05	—	Open weights
DeepSeek-V2	2024-05	—	Open weights

FAQ

When was DeepSeek-V3 released?

DeepSeek released DeepSeek-V3 on December 26, 2024. The accompanying DeepSeek-V3 Technical Report was posted to arXiv (2412.19437) the next day, December 27, 2024.

How many parameters does DeepSeek-V3 have?

DeepSeek-V3 is a Mixture-of-Experts model with 671 billion total parameters, but only about 37 billion are activated per token. This sparse design gives it the knowledge capacity of a very large model while keeping per-token inference cost much lower.

Is DeepSeek-V3 open source, and what is its license?

DeepSeek-V3 is open-weight: the model weights are publicly downloadable on Hugging Face under the DeepSeek Model License, which permits commercial use, and the accompanying code is released under the MIT License.

What context window does DeepSeek-V3 support?

DeepSeek-V3 supports a 128K-token context window, enabled by its Multi-head Latent Attention (MLA) design, which compresses the key-value cache to keep long-context inference memory-efficient.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// DeepSeek V3 — every version

// FAQ