Overview
DeepSeek-V3-0324 is an open-weight large language model released by DeepSeek on March 24, 2025, as an incremental update to the original DeepSeek-V3 base model. It keeps V3's Mixture-of-Experts architecture — 671 billion total parameters with roughly 37 billion activated per token (the published checkpoint weighs in at 685B) — but post-training improvements deliver a meaningful jump in reasoning, math, and coding quality without changing the underlying design.
Like the rest of the V3 family, DeepSeek-V3-0324 combines Multi-head Latent Attention (MLA) with the DeepSeekMoE expert layout, an auxiliary-loss-free load-balancing strategy, and a multi-token prediction objective. It is a text-only model with a 128K-token context window and a July 2024 knowledge cutoff. The weights are published on Hugging Face under the permissive MIT license, allowing commercial use, fine-tuning, and self-hosting.
DeepSeek highlights three areas of improvement over the December 2024 V3 release: significantly stronger benchmark reasoning scores, better front-end web-development code generation, and improved Chinese-language proficiency. DeepSeek-V3-0324 was the last major non-reasoning V3 checkpoint before the hybrid thinking/non-thinking DeepSeek-V3.1 line that followed in August 2025.
| Released | 2025-03-24 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 671B total / 37B active (685B checkpoint) |
| Context | 128K |
| Max output | 8K |
| Architecture | Mixture-of-Experts (MoE) with Multi-head Latent Attention (MLA) and DeepSeekMoE, auxiliary-loss-free load balancing, and a multi-token prediction training objective. 671B total parameters with ~37B activated per token; pre-trained on 14.8 trillion tokens. |
| Knowledge cutoff | July 2024 |
| Modalities | Text |
| Status | Available |
Benchmarks
- MMLU-Pro81.2%
- GPQA68.4%
- AIME59.4%
- LiveCodeBench49.2%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.27 / 1M tokens (cache miss) per 1M tokens |
|---|---|
| Cached input | $0.07 / 1M tokens (cache hit) per 1M tokens |
| Output | $1.10 / 1M tokens per 1M tokens |
Official DeepSeek API pricing for the deepseek-chat (V3) endpoint. Open weights are also available for free self-hosting under MIT.
Strengths
- Large gains over the original V3 on reasoning and math benchmarks (AIME jumps from 39.6 to 59.4)
- Strong code generation, with DeepSeek specifically calling out improved front-end web development output
- Open weights under the permissive MIT license — free to self-host, fine-tune, and use commercially
- Efficient MoE inference: only ~37B of 671B parameters activate per token
- Improved Chinese-language proficiency alongside English
Best for
- Self-hosted general-purpose chat and assistant deployments where open weights and MIT licensing matter
- Code generation and front-end web development
- Math and reasoning-heavy tasks via standard (non-thinking) completion
- Cost-sensitive API workloads needing competitive token pricing
- Fine-tuning a strong open MoE base for domain-specific applications
How to access
| Provider | Model ID |
|---|---|
| DeepSeek ↗ | deepseek-chat |
| OpenRouter ↗ | deepseek/deepseek-chat-v3-0324 |
DeepSeek V3 — every version
The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| DeepSeek-V3.2current | 2025-12-01 | — | Open weights |
| DeepSeek-V3.2-Speciale | 2025-12-01 | — | Open weights |
| DeepSeek-V3.2-Exp | 2025-09-29 | — | Open weights |
| DeepSeek-V3.1-Terminus | 2025-09-22 | — | Open weights |
| DeepSeek-V3.1 | 2025-08-21 | — | Open weights |
| DeepSeek-V3-0324 | 2025-03-24 | — | Open weights |
| DeepSeek-V3 | 2024-12-26 | — | Open weights |
| DeepSeek-V2.5 | 2024-09-05 | — | Open weights |
| DeepSeek-V2 | 2024-05 | — | Open weights |
FAQ
When was DeepSeek-V3-0324 released and what does the name mean?
DeepSeek released it on March 24, 2025. The '0324' suffix is the release date (month and day), distinguishing this checkpoint from the original DeepSeek-V3 published in December 2024.
Is DeepSeek-V3-0324 open source, and what license does it use?
The weights are openly published on Hugging Face under the MIT license, which permits commercial use, fine-tuning, redistribution, and self-hosting. The published checkpoint is about 685B parameters on disk.
How big is DeepSeek-V3-0324 and what is its architecture?
It is a Mixture-of-Experts model with 671 billion total parameters and roughly 37 billion activated per token. It uses Multi-head Latent Attention (MLA) plus the DeepSeekMoE layout, auxiliary-loss-free load balancing, and a multi-token prediction training objective, with a 128K-token context window.
How does DeepSeek-V3-0324 improve on the original DeepSeek-V3?
Post-training upgrades lifted benchmark scores substantially: MMLU-Pro rose from 75.9 to 81.2, GPQA from 59.1 to 68.4, AIME from 39.6 to 59.4, and LiveCodeBench from 39.2 to 49.2. DeepSeek also cites better front-end web code generation and stronger Chinese-language ability.