DeepSeek-V3-0324

Name: DeepSeek-V3-0324
Author: DeepSeek

A March 2025 refresh of DeepSeek V3 with markedly stronger reasoning, math, and front-end code generation — open weights under MIT.

Overview

DeepSeek-V3-0324 is an open-weight large language model released by DeepSeek on March 24, 2025, as an incremental update to the original DeepSeek-V3 base model. It keeps V3's Mixture-of-Experts architecture — 671 billion total parameters with roughly 37 billion activated per token (the published checkpoint weighs in at 685B) — but post-training improvements deliver a meaningful jump in reasoning, math, and coding quality without changing the underlying design.

Like the rest of the V3 family, DeepSeek-V3-0324 combines Multi-head Latent Attention (MLA) with the DeepSeekMoE expert layout, an auxiliary-loss-free load-balancing strategy, and a multi-token prediction objective. It is a text-only model with a 128K-token context window and a July 2024 knowledge cutoff. The weights are published on Hugging Face under the permissive MIT license, allowing commercial use, fine-tuning, and self-hosting.

DeepSeek highlights three areas of improvement over the December 2024 V3 release: significantly stronger benchmark reasoning scores, better front-end web-development code generation, and improved Chinese-language proficiency. DeepSeek-V3-0324 was the last major non-reasoning V3 checkpoint before the hybrid thinking/non-thinking DeepSeek-V3.1 line that followed in August 2025.

Released	2025-03-24
License	MIT
Weights	Open weights
Parameters	671B total / 37B active (685B checkpoint)
Context	128K
Max output	8K
Architecture	Mixture-of-Experts (MoE) with Multi-head Latent Attention (MLA) and DeepSeekMoE, auxiliary-loss-free load balancing, and a multi-token prediction training objective. 671B total parameters with ~37B activated per token; pre-trained on 14.8 trillion tokens.
Knowledge cutoff	July 2024
Modalities	Text
Status	Available

Benchmarks

MMLU-Pro81.2%
GPQA68.4%
AIME59.4%
LiveCodeBench49.2%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.27 / 1M tokens (cache miss) per 1M tokens
Cached input	$0.07 / 1M tokens (cache hit) per 1M tokens
Output	$1.10 / 1M tokens per 1M tokens

Official DeepSeek API pricing for the deepseek-chat (V3) endpoint. Open weights are also available for free self-hosting under MIT.

Pricing source ↗

Strengths

Large gains over the original V3 on reasoning and math benchmarks (AIME jumps from 39.6 to 59.4)
Strong code generation, with DeepSeek specifically calling out improved front-end web development output
Open weights under the permissive MIT license — free to self-host, fine-tune, and use commercially
Efficient MoE inference: only ~37B of 671B parameters activate per token
Improved Chinese-language proficiency alongside English

Best for

Self-hosted general-purpose chat and assistant deployments where open weights and MIT licensing matter
Code generation and front-end web development
Math and reasoning-heavy tasks via standard (non-thinking) completion
Cost-sensitive API workloads needing competitive token pricing
Fine-tuning a strong open MoE base for domain-specific applications

How to access

Provider	Model ID
DeepSeek ↗	`deepseek-chat`
OpenRouter ↗	`deepseek/deepseek-chat-v3-0324`

DeepSeek V3 — every version

The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-V3.2current	2025-12-01	—	Open weights
DeepSeek-V3.2-Speciale	2025-12-01	—	Open weights
DeepSeek-V3.2-Exp	2025-09-29	—	Open weights
DeepSeek-V3.1-Terminus	2025-09-22	—	Open weights
DeepSeek-V3.1	2025-08-21	—	Open weights
DeepSeek-V3-0324	2025-03-24	—	Open weights
DeepSeek-V3	2024-12-26	—	Open weights
DeepSeek-V2.5	2024-09-05	—	Open weights
DeepSeek-V2	2024-05	—	Open weights

FAQ

When was DeepSeek-V3-0324 released and what does the name mean?

DeepSeek released it on March 24, 2025. The '0324' suffix is the release date (month and day), distinguishing this checkpoint from the original DeepSeek-V3 published in December 2024.

Is DeepSeek-V3-0324 open source, and what license does it use?

The weights are openly published on Hugging Face under the MIT license, which permits commercial use, fine-tuning, redistribution, and self-hosting. The published checkpoint is about 685B parameters on disk.

How big is DeepSeek-V3-0324 and what is its architecture?

It is a Mixture-of-Experts model with 671 billion total parameters and roughly 37 billion activated per token. It uses Multi-head Latent Attention (MLA) plus the DeepSeekMoE layout, auxiliary-loss-free load balancing, and a multi-token prediction training objective, with a 128K-token context window.

How does DeepSeek-V3-0324 improve on the original DeepSeek-V3?

Post-training upgrades lifted benchmark scores substantially: MMLU-Pro rose from 75.9 to 81.2, GPQA from 59.1 to 68.4, AIME from 39.6 to 59.4, and LiveCodeBench from 39.2 to 49.2. DeepSeek also cites better front-end web code generation and stronger Chinese-language ability.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// DeepSeek V3 — every version

// FAQ