DeepSeek-V3.2-Exp

Name: DeepSeek-V3.2-Exp
Author: DeepSeek

Experimental open-weight MoE that debuts DeepSeek Sparse Attention to slash long-context cost while matching V3.1-Terminus quality.

Overview

DeepSeek-V3.2-Exp is an experimental open-weight large language model from DeepSeek, released on September 29, 2025 as part of the DeepSeek V3 line. It is built directly on DeepSeek-V3.1-Terminus and positioned by DeepSeek as an intermediate step toward a next-generation architecture rather than a clean-sheet model.

Its headline change is DeepSeek Sparse Attention (DSA): a two-stage attention path where a lightweight "lightning indexer" scores tokens across the context and a fine-grained selector keeps only the most relevant ones for full attention. Layered on the existing 685B-parameter Mixture-of-Experts stack (about 37B active parameters, with Multi-head Latent Attention), DSA cuts the cost of long-context training and inference while DeepSeek reports output quality on par with V3.1-Terminus.

DeepSeek-V3.2-Exp is text-only and ships under the MIT license, with open weights, a technical report, and GPU kernels (TileLang and CUDA) published on GitHub and Hugging Face. At launch DeepSeek made it the default chat/reasoner endpoint and cut API prices by more than 50%.

Released	2025-09-29
License	MIT
Weights	Open weights
Parameters	685B total (~37B active, MoE)
Context	128K
Max output	64K
Architecture	Mixture-of-Experts transformer (256 experts, ~37B active per token) built on the V3/V3.1 stack with Multi-head Latent Attention (MLA), adding DeepSeek Sparse Attention (DSA): a lightweight "lightning indexer" scores context tokens, then fine-grained token selection restricts attention to a top-k subset for cheaper long-context training and inference.
Modalities	Text
Status	Available

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.28 / 1M tokens (cache miss) per 1M tokens
Cached input	$0.028 / 1M tokens (cache hit) per 1M tokens
Output	$0.42 / 1M tokens per 1M tokens

Official DeepSeek API launch pricing (USD), more than 50% below the prior V3.1-Terminus tier. Third-party hosts list close equivalents (OpenRouter ~$0.27 in / $0.41 out).

Pricing source ↗

Strengths

Introduces DeepSeek Sparse Attention (DSA) for substantially cheaper long-context training and inference
Maintains benchmark parity with V3.1-Terminus despite the efficiency changes (e.g. MMLU-Pro 85.0, SimpleQA 97.1)
MIT-licensed open weights with published technical report and open GPU kernels (TileLang + CUDA)
Aggressive API pricing — more than 50% cheaper than the prior V3.1-Terminus tier
Strong math and agentic-search scores (AIME 2025 89.3, BrowseComp 40.1)

Best for

Long-context document, codebase, and transcript analysis where attention cost dominates
Cost-sensitive production deployments needing a cheap, capable open-weight chat/reasoner model
Self-hosting on-prem with open weights and open GPU kernels
Reasoning, math, and competitive-coding tasks
Agentic / tool-use and web-search workflows (BrowseComp, SWE-bench)

How to access

Provider	Model ID
DeepSeek ↗	`deepseek-chat / deepseek-reasoner`
OpenRouter ↗	`deepseek/deepseek-v3.2-exp`

DeepSeek V3 — every version

The full lineage of the DeepSeek V3 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
DeepSeek-V3.2current	2025-12-01	—	Open weights
DeepSeek-V3.2-Speciale	2025-12-01	—	Open weights
DeepSeek-V3.2-Exp	2025-09-29	—	Open weights
DeepSeek-V3.1-Terminus	2025-09-22	—	Open weights
DeepSeek-V3.1	2025-08-21	—	Open weights
DeepSeek-V3-0324	2025-03-24	—	Open weights
DeepSeek-V3	2024-12-26	—	Open weights
DeepSeek-V2.5	2024-09-05	—	Open weights
DeepSeek-V2	2024-05	—	Open weights

FAQ

What is DeepSeek-V3.2-Exp?

It is an experimental open-weight large language model released by DeepSeek on September 29, 2025, built on DeepSeek-V3.1-Terminus. It introduces DeepSeek Sparse Attention (DSA) to make long-context training and inference cheaper while keeping quality on par with V3.1-Terminus.

What is DeepSeek Sparse Attention (DSA)?

DSA is a two-stage attention mechanism: a lightweight "lightning indexer" scores tokens across the context, then a fine-grained selector keeps only the most relevant tokens for full attention. This cuts the compute cost of long sequences with minimal impact on output quality.

Is DeepSeek-V3.2-Exp open source, and what license does it use?

Yes. The weights, technical report, and GPU kernels (TileLang and CUDA) are published on GitHub and Hugging Face under the MIT license, so it can be self-hosted and used commercially.

How much does DeepSeek-V3.2-Exp cost to use?

At launch DeepSeek cut API prices by more than 50%: roughly $0.28 per 1M input tokens (cache miss), about $0.028 per 1M cached input tokens, and $0.42 per 1M output tokens. Third-party hosts like OpenRouter list close equivalents (~$0.27 in / $0.41 out).

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// DeepSeek V3 — every version

// FAQ