AI/TLDR

DeepSeek LLM

DeepSeek's first general LLM: dense 7B and 67B Base and Chat models trained on 2T tokens, the open-weights foundation for the later V-series.

Overview

DeepSeek LLM is DeepSeek's first general-purpose large language model, released on 29 November 2023, a month after the company's debut DeepSeek Coder model. It shipped in two dense sizes — 7B and 67B — each available as a Base (pre-trained) model and a Chat model aligned for instruction following. All four checkpoints were released with open weights.

Both sizes were pre-trained from scratch on a 2-trillion-token corpus spanning English and Chinese, with a 4,096-token sequence length. The 67B model uses a LLaMA-style transformer decoder with Grouped-Query Attention; the Chat variants were refined through supervised fine-tuning and Direct Preference Optimization. The accompanying paper, 'DeepSeek LLM: Scaling Open-Source Language Models with Longtermism' (arXiv:2401.02954), reports that DeepSeek LLM 67B surpasses LLaMA-2 70B on code, math, and reasoning benchmarks, and that the 67B Chat model outperforms GPT-3.5.

DeepSeek LLM is now legacy: it predates DeepSeek's pivot to Mixture-of-Experts and has been superseded by the DeepSeek-V2, V3 and later lines. It is no longer offered through DeepSeek's first-party API, but the weights remain freely downloadable from Hugging Face, making it a clear historical reference point for the company's scaling work and its transition toward the V-series.

Released2023-11-29
LicenseOpen weights — code under MIT, model under the DeepSeek Model License (commercial use permitted)
WeightsOpen weights
Parameters7B and 67B (dense)
Context4K
Max outputNot separately specified (4K total sequence length)
ArchitectureDense auto-regressive transformer decoder (LLaMA-style). The 7B uses Multi-Head Attention (MHA); the 67B uses Grouped-Query Attention (GQA) and 95 layers. Pre-trained from scratch on 2 trillion English and Chinese tokens; Chat variants tuned with supervised fine-tuning and Direct Preference Optimization (DPO).
Knowledge cutoffNot officially stated
ModalitiesText
StatusLegacy

Benchmarks

  1. MMLU (67B Chat)71.1%
  2. HumanEval (67B Chat)73.8% pass@1
  3. GSM8K (67B Chat)84.1%
  4. MATH (67B Chat)32.6%
  5. BBH (67B Chat)71.7%
  6. C-Eval (67B Chat)65.2%
  7. CMMLU (67B Chat)67.8%
  8. MMLU (67B Base)71.3%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Strengths

  • Fully open weights (7B and 67B, Base and Chat) with commercial use permitted under the DeepSeek Model License
  • 67B model outperforms LLaMA-2 70B on code, math, and reasoning benchmarks per the DeepSeek LLM paper
  • Strong bilingual (English + Chinese) performance from a 2-trillion-token training corpus
  • Backed by a public scaling-laws paper, giving rare transparency into the training methodology

Best for

  • Research and reproduction of open-source LLM scaling-law work
  • Local or self-hosted English/Chinese text generation and chat where a compact 4K-context dense model suffices
  • Coding and math assistance via the 67B Chat checkpoint
  • A historical baseline for comparing DeepSeek's later MoE V-series models

FAQ

What is DeepSeek LLM?

DeepSeek LLM is DeepSeek's first general-purpose large language model, released on 29 November 2023. It came in dense 7B and 67B sizes, each with a Base (pre-trained) and a Chat (instruction-tuned) variant, all released as open weights.

How was DeepSeek LLM trained?

Both sizes were pre-trained from scratch on 2 trillion English and Chinese tokens with a 4,096-token sequence length. The 67B model uses a LLaMA-style transformer with Grouped-Query Attention, and the Chat variants were aligned with supervised fine-tuning and Direct Preference Optimization.

Is DeepSeek LLM open source and free to use commercially?

The weights are openly available on Hugging Face. The code is under the MIT License and the model is under the DeepSeek Model License, which permits commercial use.

Is DeepSeek LLM still current?

No. DeepSeek LLM is a legacy model that predates the company's move to Mixture-of-Experts. It has been superseded by the DeepSeek-V2, V3 and later lines and is no longer served on DeepSeek's first-party API, though the weights remain downloadable.