Overview
Kimi-Dev-72B is an open-weight, 72-billion-parameter large language model from Moonshot AI (the team behind Kimi), released on June 16, 2025 and purpose-built for software engineering. Rather than a general chat assistant, Kimi-Dev-72B is tuned for the concrete loop developers live in: reading a bug report or GitHub issue, finding the file that needs to change, writing the fix, and producing unit tests that prove it works.
The model starts from the Qwen2.5-72B base and is shaped in two stages. First, a roughly 150-billion-token mid-training phase exposes it to real GitHub issues and pull-request commits so it learns how human engineers reason about defects. Then large-scale reinforcement learning lets Kimi-Dev-72B autonomously patch real repositories inside Docker containers, earning a reward only when the entire test suite passes — a strict, execution-grounded signal that pushes it toward fixes that actually run.
At launch, Kimi-Dev-72B scored 60.4% on SWE-bench Verified, which Moonshot reported as a new state of the art among open-source models. It is released under the permissive MIT license with full weights on Hugging Face, supports a 131K-token context window, and can be self-hosted through vLLM, SGLang, or Hugging Face Transformers, making it practical for teams that want a capable coding model they fully control.
| Released | 2025-06 |
|---|---|
| License | MIT |
| Weights | Open weights |
| Parameters | 72B |
| Context | 131K |
| Architecture | Dense transformer fine-tuned from Qwen2.5-72B; ~150B-token mid-training on GitHub issues and PR commits, then large-scale reinforcement learning that patches real repos in Docker and rewards only when the full test suite passes. |
| Knowledge cutoff | Not disclosed |
| Modalities | Text |
| Status | Available |
Benchmarks
- SWE-bench Verified60.4%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.29 per 1M tokens |
|---|---|
| Output | $1.15 per 1M tokens |
Kimi-Dev-72B is open-weight and not sold on Moonshot's first-party API; these are third-party hosting rates on OpenRouter, which also lists a free tier (moonshotai/kimi-dev-72b:free). Self-hosting cost is your own compute.
Strengths
- State-of-the-art open-source result on SWE-bench Verified (60.4%) at its June 2025 release
- Permissive MIT license with full open weights — unrestricted commercial and private use
- Trained with execution-grounded RL: rewarded only when a patch passes the full Docker test suite, favoring fixes that genuinely run
- 131K-token context handles large files and multi-file repository tasks
- Self-hostable via vLLM, SGLang, and Transformers, with broad community quantizations (GGUF, GPTQ, MLX) for local use
Best for
- Automated bug fixing and GitHub issue resolution against real repositories
- Generating and repairing unit tests for changed code
- Fault localization — pinpointing which file or function needs editing
- Code review and patch suggestion inside engineering workflows
- Self-hosted, license-clean coding assistant for teams that need on-prem control
How to access
| Provider | Model ID |
|---|---|
| OpenRouter ↗ | moonshotai/kimi-dev-72b |
FAQ
What is Kimi-Dev-72B?
Kimi-Dev-72B is an open-weight, 72-billion-parameter coding model from Moonshot AI, released June 16, 2025. It is built on Qwen2.5-72B and specialized for software engineering tasks like bug fixing, GitHub issue resolution, and unit-test generation.
How well does Kimi-Dev-72B perform on SWE-bench?
It scores 60.4% on SWE-bench Verified, which Moonshot AI reported as a state-of-the-art result among open-source models at its release.
Is Kimi-Dev-72B open source and free to use?
Yes. The full weights are published on Hugging Face under the permissive MIT license, allowing commercial and private use. You can self-host it, and hosted access is available on OpenRouter, which also offers a free tier.
What context length does Kimi-Dev-72B support?
It supports a 131K-token (128K) context window, enough for large source files and multi-file repository tasks. The model is text-only and can be served with vLLM, SGLang, or Hugging Face Transformers.
