Overview
Kimi K2 is Moonshot AI (Kimi)'s original trillion-parameter open-weight model, released on July 11, 2025. It is a Mixture-of-Experts (MoE) language model with 1 trillion total parameters but only 32 billion activated per token, which keeps inference costs far below a dense model of comparable scale. Moonshot shipped it as two checkpoints under a Modified MIT License: Kimi-K2-Base, the raw pre-trained foundation model, and Kimi-K2-Instruct, post-trained for chat, coding and tool use.
Kimi K2 was positioned as an 'agentic intelligence' model rather than a step-by-step reasoner: it answers directly (no long chain-of-thought) but is tuned to call tools and execute multi-step coding and software-engineering tasks. It was pre-trained on 15.5 trillion tokens with Moonshot's Muon optimizer and uses Multi-head Latent Attention (MLA) and SwiGLU across 61 layers and 384 experts, with a 128K-token context window. It handles text only.
At launch Kimi K2 stood out as one of the strongest open-weight non-reasoning models on coding and agentic benchmarks such as SWE-bench Verified, and its open weights on Hugging Face made it widely self-hostable. It later seeded a fast-moving line (Kimi K2-Instruct-0905, the K2 Thinking reasoning variant, and the K2.5/K2.6 releases). The original 0711 checkpoint reached end-of-life on Moonshot's first-party API on May 25, 2026, but the weights remain freely available and the model is still served by third-party providers.
| Released | 2025-07-11 |
|---|---|
| License | Modified MIT License |
| Weights | Open weights |
| Parameters | 1T total / 32B active (MoE) |
| Context | 128K |
| Max output | 128K |
| Architecture | Mixture-of-Experts (MoE) with 1 trillion total parameters and 32 billion activated per token. 61 layers (1 dense), 384 routed experts with 8 selected per token plus 1 shared expert, 64 attention heads, 160K vocabulary, Multi-head Latent Attention (MLA), and SwiGLU activation. Pre-trained on 15.5 trillion tokens using the Muon optimizer. Released in two checkpoints: Kimi-K2-Base (foundation) and Kimi-K2-Instruct (post-trained for chat and agentic use). |
| Modalities | Text |
| Status | Legacy (open weights available; first-party Kimi/Moonshot API retired 2026-05-25) |
Benchmarks
- SWE-bench Verified (agentic, single attempt)65.8%
- SWE-bench Verified (agentless)51.8%
- LiveCodeBench v6 (Pass@1)53.7%
- MultiPL-E (Pass@1)85.7%
- AIME 2024 (Avg@64)69.6%
- AIME 2025 (Avg@64)49.5%
- MATH-50097.4%
- MMLU (Exact Match)89.5%
- MMLU-Pro (Exact Match)81.1%
- GPQA-Diamond (Avg@8)75.1%
- Tau2 Retail (Avg@4)70.6%
- AceBench76.5%
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.55 / 1M tokens per 1M tokens |
|---|---|
| Cached input | $0.15 / 1M tokens per 1M tokens |
| Output | $2.20 / 1M tokens per 1M tokens |
Original Moonshot/Kimi launch pricing for kimi-k2-0711-preview (cache-miss input $0.55, cached input $0.15, output $2.20). Third-party listings (e.g. OpenRouter) are broadly similar (~$0.57 input / ~$2.30 output). The first-party Kimi/Moonshot API retired the 0711 checkpoint on 2026-05-25; because the weights are open, it can also be self-hosted or run via third-party providers.
Strengths
- Open weights under a permissive Modified MIT License, fully self-hostable
- Strong agentic coding for its release window (65.8% on SWE-bench Verified, agentic)
- Trillion-parameter quality at 32B active-parameter inference cost (sparse MoE)
- Excellent math accuracy (97.4% on MATH-500)
- 128K-token context for large codebases and long documents
- Tool-use / agent oriented post-training in the Instruct checkpoint
Best for
- Agentic coding assistants and automated software-engineering workflows
- Self-hosted or private deployment where open weights are required
- Tool-calling agents that orchestrate multi-step tasks
- Math and quantitative reasoning over text
- Long-document and large-codebase analysis using the 128K context
- Cost-sensitive high-volume inference via the sparse MoE design
How to access
| Provider | Model ID |
|---|---|
| OpenRouter ↗ | moonshotai/kimi-k2 |
Kimi K2 — every version
The full lineage of the Kimi K2 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.
| Version | Released | Context | License |
|---|---|---|---|
| Kimi K2.7-Codecurrent | 2026-06-12 | 256K | Modified MIT |
| Kimi K2.6 | 2026-04-20 | — | Open weights |
| Kimi K2.5 | 2026-01-27 | — | Open weights |
| Kimi K2-Instruct-0905 | 2025-09-09 | — | Open weights |
| Kimi K2 | 2025-07-11 | — | MIT |
FAQ
Is Kimi K2 open source?
Yes. Moonshot AI released Kimi K2 with open weights on Hugging Face under a Modified MIT License, in two checkpoints: Kimi-K2-Base (the pre-trained foundation model) and Kimi-K2-Instruct (post-trained for chat and agentic use). You can download and self-host either one.
How big is Kimi K2 and what context window does it support?
Kimi K2 is a Mixture-of-Experts model with 1 trillion total parameters but only 32 billion activated per token, so inference is far cheaper than a dense model of similar size. It supports a 128K-token context window and handles text only.
How is Kimi K2 different from Kimi K2 Thinking?
The original Kimi K2 (0711) is a non-reasoning, agentic model: it answers directly and is tuned for tool use and coding. Kimi K2 Thinking, released later, is an explicit reasoning variant that produces extended chain-of-thought and chains hundreds of sequential tool calls. They share the trillion-parameter MoE architecture.
Can I still use Kimi K2 0711?
Moonshot retired the original 0711 checkpoint on its first-party Kimi/Moonshot API on May 25, 2026, recommending newer models like K2.5 or K2.6. However, because Kimi K2's weights are open, you can still self-host it or access it through third-party inference providers such as OpenRouter.
