Overview
Kimi K2 Thinking is Moonshot AI's flagship open-weight reasoning model, released on November 6, 2025. It extends the Kimi K2 line into an explicit "thinking" agent: a trillion-parameter Mixture-of-Experts model that activates 32 billion parameters per token and interleaves chain-of-thought reasoning with tool use over long horizons.
What sets Kimi K2 Thinking apart is sustained agentic execution. Moonshot reports that it can run 200 to 300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to research, plan, and solve multi-stage problems. It pairs this with a 256K-token context window and native INT4 quantization (trained via Quantization-Aware Training), which cuts memory and latency without losing accuracy.
Released under a Modified MIT License with full open weights on Hugging Face, Kimi K2 Thinking was the first open model to match or beat leading closed systems on several agentic and reasoning benchmarks at launch. It is served through Moonshot's own Kimi API as well as third-party providers, and is also usable directly in the Kimi chat app.
| Released | 2025-11-06 |
|---|---|
| License | Modified MIT License |
| Weights | Open weights |
| Parameters | 1T total / 32B active (MoE) |
| Context | 256K |
| Max output | 256K |
| Architecture | Mixture-of-Experts (MoE) with 1 trillion total parameters and 32 billion activated per token. 61 layers (1 dense), 384 routed experts with 8 selected per token plus 1 shared expert, 64 attention heads, 7168 attention hidden dimension, 160K vocabulary, Multi-head Latent Attention (MLA), and SwiGLU activation. Ships with native INT4 weights via Quantization-Aware Training (QAT) for roughly 2x faster inference at the same quality. |
| Modalities | Text |
| Status | Available |
Benchmarks
Official Moonshot AI benchmark comparison for Kimi K2 Thinking versus GPT-5 (High), Claude Sonnet 4.5 (Thinking), Kimi K2 0905, DeepSeek-V3.2, and Grok-4. Values are exactly as published; an asterisk (*) marks scores Moonshot re-tested under their own conditions, and null marks cells with no published score. Showing 20 of 24 published benchmarks.
| Benchmark | Kimi K2 Thinking | GPT-5 (High) | Claude Sonnet 4.5 (Thinking) | Kimi K2 0905 | DeepSeek-V3.2 | Grok-4 |
|---|---|---|---|---|---|---|
| Humanity's Last Exam (Text-only), no tools | 23.9% | 26.3% | 19.8*% | 7.9% | 19.8% | 25.4% |
| Humanity's Last Exam (Text-only), w/ tools | 44.9% | 41.7% | 32.0*% | 21.7% | 20.3*% | 41% |
| AIME 2025, no tools | 94.5% | 94.6% | 87% | 51% | 89.3% | 91.7% |
| AIME 2025, w/ python | 99.1% | 99.6% | 100% | 75.2% | 58.1*% | 98.8% |
| HMMT 2025, no tools | 89.4% | 93.3% | 74.6*% | 38.8% | 83.6% | 90% |
| HMMT 2025, w/ python | 95.1% | 96.7% | 88.8*% | 70.4% | 49.5*% | 93.9% |
| IMO-AnswerBench, no tools | 78.6% | 76.0*% | 65.9*% | 45.8% | 76.0*% | 73.1% |
| GPQA-Diamond, no tools | 84.5% | 85.7% | 83.4% | 74.2% | 79.9% | 87.5% |
| MMLU-Pro, no tools | 84.6% | 87.1% | 87.5% | 81.9% | 85% | — |
| MMLU-Redux, no tools | 94.4% | 95.3% | 95.6% | 92.7% | 93.7% | — |
| Longform Writing, no tools | 73.8% | 71.4% | 79.8% | 62.8% | 72.5% | — |
| HealthBench, no tools | 58% | 67.2% | 44.2% | 43.8% | 46.9% | — |
| BrowseComp, w/ tools | 60.2% | 54.9% | 24.1% | 7.4% | 40.1% | — |
| BrowseComp-ZH, w/ tools | 62.3% | 63*% | 42.4*% | 22.2% | 47.9% | — |
| Seal-0, w/ tools | 56.3% | 51.4*% | 53.4*% | 25.2% | 38.5*% | — |
| FinSearchComp-T3, w/ tools | 47.4% | 48.5*% | 44.0*% | 10.4% | 27.0*% | — |
| Frames, w/ tools | 87% | 86.0*% | 85.0*% | 58.1% | 80.2*% | — |
| SWE-bench Verified, w/ tools | 71.3% | 74.9% | 77.2% | 69.2% | 67.8% | — |
| SWE-bench Multilingual, w/ tools | 61.1% | 55.3*% | 68% | 55.9% | 57.9% | — |
| Multi-SWE-bench, w/ tools | 41.9% | 39.3*% | 44.3% | 33.5% | 30.6% | — |
This model's scores
- Humanity's Last Exam (with tools)44.9%
- Humanity's Last Exam (text-only, no tools)23.9%
- BrowseComp60.2%
- BrowseComp-ZH62.3%
- SWE-bench Verified71.3%
- LiveCodeBench v683.1%
- AIME 2025 (with Python)99.1%
- HMMT 2025 (with Python)95.1%
- GPQA84.5%
- MMLU-Pro84.6%
- Artificial Analysis Intelligence Index33index
Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.
Pricing
| Input | $0.60 / 1M tokens per 1M tokens |
|---|---|
| Output | $2.50 / 1M tokens per 1M tokens |
Pricing per OpenRouter listing and Artificial Analysis median across providers; the model is also available with open weights for self-hosting.
Strengths
- Long-horizon agentic execution: stable across 200-300 sequential tool calls
- Strong reasoning with tools (44.9% on Humanity's Last Exam with tools)
- State-of-the-art agentic search (60.2% BrowseComp) for an open model
- Open weights under a permissive Modified MIT License
- 256K-token context window for large codebases and documents
- Native INT4 quantization for cheaper, faster deployment without quality loss
Best for
- Autonomous research agents that browse, gather, and synthesize across many steps
- Multi-step coding and software-engineering tasks with tool orchestration
- Deep reasoning over math, science, and competition-style problems
- Self-hosted or private deployment where open weights are required
- Long-document analysis and codebase understanding using the 256K context
How to access
| Provider | Model ID |
|---|---|
| Moonshot AI (Kimi API) ↗ | kimi-k2-thinking |
| OpenRouter ↗ | moonshotai/kimi-k2-thinking |
| Together AI ↗ | kimi-k2-thinking |
| Amazon Bedrock ↗ | kimi-k2-thinking |
FAQ
Is Kimi K2 Thinking open source?
Yes. Moonshot AI released Kimi K2 Thinking with open weights on Hugging Face under a Modified MIT License, so you can download and self-host it. The license adds an attribution requirement for very large-scale commercial deployments.
What makes Kimi K2 Thinking different from the original Kimi K2?
Kimi K2 Thinking is an explicit reasoning variant. Instead of answering directly, it produces extended chain-of-thought and interleaves tool calls, sustaining 200 to 300 sequential tool calls across a single task. It shares the trillion-parameter MoE architecture but is tuned for long-horizon, agentic problem solving.
How big is Kimi K2 Thinking and what context window does it support?
It is a Mixture-of-Experts model with 1 trillion total parameters and 32 billion activated per token, and it supports a 256K-token context window. It ships with native INT4 weights via Quantization-Aware Training for faster, cheaper inference.
How much does the Kimi K2 Thinking API cost?
Listed pricing is about $0.60 per million input tokens and $2.50 per million output tokens (per OpenRouter and Artificial Analysis). Because the weights are open, you can also run it on your own hardware instead of paying per token.
