Kimi K2 Thinking

Moonshot AI's open-weight trillion-parameter reasoning agent that chains 200-300 tool calls.

Overview

Kimi K2 Thinking is Moonshot AI's flagship open-weight reasoning model, released on November 6, 2025. It extends the Kimi K2 line into an explicit "thinking" agent: a trillion-parameter Mixture-of-Experts model that activates 32 billion parameters per token and interleaves chain-of-thought reasoning with tool use over long horizons.

What sets Kimi K2 Thinking apart is sustained agentic execution. Moonshot reports that it can run 200 to 300 sequential tool calls without human intervention, reasoning coherently across hundreds of steps to research, plan, and solve multi-stage problems. It pairs this with a 256K-token context window and native INT4 quantization (trained via Quantization-Aware Training), which cuts memory and latency without losing accuracy.

Released under a Modified MIT License with full open weights on Hugging Face, Kimi K2 Thinking was the first open model to match or beat leading closed systems on several agentic and reasoning benchmarks at launch. It is served through Moonshot's own Kimi API as well as third-party providers, and is also usable directly in the Kimi chat app.

Released	2025-11-06
License	Modified MIT License
Weights	Open weights
Parameters	1T total / 32B active (MoE)
Context	256K
Max output	256K
Architecture	Mixture-of-Experts (MoE) with 1 trillion total parameters and 32 billion activated per token. 61 layers (1 dense), 384 routed experts with 8 selected per token plus 1 shared expert, 64 attention heads, 7168 attention hidden dimension, 160K vocabulary, Multi-head Latent Attention (MLA), and SwiGLU activation. Ships with native INT4 weights via Quantization-Aware Training (QAT) for roughly 2x faster inference at the same quality.
Modalities	Text
Status	Available

Benchmarks

Official Moonshot AI benchmark comparison for Kimi K2 Thinking versus GPT-5 (High), Claude Sonnet 4.5 (Thinking), Kimi K2 0905, DeepSeek-V3.2, and Grok-4. Values are exactly as published; an asterisk (*) marks scores Moonshot re-tested under their own conditions, and null marks cells with no published score. Showing 20 of 24 published benchmarks.

Benchmark	Kimi K2 Thinking	GPT-5 (High)	Claude Sonnet 4.5 (Thinking)	Kimi K2 0905	DeepSeek-V3.2	Grok-4
Humanity's Last Exam (Text-only), no tools	23.9%	26.3%	19.8*%	7.9%	19.8%	25.4%
Humanity's Last Exam (Text-only), w/ tools	44.9%	41.7%	32.0*%	21.7%	20.3*%	41%
AIME 2025, no tools	94.5%	94.6%	87%	51%	89.3%	91.7%
AIME 2025, w/ python	99.1%	99.6%	100%	75.2%	58.1*%	98.8%
HMMT 2025, no tools	89.4%	93.3%	74.6*%	38.8%	83.6%	90%
HMMT 2025, w/ python	95.1%	96.7%	88.8*%	70.4%	49.5*%	93.9%
IMO-AnswerBench, no tools	78.6%	76.0*%	65.9*%	45.8%	76.0*%	73.1%
GPQA-Diamond, no tools	84.5%	85.7%	83.4%	74.2%	79.9%	87.5%
MMLU-Pro, no tools	84.6%	87.1%	87.5%	81.9%	85%	—
MMLU-Redux, no tools	94.4%	95.3%	95.6%	92.7%	93.7%	—
Longform Writing, no tools	73.8%	71.4%	79.8%	62.8%	72.5%	—
HealthBench, no tools	58%	67.2%	44.2%	43.8%	46.9%	—
BrowseComp, w/ tools	60.2%	54.9%	24.1%	7.4%	40.1%	—
BrowseComp-ZH, w/ tools	62.3%	63*%	42.4*%	22.2%	47.9%	—
Seal-0, w/ tools	56.3%	51.4*%	53.4*%	25.2%	38.5*%	—
FinSearchComp-T3, w/ tools	47.4%	48.5*%	44.0*%	10.4%	27.0*%	—
Frames, w/ tools	87%	86.0*%	85.0*%	58.1%	80.2*%	—
SWE-bench Verified, w/ tools	71.3%	74.9%	77.2%	69.2%	67.8%	—
SWE-bench Multilingual, w/ tools	61.1%	55.3*%	68%	55.9%	57.9%	—
Multi-SWE-bench, w/ tools	41.9%	39.3*%	44.3%	33.5%	30.6%	—

Comparison source ↗

This model's scores

Humanity's Last Exam (with tools)44.9%
Humanity's Last Exam (text-only, no tools)23.9%
BrowseComp60.2%
BrowseComp-ZH62.3%
SWE-bench Verified71.3%
LiveCodeBench v683.1%
AIME 2025 (with Python)99.1%
HMMT 2025 (with Python)95.1%
GPQA84.5%
MMLU-Pro84.6%
Artificial Analysis Intelligence Index33index

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.60 / 1M tokens per 1M tokens
Output	$2.50 / 1M tokens per 1M tokens

Pricing per OpenRouter listing and Artificial Analysis median across providers; the model is also available with open weights for self-hosting.

Pricing source ↗

Strengths

Long-horizon agentic execution: stable across 200-300 sequential tool calls
Strong reasoning with tools (44.9% on Humanity's Last Exam with tools)
State-of-the-art agentic search (60.2% BrowseComp) for an open model
Open weights under a permissive Modified MIT License
256K-token context window for large codebases and documents
Native INT4 quantization for cheaper, faster deployment without quality loss

Best for

Autonomous research agents that browse, gather, and synthesize across many steps
Multi-step coding and software-engineering tasks with tool orchestration
Deep reasoning over math, science, and competition-style problems
Self-hosted or private deployment where open weights are required
Long-document analysis and codebase understanding using the 256K context

How to access

Provider	Model ID
Moonshot AI (Kimi API) ↗	`kimi-k2-thinking`
OpenRouter ↗	`moonshotai/kimi-k2-thinking`
Together AI ↗	`kimi-k2-thinking`
Amazon Bedrock ↗	`kimi-k2-thinking`

FAQ

Is Kimi K2 Thinking open source?

Yes. Moonshot AI released Kimi K2 Thinking with open weights on Hugging Face under a Modified MIT License, so you can download and self-host it. The license adds an attribution requirement for very large-scale commercial deployments.

What makes Kimi K2 Thinking different from the original Kimi K2?

Kimi K2 Thinking is an explicit reasoning variant. Instead of answering directly, it produces extended chain-of-thought and interleaves tool calls, sustaining 200 to 300 sequential tool calls across a single task. It shares the trillion-parameter MoE architecture but is tuned for long-horizon, agentic problem solving.

How big is Kimi K2 Thinking and what context window does it support?

It is a Mixture-of-Experts model with 1 trillion total parameters and 32 billion activated per token, and it supports a 256K-token context window. It ships with native INT4 weights via Quantization-Aware Training for faster, cheaper inference.

How much does the Kimi K2 Thinking API cost?

Listed pricing is about $0.60 per million input tokens and $2.50 per million output tokens (per OpenRouter and Artificial Analysis). Because the weights are open, you can also run it on your own hardware instead of paying per token.

// Overview

// Benchmarks

This model's scores

// Pricing

// Strengths

// Best for

// How to access

// FAQ