Kimi K2

Name: Kimi K2
Author: Moonshot AI (Kimi)

Moonshot AI's original open-weight 1T-parameter MoE (32B active) for agentic coding and tool use.

Overview

Kimi K2 is Moonshot AI (Kimi)'s original trillion-parameter open-weight model, released on July 11, 2025. It is a Mixture-of-Experts (MoE) language model with 1 trillion total parameters but only 32 billion activated per token, which keeps inference costs far below a dense model of comparable scale. Moonshot shipped it as two checkpoints under a Modified MIT License: Kimi-K2-Base, the raw pre-trained foundation model, and Kimi-K2-Instruct, post-trained for chat, coding and tool use.

Kimi K2 was positioned as an 'agentic intelligence' model rather than a step-by-step reasoner: it answers directly (no long chain-of-thought) but is tuned to call tools and execute multi-step coding and software-engineering tasks. It was pre-trained on 15.5 trillion tokens with Moonshot's Muon optimizer and uses Multi-head Latent Attention (MLA) and SwiGLU across 61 layers and 384 experts, with a 128K-token context window. It handles text only.

At launch Kimi K2 stood out as one of the strongest open-weight non-reasoning models on coding and agentic benchmarks such as SWE-bench Verified, and its open weights on Hugging Face made it widely self-hostable. It later seeded a fast-moving line (Kimi K2-Instruct-0905, the K2 Thinking reasoning variant, and the K2.5/K2.6 releases). The original 0711 checkpoint reached end-of-life on Moonshot's first-party API on May 25, 2026, but the weights remain freely available and the model is still served by third-party providers.

Released	2025-07-11
License	Modified MIT License
Weights	Open weights
Parameters	1T total / 32B active (MoE)
Context	128K
Max output	128K
Architecture	Mixture-of-Experts (MoE) with 1 trillion total parameters and 32 billion activated per token. 61 layers (1 dense), 384 routed experts with 8 selected per token plus 1 shared expert, 64 attention heads, 160K vocabulary, Multi-head Latent Attention (MLA), and SwiGLU activation. Pre-trained on 15.5 trillion tokens using the Muon optimizer. Released in two checkpoints: Kimi-K2-Base (foundation) and Kimi-K2-Instruct (post-trained for chat and agentic use).
Modalities	Text
Status	Legacy (open weights available; first-party Kimi/Moonshot API retired 2026-05-25)

Benchmarks

SWE-bench Verified (agentic, single attempt)65.8%
SWE-bench Verified (agentless)51.8%
LiveCodeBench v6 (Pass@1)53.7%
MultiPL-E (Pass@1)85.7%
AIME 2024 (Avg@64)69.6%
AIME 2025 (Avg@64)49.5%
MATH-50097.4%
MMLU (Exact Match)89.5%
MMLU-Pro (Exact Match)81.1%
GPQA-Diamond (Avg@8)75.1%
Tau2 Retail (Avg@4)70.6%
AceBench76.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.55 / 1M tokens per 1M tokens
Cached input	$0.15 / 1M tokens per 1M tokens
Output	$2.20 / 1M tokens per 1M tokens

Original Moonshot/Kimi launch pricing for kimi-k2-0711-preview (cache-miss input $0.55, cached input $0.15, output $2.20). Third-party listings (e.g. OpenRouter) are broadly similar (~$0.57 input / ~$2.30 output). The first-party Kimi/Moonshot API retired the 0711 checkpoint on 2026-05-25; because the weights are open, it can also be self-hosted or run via third-party providers.

Pricing source ↗

Strengths

Open weights under a permissive Modified MIT License, fully self-hostable
Strong agentic coding for its release window (65.8% on SWE-bench Verified, agentic)
Trillion-parameter quality at 32B active-parameter inference cost (sparse MoE)
Excellent math accuracy (97.4% on MATH-500)
128K-token context for large codebases and long documents
Tool-use / agent oriented post-training in the Instruct checkpoint

Best for

Agentic coding assistants and automated software-engineering workflows
Self-hosted or private deployment where open weights are required
Tool-calling agents that orchestrate multi-step tasks
Math and quantitative reasoning over text
Long-document and large-codebase analysis using the 128K context
Cost-sensitive high-volume inference via the sparse MoE design

How to access

Provider	Model ID
OpenRouter ↗	`moonshotai/kimi-k2`

Kimi K2 — every version

The full lineage of the Kimi K2 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Kimi K2.7-Codecurrent	2026-06-12	256K	Modified MIT
Kimi K2.6	2026-04-20	—	Open weights
Kimi K2.5	2026-01-27	—	Open weights
Kimi K2-Instruct-0905	2025-09-09	—	Open weights
Kimi K2	2025-07-11	—	MIT

FAQ

Is Kimi K2 open source?

Yes. Moonshot AI released Kimi K2 with open weights on Hugging Face under a Modified MIT License, in two checkpoints: Kimi-K2-Base (the pre-trained foundation model) and Kimi-K2-Instruct (post-trained for chat and agentic use). You can download and self-host either one.

How big is Kimi K2 and what context window does it support?

Kimi K2 is a Mixture-of-Experts model with 1 trillion total parameters but only 32 billion activated per token, so inference is far cheaper than a dense model of similar size. It supports a 128K-token context window and handles text only.

How is Kimi K2 different from Kimi K2 Thinking?

The original Kimi K2 (0711) is a non-reasoning, agentic model: it answers directly and is tuned for tool use and coding. Kimi K2 Thinking, released later, is an explicit reasoning variant that produces extended chain-of-thought and chains hundreds of sequential tool calls. They share the trillion-parameter MoE architecture.

Can I still use Kimi K2 0711?

Moonshot retired the original 0711 checkpoint on its first-party Kimi/Moonshot API on May 25, 2026, recommending newer models like K2.5 or K2.6. However, because Kimi K2's weights are open, you can still self-host it or access it through third-party inference providers such as OpenRouter.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Kimi K2 — every version

// FAQ