AI/TLDR

Kimi K2

Moonshot AI's original open-weight 1T-parameter MoE (32B active) for agentic coding and tool use.

Overview

Kimi K2 is Moonshot AI (Kimi)'s original trillion-parameter open-weight model, released on July 11, 2025. It is a Mixture-of-Experts (MoE) language model with 1 trillion total parameters but only 32 billion activated per token, which keeps inference costs far below a dense model of comparable scale. Moonshot shipped it as two checkpoints under a Modified MIT License: Kimi-K2-Base, the raw pre-trained foundation model, and Kimi-K2-Instruct, post-trained for chat, coding and tool use.

Kimi K2 was positioned as an 'agentic intelligence' model rather than a step-by-step reasoner: it answers directly (no long chain-of-thought) but is tuned to call tools and execute multi-step coding and software-engineering tasks. It was pre-trained on 15.5 trillion tokens with Moonshot's Muon optimizer and uses Multi-head Latent Attention (MLA) and SwiGLU across 61 layers and 384 experts, with a 128K-token context window. It handles text only.

At launch Kimi K2 stood out as one of the strongest open-weight non-reasoning models on coding and agentic benchmarks such as SWE-bench Verified, and its open weights on Hugging Face made it widely self-hostable. It later seeded a fast-moving line (Kimi K2-Instruct-0905, the K2 Thinking reasoning variant, and the K2.5/K2.6 releases). The original 0711 checkpoint reached end-of-life on Moonshot's first-party API on May 25, 2026, but the weights remain freely available and the model is still served by third-party providers.

Released2025-07-11
LicenseModified MIT License
WeightsOpen weights
Parameters1T total / 32B active (MoE)
Context128K
Max output128K
ArchitectureMixture-of-Experts (MoE) with 1 trillion total parameters and 32 billion activated per token. 61 layers (1 dense), 384 routed experts with 8 selected per token plus 1 shared expert, 64 attention heads, 160K vocabulary, Multi-head Latent Attention (MLA), and SwiGLU activation. Pre-trained on 15.5 trillion tokens using the Muon optimizer. Released in two checkpoints: Kimi-K2-Base (foundation) and Kimi-K2-Instruct (post-trained for chat and agentic use).
ModalitiesText
StatusLegacy (open weights available; first-party Kimi/Moonshot API retired 2026-05-25)

Benchmarks

  1. SWE-bench Verified (agentic, single attempt)65.8%
  2. SWE-bench Verified (agentless)51.8%
  3. LiveCodeBench v6 (Pass@1)53.7%
  4. MultiPL-E (Pass@1)85.7%
  5. AIME 2024 (Avg@64)69.6%
  6. AIME 2025 (Avg@64)49.5%
  7. MATH-50097.4%
  8. MMLU (Exact Match)89.5%
  9. MMLU-Pro (Exact Match)81.1%
  10. GPQA-Diamond (Avg@8)75.1%
  11. Tau2 Retail (Avg@4)70.6%
  12. AceBench76.5%

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input$0.55 / 1M tokens per 1M tokens
Cached input$0.15 / 1M tokens per 1M tokens
Output$2.20 / 1M tokens per 1M tokens

Original Moonshot/Kimi launch pricing for kimi-k2-0711-preview (cache-miss input $0.55, cached input $0.15, output $2.20). Third-party listings (e.g. OpenRouter) are broadly similar (~$0.57 input / ~$2.30 output). The first-party Kimi/Moonshot API retired the 0711 checkpoint on 2026-05-25; because the weights are open, it can also be self-hosted or run via third-party providers.

Pricing source ↗

Strengths

  • Open weights under a permissive Modified MIT License, fully self-hostable
  • Strong agentic coding for its release window (65.8% on SWE-bench Verified, agentic)
  • Trillion-parameter quality at 32B active-parameter inference cost (sparse MoE)
  • Excellent math accuracy (97.4% on MATH-500)
  • 128K-token context for large codebases and long documents
  • Tool-use / agent oriented post-training in the Instruct checkpoint

Best for

  • Agentic coding assistants and automated software-engineering workflows
  • Self-hosted or private deployment where open weights are required
  • Tool-calling agents that orchestrate multi-step tasks
  • Math and quantitative reasoning over text
  • Long-document and large-codebase analysis using the 128K context
  • Cost-sensitive high-volume inference via the sparse MoE design

How to access

ProviderModel ID
OpenRouter ↗moonshotai/kimi-k2

Kimi K2 — every version

The full lineage of the Kimi K2 line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

VersionReleasedContextLicense
Kimi K2.7-Codecurrent2026-06-12256KModified MIT
Kimi K2.62026-04-20Open weights
Kimi K2.52026-01-27Open weights
Kimi K2-Instruct-09052025-09-09Open weights
Kimi K22025-07-11MIT

FAQ

Is Kimi K2 open source?

Yes. Moonshot AI released Kimi K2 with open weights on Hugging Face under a Modified MIT License, in two checkpoints: Kimi-K2-Base (the pre-trained foundation model) and Kimi-K2-Instruct (post-trained for chat and agentic use). You can download and self-host either one.

How big is Kimi K2 and what context window does it support?

Kimi K2 is a Mixture-of-Experts model with 1 trillion total parameters but only 32 billion activated per token, so inference is far cheaper than a dense model of similar size. It supports a 128K-token context window and handles text only.

How is Kimi K2 different from Kimi K2 Thinking?

The original Kimi K2 (0711) is a non-reasoning, agentic model: it answers directly and is tuned for tool use and coding. Kimi K2 Thinking, released later, is an explicit reasoning variant that produces extended chain-of-thought and chains hundreds of sequential tool calls. They share the trillion-parameter MoE architecture.

Can I still use Kimi K2 0711?

Moonshot retired the original 0711 checkpoint on its first-party Kimi/Moonshot API on May 25, 2026, recommending newer models like K2.5 or K2.6. However, because Kimi K2's weights are open, you can still self-host it or access it through third-party inference providers such as OpenRouter.