Kimi-VL-A3B-Thinking-2506

Moonshot AI's open-weight MoE vision-language reasoner — 16B total, 2.8B active.

Overview

Kimi-VL-A3B-Thinking-2506 is the current flagship of Moonshot AI's open-source Kimi-VL line, released on 21 June 2025. It is a Mixture-of-Experts vision-language model that carries 16B total parameters but activates only about 2.8B per token (the 'A3B' in the name), pairing the MoonViT native-resolution vision encoder with a Moonlight-16B-A3B MoE language decoder. The whole model is released under the permissive MIT license, so the weights can be downloaded, fine-tuned and self-hosted freely.

Unlike the original Kimi-VL-A3B-Thinking, the 2506 revision is a single model that is strong at both step-by-step reasoning and plain visual perception. Moonshot reports gains across multimodal-reasoning benchmarks (for example +20.1 points on MathVision and +8.4 on MathVista versus the first release) while using roughly 20% shorter thinking traces on average, and it matches the non-thinking Kimi-VL-A3B-Instruct on general perception tasks like MMBench, MMStar and RealWorldQA.

The model reads text, images, video and multi-page PDFs, supports a 128K-token context window, and handles high-resolution inputs of up to 3.2 million pixels (1792x1792) per image with up to 256 images per prompt. It can emit up to 32K output tokens, wrapping its reasoning in think tags. That combination makes it well suited to long-document understanding, video question answering, and GUI/OS-agent grounding tasks while staying cheap to run thanks to the sparse MoE design.

Released	2025-06-21
License	MIT
Weights	Open weights
Parameters	16B total / 2.8B active (MoE)
Context	128K
Max output	32K
Architecture	Mixture-of-Experts vision-language model. A native-resolution vision encoder (MoonViT) feeds an MoE language decoder based on Moonlight-16B-A3B: 16B total parameters with only ~2.8B activated per token. The 2506 update adds higher-resolution image support (up to 3.2M pixels / 1792x1792 per image, 256 images per prompt) and is tuned to reach answers with about 20% shorter chain-of-thought than the original Kimi-VL-A3B-Thinking.
Knowledge cutoff	December 2024
Modalities	Text, Vision, Video, PDF
Status	Available

Benchmarks

Scores on a 0–100 scale (25-point gridlines); higher is better. Each benchmark links to its published source.

Pricing

Input	$0.025 / 1M tokens per 1M tokens
Output	$0.10 / 1M tokens per 1M tokens

Moonshot AI does not list this model on its own platform; weights are MIT-licensed and self-hostable. The figures above are the paid OpenRouter endpoint; a free OpenRouter endpoint (kimi-vl-a3b-thinking:free) is also available.

Pricing source ↗

Strengths

Open weights under a permissive MIT license — free to download, fine-tune and self-host
Efficient sparse MoE: 16B total parameters but only ~2.8B activated per token
Strong multimodal math and reasoning (MathVision 56.9, MathVista 80.1) with ~20% shorter thinking traces than the prior release
High-resolution vision via MoonViT — up to 3.2M pixels (1792x1792) per image, 256 images per prompt
Open-source state of the art on VideoMMMU (65.2) for video reasoning
Strong GUI / OS-agent grounding (ScreenSpot-Pro 52.8, OSWorld-G 52.5, V* 83.2)
128K-token context for long documents and multi-image or PDF inputs

Best for

Visual and multimodal math/reasoning over charts, diagrams and screenshots
Long-document and multi-page PDF understanding within a 128K context
Video question answering and video reasoning
GUI / OS-agent automation that grounds clicks on high-resolution UI screenshots
OCR and high-resolution image analysis
Self-hosted multimodal deployments where open weights and low activated-parameter cost matter

How to access

Provider	Model ID
OpenRouter ↗	`moonshotai/kimi-vl-a3b-thinking`
Hugging Face (weights) ↗	`moonshotai/Kimi-VL-A3B-Thinking-2506`

Kimi-VL — every version

The full lineage of the Kimi-VL line, newest first. Every version has its own page — click any to compare specs, benchmarks and pricing.

Version	Released	Context	License
Kimi-VL-A3B-Thinking-2506current	2025-06-21	—	Open weights
Kimi-VL-A3B-Thinking	2025-04	—	Open weights
Kimi-VL-A3B-Instruct	2025-04	—	Open weights

FAQ

What is Kimi-VL-A3B-Thinking-2506?

It is the current flagship of Moonshot AI's open-source Kimi-VL line, released on 21 June 2025. It is a Mixture-of-Experts vision-language model with 16B total parameters but only about 2.8B activated per token, combining the MoonViT vision encoder with a Moonlight-16B-A3B language decoder for multimodal reasoning over text, images, video and PDFs.

Is Kimi-VL-A3B-Thinking-2506 open source?

Yes. The weights are released on Hugging Face under the permissive MIT license, so you can download, fine-tune and self-host the model freely.

What context length and resolution does it support?

It supports a 128K-token (131,072) context window and high-resolution images of up to 3.2 million pixels (1792x1792) per image, with up to 256 images per prompt, and can output up to 32K tokens.

How much does it cost to use via an API?

Moonshot does not list this model on its own platform, but it is served on OpenRouter at roughly $0.025 per million input tokens and $0.10 per million output tokens, with a free endpoint also available. Because the weights are MIT-licensed, you can also run it yourself at no per-token cost.

// Overview

// Benchmarks

// Pricing

// Strengths

// Best for

// How to access

// Kimi-VL — every version

// FAQ