Moonshot AI · 2026-04-20 · notable
Kimi Vendor Verifier — Benchmark Suite to Catch Inference Provider Drift
Moonshot AI open-sourced KVV to verify that Kimi K2 inference providers aren't serving degraded or misconfigured models. It runs five benchmark passes covering parameter enforcement, OCR, vision preprocessing, long-output generation, and tool-call schema compliance. 262 HN points. MIT-licensed.
Open-source benchmark suite that catches when an inference provider is serving a degraded or misconfigured Kimi K2 model.
What is it?
Kimi Vendor Verifier (KVV) is an open-source tool from Moonshot AI that lets users and operators verify whether an inference provider is running Kimi K2 or K2.5 models correctly. Because K2 was released as open weights, multiple cloud providers (Fireworks, DeepInfra, vLLM-based hosts, and others) now offer it. KVV catches cases where a provider serves an over-quantized, misconfigured, or otherwise degraded version without disclosing it.
How does it work?
KVV runs five benchmark passes against any K2-compatible API endpoint: Pre-Verification (parameter enforcement, such as max_tokens and stop sequences), OCRBench (the multimodal pipeline), MMMU Pro (vision preprocessing), AIME2025 (a long-output stress test), and K2VV ToolCall (JSON schema compliance in tool calls). Each result is compared against Moonshot's reference API. Moonshot also uses KVV to maintain a public vendor leaderboard that tracks accuracy across approved providers.
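To make the tool-call pass more concrete, here is a minimal sketch of what a JSON schema compliance check could look like. It is not KVV's actual implementation: the schema, the function names (tool_call_is_valid, schema_accuracy), and the flat-schema simplification are all assumptions made for illustration, and it uses only the Python standard library.

```python
import json

# Hypothetical tool schema, written in the JSON Schema style commonly used
# for tool definitions. It is not taken from KVV.
WEATHER_TOOL_SCHEMA = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "days": {"type": "integer"},
    },
    "required": ["city"],
}

# Mapping from JSON Schema type names to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def tool_call_is_valid(raw_arguments: str, schema: dict) -> bool:
    """Return True if raw_arguments parses as JSON and fits the flat schema."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError:
        return False  # the model emitted malformed JSON
    if not isinstance(args, dict):
        return False
    if any(key not in args for key in schema.get("required", [])):
        return False  # a required argument is missing
    properties = schema.get("properties", {})
    for key, value in args.items():
        if key not in properties:
            return False  # this sketch treats unknown arguments as violations
        type_name = properties[key]["type"]
        # bool is a subclass of int in Python, so reject it for non-boolean fields
        if isinstance(value, bool) and type_name != "boolean":
            return False
        if not isinstance(value, _JSON_TYPES[type_name]):
            return False
    return True


def schema_accuracy(raw_calls, schema) -> float:
    """Fraction of raw tool-call argument strings that satisfy the schema."""
    valid = sum(tool_call_is_valid(call, schema) for call in raw_calls)
    return valid / len(raw_calls)
```

Running schema_accuracy over the argument strings collected from a provider and from the reference API would give two numbers to compare. A real check would need full JSON Schema support (nested objects, enums, and so on), which this sketch leaves out.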
Why does it matter?
HN commenters reported that K2 on AWS Bedrock fails silently on 20–30% of tool-call attempts. Bugs of this kind surface as unreliable agent behavior rather than clear errors. KVV gives teams a reproducible way to validate a provider before committing to it for agentic workflows. The tool is MIT-licensed and works with any endpoint that implements the K2 API.
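A silent failure is hard to spot because the request appears to succeed. The sketch below shows one way a team might measure it from recorded responses. The CallRecord shape, the function names, and the 2% gate are hypothetical choices for illustration and are not part of KVV.

```python
from dataclasses import dataclass, field


@dataclass
class CallRecord:
    """One recorded request where a tool call was expected (hypothetical shape)."""
    http_status: int
    tool_calls: list = field(default_factory=list)  # parsed tool calls, possibly empty


def silent_failure_rate(records) -> float:
    """Fraction of records that returned HTTP 200 but carried no tool call."""
    if not records:
        raise ValueError("no records to evaluate")
    silent = sum(1 for r in records if r.http_status == 200 and not r.tool_calls)
    return silent / len(records)


def passes_gate(records, max_silent_rate: float = 0.02) -> bool:
    """Accept a provider only if its silent-failure rate stays under the gate.

    The 2% default is an arbitrary illustrative threshold.
    """
    return silent_failure_rate(records) <= max_silent_rate
```

Requests that fail with an explicit error status are deliberately not counted here, since those are visible to ordinary monitoring. Only the responses that look successful yet omit the expected tool call contribute to the rate.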
Who is it for?
Teams running Kimi K2 workloads via third-party inference providers; operators hosting K2 who want to verify their stack.
Try it
git clone https://github.com/MoonshotAI/Kimi-Vendor-Verifier