New AI Model Releases — LLMs & Open Weights

Every new AI model worth knowing — frontier and open-weight releases, explained in plain English with the context window, parameters and benchmarks that matter.

120 releases tracked

MolmoMotion — Ai2's language-guided 3D motion forecasting modelsAllen Institute for AI · 2026-06-17 · major
MolmoMotion predicts how points on objects move in 3D from a video frame and a text instruction, with weights, a 1.16M-video dataset, and a benchmark.
Qwen-Robot Suite — Alibaba's three foundation models for robotsAlibaba Qwen · 2026-06-16 · major
Three open foundation models from Alibaba's Qwen team that move robots, drive them around, and predict what happens next.
VibeThinker-3B — Weibo's 3B reasoning model hits 80.2% on LiveCodeBench v6Weibo AI · 2026-06-16 · notable
Sina Weibo's 3B model finetuned from Qwen2.5-Coder-3B, MIT-licensed, scoring 94.3 on AIME26 and 80.2 on LiveCodeBench v6.
Grok Imagine Video 1.5 — xAI's image-to-video model goes GA at $0.14/sec 720pxAI · 2026-06-16 · major
xAI's image-to-video model — the engine behind Grok Imagine's video clips — is now generally available as a pay-per-second API.
GLM-5.2 — Z.ai's new flagship coding model with 1M contextZ.ai · 2026-06-13 · major
Z.ai's new coding flagship lands first inside the GLM Coding Plan, with API, chatbot, and open weights set for next week.
Kimi K2.7-Code — Moonshot's 1T MoE Coding Model Beats Claude Opus on MCPMarkMoonshot AI · 2026-06-12 · major
Moonshot's coding-specialized fork of Kimi K2.6, with faster reasoning and a higher MCP tool-use score than Claude Opus 4.8.
Decart Ships Oasis 3 — First API-Accessible Interactive World Model for Physical AI Streams Three Synchronized 768×512 Camera Views at 22 FPS With <200ms End-to-End Latency on NVIDIA HGX B200, Priced at $0.02 per Second of SimulationDecart · 2026-06-10 · major
Decart's Oasis 3 lets robotics and AV teams stream three synchronized photorealistic camera views in real time from a text prompt, action-conditioned, via API.
Amap Open-Sources ABot-Earth 0.5 — Alibaba's 3D Native World Model Generates Kilometer-Scale 3D Gaussian-Splatting City Scenes From a Single Satellite Image or Text Prompt in About 10 Minutes per km² on a Consumer GPUAmap (Alibaba) · 2026-06-08 · major
Alibaba's Amap turns one satellite image into an interactive 3D city in roughly 10 minutes per square kilometer.
Google Ships DiffusionGemma — Apache-2.0 26B/3.8B-Active Mixture-of-Experts That Denoises 256 Tokens in Parallel via Discrete Block Diffusion, Hits 1,000+ Tokens/Sec on H100 and 700+ on RTX 5090 While Posting 77.6% MMLU Pro, 73.2% GPQA Diamond, and 69.1% LiveCodeBench v6Google DeepMind · 2026-06-10 · major
Google opens a 26B-parameter Gemma that denoises 256 tokens at once instead of generating them one by one.
Kuaishou Open-Sources Keye-VL-2.0-30B-A3B — Apache-2.0 Mixture-of-Experts Vision-Language Model Activates 3B Parameters Per Token, Lands 256K Context With DeepSeek Sparse Attention for Lossless Long-Video Reasoning, Beats Qwen3-VL-235B on LongVideoBench at 74.1, and Tops 70.1 mIoU on QVHighlights-TimeLensKwai Keye · 2026-06-09 · notable
A 30B/3B-active MoE that pushes long-video understanding past dense 235B baselines, all under Apache-2.0.
Cohere Ships North Mini Code 1.0 — Apache-2.0 Sparse 30B/3B-Active Mixture-of-Experts Coding Model With a 256K Context, 64K Max Output, 83.2% Pass@1 on SWE-Bench Verified, 63% on Terminal-Bench v2, and 2.8× the Output Throughput of Devstral Small 2Cohere · 2026-06-09 · major
Cohere's first model aimed squarely at developers — a 30B/3B-active MoE coding model under Apache 2.0.
Anthropic Ships Claude Fable 5 and Claude Mythos 5 — New Flagship Tops Hebbia's Finance Benchmark and Cognition's FrontierCode Eval at $10/$50 per Million Tokens, With Mythos 5 Restricted to Project Glasswing Partners and Select Biology ResearchersAnthropic · 2026-06-09 · seismic
Fable 5 ships to everyone today; Mythos 5 stays gated to Glasswing partners and biology researchers.
Google DeepMind Ships Gemini 3.5 Live Translate — Audio Model Streams Near Real-Time Speech-to-Speech in 70+ Languages With Preserved Intonation, SynthID Watermarks, and Public Preview on Gemini Live API, AI Studio, Meet, and Google TranslateGoogle DeepMind · 2026-06-09 · major
First production speech-to-speech model that translates continuously instead of taking turns.
Xiaomi Ships MiMo-V2.5-Pro-UltraSpeed — FP4-Quantized 1.02T MoE Hits ~1,200 Tokens/Sec on Stock 8-GPU Nodes via Block-Diffusion 'DFlash' Speculative Decoding, MIT-Licensed FP4-DFlash Checkpoint Lands on Hugging FaceXiaomi · 2026-06-08 · major
An FP4 backbone plus a block-diffusion drafter pushes Xiaomi's 1T MiMo MoE past 1,000 tokens per second on a stock 8-GPU node.
NVIDIA Nemotron 3.5 ASR Streaming 0.6B — Open-Weights Multilingual Speech Model Sustains ~17× More Concurrent Streams on H100, Covers 40 Locales From One CheckpointNVIDIA · 2026-06-04 · major
A 600M-param open ASR model that streams 40 locales from one checkpoint and squeezes ~17× more concurrent voice sessions onto an H100.
Google Ships Gemma 4 QAT Checkpoints — New Mobile Quantization Format Brings Gemma 4 E2B Under 1GB of RAM, Q4_0 Weights Land for E2B, E4B, 12B, and the 26B MoE on Hugging FaceGoogle DeepMind · 2026-06-05 · major
Gemma 4 shrinks to under 1GB on phones via a custom 2-bit mobile quantization format trained with QAT.
OpenAI Updates GPT-Rosalind With GPT-5.5 Reasoning — New LifeSciBench Tops Internal Evals, Two Codex Life-Sciences Plugins Land, and Novo Nordisk Joins the Trusted-Access RosterOpenAI · 2026-06-04 · notable
OpenAI's life-sciences specialist gets a GPT-5.5 brain, two Codex plugins for bio workflows, and Novo Nordisk on the partner list.
Magenta RealTime 2 — Google's Open-Weights Live Music Model Lands With ~200ms Control Latency, Native Apple Silicon C++/MLX Inference, and Both 2.4B and 230M Variants on CC BY 4.0Google · 2026-06-04 · major
An open-weights live music model that responds to MIDI, text, and audio in ~200ms and runs entirely on your MacBook.
Ideogram 4.0 — 9.3B Open-Weight Text-to-Image Foundation Model Lands With Apache-2.0 Inference Code, Native 2K Resolution, and Structured JSON Prompting That Tops DesignArena's Open-Weight LeaderboardIdeogram · 2026-06-03 · major
Ideogram's first open-weight image model — 9.3B params, native 2K output, JSON-controlled layouts, and SOTA text rendering for design.
Gemma 4 12B — Google's First Unified, Encoder-Free Multimodal Decoder Streams Audio, Video, and Image Patches Straight Into an 11.95B Apache-2.0 LLM With 256K ContextGoogle · 2026-06-03 · major
A 12B open multimodal model that lets raw audio and video patches skip the encoder and hit the LLM directly.
H Company Open-Sources Holo3.1 — Computer-Use VLM Family at 0.8B, 4B, 9B, and 35B-A3B Hits 74.2% OSWorld, Jumps AndroidWorld From 67% to 79.3%, and Ships FP8, NVFP4, and Q4 GGUF Checkpoints Under Apache 2.0H Company · 2026-06-02 · major
An Apache 2.0 computer-use VLM family that runs locally on consumer GPUs and now drives mobile as well as desktop.
NVIDIA Nemotron 3 Ultra — 550B Mamba-Transformer Mixture-of-Experts With 55B Active Parameters Tops US Open-Weights Leaderboard at 48 on Artificial Analysis Intelligence Index, Ships June 4 on Hugging Face, OpenRouter, ModelScope, and build.nvidia.comNVIDIA · 2026-06-01 · major
NVIDIA's biggest open-weights model: a 550B Mamba-Transformer MoE that beats every other US open model on intelligence and inference speed.
WindBorne WeatherMesh-6 — AI Weather Model Cuts Ensemble-Mean RMSE Up to 38% vs ECMWF, Produces Hourly 3 km Forecasts From a 128-Member Ensemble in Latent SpaceWindBorne Systems · 2026-06-01 · major
An AI weather model that beats ECMWF's flagship physics forecast by up to 38% on ensemble RMSE — and refreshes hourly.
Microsoft Aion 1.0 — On-Device Windows AI Lineup Debuts at Build 2026 With Aion 1.0 Instruct in Edge Insider Today and a 14B Aion 1.0 Plan Reasoning + Tool-Calling Model Shipping In-BoxMicrosoft · 2026-06-02 · major
Microsoft ships a Windows-native AI model line: a small Instruct SLM for everyday text tasks and a 14B agentic Plan model that runs in-box.
MiniMax M3 — Open-Weight Frontier Coding, 1M-Token Context, and Native Multimodality on a New Sparse-Attention ArchitectureMiniMax · 2026-06-01 · major
M3 packages frontier coding, a million-token context, and native multimodality into one open-weight model.
Microsoft Launches Seven New MAI Models at Build 2026 — MAI-Thinking-1 Reasoning, MAI-Code-1-Flash 5B for Copilot, MAI-Image-2.5 With a Flash Variant, MAI-Voice-2 Across 15 Languages, MAI-Transcribe-1.5 Across 43Microsoft AI · 2026-06-02 · major
Microsoft AI's biggest first-party launch yet: a reasoner, a coding flash model, image + voice updates, and a speech-to-text engine claimed to be 5x faster than competing models.
NVIDIA Cosmos 3 — Open Mixture-of-Transformers Omni-Model for Physical AI Lands at Computex With Nano (16B) and Super (65B) Variants, OpenMDW 1.1 Weights, and a New Coalition With Runway, Black Forest Labs, and Skild AINVIDIA · 2026-05-31 · major
An open frontier omni-model that can reason about, simulate, and act in the physical world.
JetBrains Mellum2 — 12B Mixture-of-Experts With 2.5B Active Parameters Opens Up Under Apache 2.0, Ships Base, Instruct, and Thinking Variants Trained on ~10.6T TokensJetBrains · 2026-06-01 · notable
A fast, open 12B MoE designed for routing, RAG, and sub-agents inside coding workflows.
Grok Build 0.1 Lands on the xAI API in Public Beta — 256K Context, $1/$2 per 1M Tokens, 100+ Tokens/Sec for Agentic CodingxAI · 2026-05-28 · major
xAI's coding-focused model goes from CLI-only to a public-beta API at $1 in, $2 out per million tokens.
Qwen Ships Qwen-VLA — Unified Vision-Language-Action Generalist Built on Qwen3.5-4B With a 1.15B DiT Flow-Matching Action Decoder, 97.9% on LIBERO and 76.9% Real-World ALOHAQwen · 2026-05-28 · major
Qwen's vision-language stack now talks to robots end-to-end.
Liquid AI Ships LFM2.5-8B-A1B — On-Device MoE With 1B Active Parameters, 38T Training Tokens, 128K Context, and Explicit Chain-of-ThoughtLiquid AI · 2026-05-28 · major
8B-total / 1B-active mixture-of-experts trained on 38 trillion tokens, tuned for on-device agentic work with explicit chain-of-thought.
Stepfun Releases Step 3.7 Flash — 198B Vision-Language MoE With 11B Active Parameters, 56.3% SWE-Bench Pro, 256K Context, and Three Reasoning LevelsStepfun · 2026-05-29 · major
198B-parameter vision-language MoE tuned for high-frequency agentic workloads — Apache 2.0, 256K context, and three reasoning levels you can dial up or down.
NVIDIA LocateAnything-3B — Parallel Box Decoding Vision-Language Grounder Hits 12.7 Boxes/Sec on H100, ~10× Faster Than Qwen3-VL, Trained on 138M Queries Across 785M BoxesNVIDIA · 2026-05-26 · major
LocateAnything decodes full bounding boxes in one shot, getting NVIDIA a vision-language grounder that is roughly 10× faster than Qwen3-VL.
Claude Opus 4.8Anthropic · 2026-05-28 · seismic
Anthropic's flagship gets sharper agentic judgment, honest error-flagging, and a parallel-subagent dynamic-workflow tool.
LLaVA-OneVision-2 — Open 8B Vision-Language Model Reads Video as a Codec Bit-Cost Stream, Hits 74.9 JumpScore mAP vs Qwen3-VL-8B's 30.1LMMs-Lab · 2026-05-25 · notable
An open 8B vision-language model that reads video as a compression stream, not a stack of sampled frames.
ElevenLabs Music v2 — New Music Model Switches Genres Mid-Track, Adds Section-Level Inpainting, and Cuts API Prices Up to 50%ElevenLabs · 2026-05-26 · major
A music model that swaps genres mid-track and regenerates just one section of a song.
OpenBMB MiniCPM5-1B — 1.08B On-Device Model Reaches Open-Source SOTA in Its Size Class, Shipping GGUF and 4-Bit MLX BuildsOpenBMB · 2026-05-25 · notable
A 1.08B on-device language model that reaches open-source SOTA in its size class.
DeepSeek Makes Its 75% V4-Pro API Discount Permanent — Input Stays at $0.435 per Million Tokens, Output at $0.87, a Quarter of the Original Sticker RateDeepSeek · 2026-05-22 · major
DeepSeek makes its 75%-off V4-Pro API pricing permanent instead of letting the promo expire.
Tencent Hy-MT2 — Open-Weight Multilingual Translation Family (1.8B / 7B / 30B-A3B MoE) Covers 33 Languages; 1.8B Quantizes to 440MB at 1.25-bitTencent Hunyuan · 2026-05-21 · notable
An open-weight family of Tencent translation models spanning 1.8B to a 30B MoE, covering 33 languages.
Cohere Command A+ — 218B Sparse MoE With 25B Active Params, Apache 2.0, Runs Agentic and Multimodal Workloads on Two H100sCohere · 2026-05-20 · major
An open-weight 218B MoE that runs agentic, multimodal, multilingual workloads on as few as two H100 GPUs.
OpenAI Model Disproves Erdős's 1946 Unit-Distance Conjecture — First Time AI Has Autonomously Solved a Prominent Open Problem Central to a Field of MathematicsOpenAI · 2026-05-20 · seismic
OpenAI says an internal reasoning model autonomously disproved Paul Erdős's 1946 unit-distance conjecture, finding a new construction that beats the previously-believed best bound.
Project Genie Plugs Into Street View — Google's World Model Generates Interactive Stylised Scenes Anchored to Real US Locations, Rolling Out to AI Ultra Subscribers GloballyGoogle DeepMind · 2026-05-19 · major
Google's Genie world model now takes a real Street View address as its scene seed — pick a styled lens, walk around a generated version of that exact block.
Stable Audio 3.0 — Stability AI Ships a Four-Model Audio Family With 6:20 Music Generation and Open Weights for Three SizesStability AI · 2026-05-20 · major
A four-model audio family from Stability AI with three open-weight checkpoints and full 6:20 song generation in the larger sizes.
Qwen3.7-Max — Alibaba's New Agentic Flagship Runs 35-Hour Autonomous Tasks With 1,000+ Tool CallsQwen (Alibaba) · 2026-05-20 · major
Alibaba's new flagship Qwen3.7-Max is built to run agents for tens of hours and thousands of tool calls without stalling.
Gemini Omni — Google's 'Create Anything From Any Input' Model Ships With Conversational Video GenerationGoogle · 2026-05-19 · major
A single Gemini model that turns text, images, or audio into edited video on request.
Gemini 3.5 Flash — Google's I/O 2026 Model Rivals Flagship Frontier Models on Coding and Agentic Tasks at 4x the SpeedGoogle DeepMind · 2026-05-19 · seismic
Google's new Flash-tier model targets agentic coding at flagship quality and 4x the throughput.
Cursor Composer 2.5 — Coding Model Matches Opus 4.7 and GPT-5.5 on SWE-Bench Multilingual at a Fraction of the CostCursor · 2026-05-18 · major
Cursor's in-house coding model now matches frontier models on coding benchmarks at a fraction of the price.
SANA-WM — NVIDIA's 2.6B Open-Source World Model Generates 720p One-Minute Video With 6-DoF Camera ControlNVIDIA · 2026-05-14 · major
An efficient open-source world model that generates minute-long, camera-controllable 720p video.
SU-01 — Shanghai AI Lab's 31B Open-Weight Reasoner Hits Gold-Medal Scores on IMO 2025 and USAMO 2026Shanghai AI Laboratory · 2026-05-13 · major
A 31B open-weight model that reaches gold-medal olympiad scores from a documented post-training recipe.
Interfaze — Hybrid CNN/DNN + LLM Architecture Beats Gemini-3-Flash, Claude-Sonnet-4.6, GPT-5.4-Mini, Grok-4.3 Across 9 OCR, Vision, STT, and Structured-Output BenchmarksInterfaze · 2026-05-11 · notable
A hybrid model that bolts CNNs and small task-specific networks under an LLM front-end and routes deterministic developer jobs to whichever specialized model is best.
Cactus Compute Needle — 26M-Parameter Function-Calling Model Distilled From Gemini 3.1, MIT-Licensed, 6000 tok/s on Cactus RuntimeCactus Compute · 2026-05-12 · notable
A 26M-parameter open model distilled from Gemini 3.1 that does nothing but call tools — small enough to run on phones, watches, and glasses.
OpenBMB MiniCPM-V 4.6 — 1.3B Vision-Language Model Runs on iOS, Android, and HarmonyOS, Beating Qwen 3.5-0.8B at 19× Fewer TokensOpenBMB · 2026-05-11 · major
Pocket-sized vision-language model — 1.3B params, runs on phones, beats larger models per token spent.
Thinking Machines TML-Interaction-Small — 276B-A12B Native Interaction Model Listens, Talks, and Watches Concurrently at 0.40s Turn-Taking LatencyThinking Machines Lab · 2026-05-11 · major
A 276B mixture-of-experts that holds a real-time conversation while a separate background model does the slow reasoning.
Ai2 EMO — 14B MoE That Keeps Near-Full Accuracy When Run on Just 12.5% of Its ExpertsAllen Institute for AI · 2026-05-08 · notable
An MoE where you can keep just 16 of 128 experts and still have a working model.
HiDream-O1-Image — 8B Pixel-Level Unified Transformer Open-Sources With #8 Spot on Artificial Analysis Text-to-Image ArenaHiDream-ai · 2026-05-08 · notable
HiDream open-sources an 8B pixel-level transformer for image generation that beats larger DiTs and lands #8 on the Artificial Analysis arena.
Tether QVAC MedPsy — On-Device Medical LLM Beats Google MedGemma 27B at Less Than One-Sixteenth the SizeTether · 2026-05-07 · notable
A 4B medical model that fits in 2.6 GB and beats Google's 27B MedGemma on hard clinical reasoning — Apache 2.0, runs entirely offline.
OpenAI GPT-5.5-Cyber — Security-Tuned Model in Limited Preview for Vetted DefendersOpenAI · 2026-05-07 · major
OpenAI ships GPT-5.5-Cyber: a security-tuned preview model gated behind a Trusted Access program for verified defenders.
xAI Ships Grok Imagine Quality Mode to the API — Photorealistic Image Generation Starting at $0.01/ImagexAI · 2026-05-07 · major
Grok's photorealistic image model — the engine behind 300M+ Grok generations — is now an API for any developer.
OpenAI GPT-Realtime-2 — Voice Model With GPT-5-Class Reasoning, Plus Live Translate and Whisper Streaming in the APIOpenAI · 2026-05-07 · major
OpenAI ships three new Realtime API models — a reasoning-grade voice model, live 70→13 language translation, and streaming Whisper transcription.
Zyphra ZAYA1-8B — AMD-Trained MoE Reasoning Model With <1B Active ParametersZyphra · 2026-05-06 · major
Zyphra trained an 8.4B sparse MoE with under a billion active params on AMD MI300X — and it matches open models 10–100x larger on math and code.
Phi-4-Reasoning-Vision-15B — Microsoft's Compact Multimodal Reasoner: 88.2% ScreenSpot v2, Open-WeightMicrosoft Research · 2026-04-10 · notable
Microsoft's 15B open-weight model combines visual reasoning with math and GUI understanding — and knows when NOT to think.
Genesis AI GENE-26.5 — Foundation Model + Human-Scale Hand for Cooking, Piano, and Rubik's CubesGenesis AI · 2026-05-06 · major
An AI 'brain' for robot hands trained on human-scale data, demoed cooking a 20-step meal and solving a Rubik's cube.
MolmoAct 2 — Ai2's Fully Open Bimanual Robotics Model With 720h Open DatasetAllen Institute for AI · 2026-05-05 · major
An open VLA model from Ai2 that runs two-armed robots in the real world — weights, code and 720h of bimanual data on day one.
GPT-5.5 Instant — New ChatGPT Default Cuts Hallucinations 52.5% on Medicine, Law, Finance PromptsOpenAI · 2026-05-05 · major
OpenAI's new default ChatGPT model: more factual, more concise, less emoji noise.
Gemma 4 MTP Drafters — Open-Weight Speculative Decoding Boosts Inference Up To 3xGoogle · 2026-05-05 · major
Tiny companion models pair with Gemma 4 to deliver up to 3x faster inference at identical output quality.
Grok 4.3 — xAI's New Reasoning Flagship at $1.25/$2.50 per 1M Tokens, 1M-Token ContextxAI · 2026-04-30 · major
xAI's new flagship reasoning model lands with a 1M-token window, image input, and lower per-token pricing than Grok 4.20.
ERNIE 5.1 Preview — Baidu's New Flagship Hits #1 Chinese, #13 Global on LMArena TextBaidu · 2026-04-30 · major
Baidu's ERNIE 5.1 Preview is the only Chinese model in LMArena's global top 15 and tops legal/government tasks worldwide.
Tencent Hy-MT1.5 1.25-bit — 440MB On-Device Translator for 33 Languages Beats 72B ModelsTencent · 2026-04-29 · major
A 440MB translation model that runs offline on a phone and beats 72B-parameter open models and commercial APIs on Flores-200.
OpenAI Restricts GPT-5.5 Cyber to 'Trusted Defenders' — Mirroring Anthropic's Mythos Access ModelOpenAI · 2026-04-30 · major
OpenAI's frontier cyber model goes behind a vetted-defender gate, weeks after Sam Altman publicly mocked Anthropic for doing the same with Mythos.
IBM Granite 4.1 — 8B Instruct Matches Granite 4.0's 32B MoE on Tool Calling and Instruction FollowingIBM Research · 2026-04-29 · major
IBM's open-weight enterprise model family steps up: smaller dense models that match the previous generation's 32B mixture-of-experts.
Mistral Medium 3.5 — 128B Open-Weight Flagship Hits 77.6% on SWE-Bench VerifiedMistral AI · 2026-04-29 · major
Mistral's new flagship: 128B dense, 256k context, open weights under modified MIT, and a 77.6 SWE-Bench Verified score.
Talkie-1930 — 13B Open-Weight LLM Trained Only on Pre-1931 English Texttalkie-lm · 2026-04-28 · notable
A 13B LLM with a hard 1930 cutoff: no World War II, no transistors, no internet, trained on OCR'd books and newspapers.
Gemini Robotics-ER 1.6 — Instrument Reading Jumps from 23% to 93% with Agentic VisionGoogle DeepMind · 2026-04-14 · major
Gemini's robotics reasoning model now reads physical gauges with 93% accuracy — up from 23% in the previous version.
Waypoint-1.5 — Interactive 3D World Model at 720p/60fps on Consumer GPUsOverworld · 2026-04-09 · major
A generative world model that renders playable first-person environments at 60fps on a consumer RTX 3090 — no game engine.
OpenAI Privacy Filter — Apache 2.0 PII Detection Model for On-Prem Data PipelinesOpenAI · 2026-04-22 · notable
A small, fast Apache 2.0 model from OpenAI for detecting and masking PII in text — designed to run entirely on-premises.
OmniVoice — Zero-Shot TTS for 600+ Languages at 40× Real-Timek2-fsa · 2026-04-01 · major
Zero-shot TTS covering 600+ languages at 40× real-time speed — Apache 2.0, from the team behind Kaldi and k2.
Kimi K2.6 — Moonshot AI's 1T-Parameter Agentic MoE Opens Under Modified MITMoonshot AI · 2026-04-20 · seismic
Moonshot AI open-sources a 1-trillion-parameter multimodal model built for orchestrating swarms of up to 300 parallel sub-agents.
Xiaomi MiMo-V2.5 Series: Multimodal Agent Model with 42% Token Efficiency EdgeXiaomi · 2026-04-22 · notable
Xiaomi's MiMo-V2.5-Pro matches GPT-5.4 on SWE-bench Pro while using 42% fewer tokens, with native vision, audio, and video in a single 1T MoE model.
Claude Mythos Preview + Project Glasswing: Anthropic's Restricted AI for Zero-Day DiscoveryAnthropic · 2026-04-07 · seismic
Anthropic's Mythos Preview autonomously discovers and exploits zero-days across every major OS and browser — the first frontier model Anthropic deemed too dangerous for public release.
GPT-5.5 — OpenAI's Fully Retrained Flagship: 82.7% Terminal-Bench, 1M ContextOpenAI · 2026-04-23 · seismic
GPT-5.5 is OpenAI's first fully retrained base model since GPT-4.5: 82.7% on Terminal-Bench 2.0, 13 points ahead of Claude Opus 4.7.
DeepSeek V4 — 1.6T Open-Weights MoE Tops LiveCodeBench with MIT LicenseDeepSeek · 2026-04-24 · seismic
DeepSeek's new open-weights flagship: two MIT-licensed MoE models with 1M-token context and top-tier coding performance released today.
Tencent Hy3-Preview — 295B Open-Source MoE with 74.4% SWE-benchTencent · 2026-04-23 · major
Tencent's first open-source flagship under ex-OpenAI researcher Yao Shunyu — 74.4% SWE-bench in a 295B MoE.
Qwen3.6-27B — Flagship-Level Coding in a 27B Dense Open-Weights ModelQwen (Alibaba) · 2026-04-22 · seismic
A 27B dense model scoring 77.2% SWE-bench Verified — Apache 2.0, multimodal, runnable on consumer hardware.
Claude Opus 4.7 — 87.6% SWE-bench, 3.75MP Vision, Task Budgets, at Same $5/$25 PricingAnthropic · 2026-04-16 · seismic
Anthropic's most capable publicly available model: double-digit coding gains, 3× vision resolution, and a new effort system — same price as the model it replaces.
NVIDIA Ising — Open AI Models for Quantum Processor Calibration and Error CorrectionNVIDIA · 2026-04-14 · notable
NVIDIA's Ising family applies AI to two core quantum computing bottlenecks: calibrating noisy qubits and decoding quantum errors in real time.
GPT Image 2 — OpenAI's Reasoning-Augmented Image Model with 2K ResolutionOpenAI · 2026-04-21 · major
OpenAI's new image model thinks before it generates — reasoning and optionally searching the web before producing images.
DenseOn & LateOn — New BEIR Records at 149M Parameters, Fully OpenLightOn AI · 2026-04-21 · notable
Two 149M-parameter retrieval models from LightOn AI set new BEIR records — the smallest ColBERT and dense models to reach these scores.
Ternary Bonsai — 1.58-Bit 8B Runs at 82 tok/s on M4 Pro in 1.75 GBPrismML · 2026-04-16 · notable
1.58-bit ternary weights fit an 8B model in 1.75 GB and run fast enough for interactive use on iPhones and MacBooks.
Qwen3.6-Max-Preview — Alibaba's 1T-Parameter Coding Flagship, #5 on SWE-bench ProQwen (Alibaba) · 2026-04-20 · major
Alibaba's new 1T-parameter MoE flagship scores #5 on SWE-bench Pro Public and leads six coding benchmarks over Qwen3.6-Plus.
GPT-Rosalind — OpenAI's first domain-specific frontier reasoning model, for life sciencesOpenAI · 2026-04-16 · major
OpenAI's first purpose-built domain model — frontier reasoning tuned on molecules, proteins, genes, and experimental workflows.
NVIDIA Nemotron OCR v2 — Unified Multilingual OCR, 28x Faster Than PaddleOCRNVIDIA · 2026-04-17 · notable
A single unified OCR model from NVIDIA that reads six languages at 34.7 pages per second — no per-language models needed.
HY-World 2.0 — Tencent Open-Sources 3D World Model: Image or Text to Navigable 3D SceneTencent Hunyuan · 2026-04-16 · major
Tencent's open 3D world model turns a single image or text prompt into a fully navigable, editable 3D scene — real geometry, not video.
Qwen3.6-35B-A3B — 35B MoE Coding Model, 3B Active Params, SWE-bench 73.4%Alibaba / Qwen · 2026-04-17 · major
Alibaba's 35B MoE model uses only 3B active params, scores 73.4% SWE-bench Verified, and runs locally on a MacBook Pro.
Gemini 3.1 Flash TTS Preview — Google's Inline Audio-Tag Text-to-SpeechGoogle · 2026-04-15 · notable
Google's new TTS API model lets you control speech expression by embedding tags like [whispers] directly in text.
ERNIE-Image — Baidu's Open-Weight Text-to-Image Diffusion TransformerBaidu · 2026-04-15 · notable
Baidu's open-weight 8B diffusion transformer for text-to-image, with a Turbo variant that generates in 8 steps.
Kronos — Foundation Model for Financial K-Line SequencesShiyu Coder · 2026-01-15 · notable
The first foundation model that treats financial candlestick data as a specialized language.
Muse SparkMeta Superintelligence Labs · 2026-04-08 · seismic
Meta's first model after the Alexandr Wang reshuffle — multimodal reasoning that matches Llama 4 Maverick with an order of magnitude less compute.
GLM-5.1Z.ai · 2026-04-07 · major
A 754B open-weight Mixture-of-Experts model from China that beats GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro — under an MIT license.
Kimi Code K2.6 — Moonshot's Upgraded Coding Agent ModelMoonshot AI · 2026-04-13 · notable
Moonshot's Kimi Code gets a major model upgrade with deeper reasoning and cleaner agent planning — subscribers say it feels like Opus.
Audio Flamingo Next — Open Audio-Language Model with 30-Minute Temporal ReasoningNVIDIA / University of Maryland · 2026-04-13 · notable
NVIDIA and UMD open-source an audio-language model that timestamps its reasoning through 30-minute recordings and beats Gemini 2.5 Pro on MMAU-Pro.
WildDet3D — Promptable monocular 3D detection in the wildAllen Institute for AI · 2026-04-07 · major
Single-image 3D detection that takes text, box or point prompts and works zero-shot across indoor, street and nature scenes.
MiniMax M2.7 — open-weights self-evolving agent modelMiniMax · 2026-04-12 · major
MiniMax's frontier agentic model is now downloadable — a 229B MoE that scores like GPT-5.3-Codex on SWE-Bench Pro.
EXAONE 4.5 — LG's first open-weight vision-language modelLG AI Research · 2026-04-09 · notable
LG's EXAONE family finally goes multimodal — a 33B VLM with 262K context, document-understanding focus, and full inference-stack support on day one.
LPM 1.0 — Large Performance Model for video-based character performanceAnuttacon · 2026-04-09 · major
A 17B video model that makes a still character image act out a full conversation in real time — listening, speaking, reacting, in character.
PersonaPlex — NVIDIA's real-time speech-to-speech model with persona controlNVIDIA · 2026-01-15 · notable
A real-time speech-in/speech-out model from NVIDIA you can prompt into a specific persona and voice, with CPU offload for low-VRAM machines.
LFM2.5-VL-450M — Liquid AI's Tiny Vision-Language Model for EdgeLiquid AI · 2026-04-08 · notable
A 450M-parameter vision-language model that runs on a phone, understands 8 languages, and can locate objects in images.
Trinity-Large-Thinking — Arcee's 400B Open Reasoning Agent ModelArcee AI · 2026-04-01 · major
A 400B sparse model that activates only 13B parameters per token and costs 96% less than Opus for agent tasks.
HappyHorse-1.0 — Alibaba's Video Generation Model Tops Arena RankingsAlibaba · 2026-04-10 · major
Alibaba's stealth video model topped blind rankings under a pseudonym, then revealed itself as the new arena champion.
Harrier — Microsoft's Open-Source Multilingual Embedding Model FamilyMicrosoft · 2026-04-06 · major
Microsoft's Bing team open-sources a decoder-only embedding family that tops the multilingual MTEB-v2 leaderboard.
Gemma 4 — open models for reasoning and agentsGoogle DeepMind · 2026-04-02 · major
Four open models from Google under Apache 2.0 that bring frontier-tier reasoning and native tool-use to consumer GPUs and phones.
Qwen 3.6-Plus — agentic coding flagshipAlibaba / Qwen · 2026-04-02 · major
Alibaba's new flagship language model with a million-token context window and agentic coding that rivals Opus 4.6 at nearly double the speed.
Qwen 3.5-Omni — native omnimodal modelAlibaba / Qwen · 2026-03-30 · major
A single model that natively sees, hears, reads, and speaks — with 113-language speech recognition and real-time voice cloning from a 10-second sample.
MiMo-V2-Pro — Xiaomi's trillion-parameter agentic modelXiaomi · 2026-03-18 · major
A trillion-parameter model from Xiaomi that appeared anonymously on OpenRouter, got mistaken for DeepSeek V4, and turned out to be a genuine frontier contender at $1/M input tokens.
GPT-5.4 mini and nanoOpenAI · 2026-03-17 · notable
OpenAI's smallest GPT-5.4 variants — mini is free-tier fast, nano is API-only cheap, both punch above their weight.
GPT-5.4 — OpenAI's frontier model with native computer useOpenAI · 2026-03-05 · seismic
OpenAI's frontier model with native desktop automation that beats human experts on OSWorld — the first model to do so — plus a million-token context window.
LTX-2.3 — open-source 4K video+audio generationLightricks · 2026-03-05 · major
The first open-source model that generates native 4K video with synchronized audio in one forward pass — under Apache 2.0.
Gemini 3.1 Flash-Lite — cost-efficient multimodal modelGoogle · 2026-03-03 · notable
Google's cheapest and fastest Gemini 3 model — 2.5x faster first-token delivery than Flash, $0.25 per million input tokens, and a million-token context window.
Grok 4.20 Beta 2 — multi-agent reasoning with rapid learningxAI · 2026-03-03 · major
xAI's multi-agent reasoning model that routes queries to four parallel thinkers, updates itself weekly, and holds a 2-million-token context window.
Qwen 3.5 Small — on-device multimodal modelsAlibaba / Qwen · 2026-03-02 · notable
Four natively multimodal models from 0.8B to 9B parameters — small enough for phones, capable enough to beat last-gen 30B models.
HY-Embodied-0.5Tencent Robotics X & HY Vision · 2026-04-09 · major
Open-weights embodied foundation models from Tencent — built to give robot VLA stacks a real spatial-temporal brain.

← All releases · Learn AI