New AI Model Releases — LLMs & Open Weights
Every new AI model worth knowing — frontier and open-weight releases, explained in plain English with the context window, parameters and benchmarks that matter.
120 releases tracked
- MolmoMotion — Ai2's language-guided 3D motion forecasting models
MolmoMotion predicts how points on objects move in 3D from a video frame and a text instruction, with weights, a 1.16M-video dataset, and a benchmark.
- Qwen-Robot Suite — Alibaba's three foundation models for robots
Three open foundation models from Alibaba's Qwen team that move robots, drive them around, and predict what happens next.
- VibeThinker-3B — Weibo's 3B reasoning model hits 80.2% on LiveCodeBench v6
Sina Weibo's 3B model finetuned from Qwen2.5-Coder-3B, MIT-licensed, scoring 94.3 on AIME26 and 80.2 on LiveCodeBench v6.
- Grok Imagine Video 1.5 — xAI's image-to-video model goes GA at $0.14/sec 720p
xAI's image-to-video model — the engine behind Grok Imagine's video clips — is now generally available as a pay-per-second API.
- GLM-5.2 — Z.ai's new flagship coding model with 1M context
Z.ai's new coding flagship lands first inside the GLM Coding Plan, with API, chatbot, and open weights set for next week.
- Kimi K2.7-Code — Moonshot's 1T MoE Coding Model Beats Claude Opus on MCPMark
Moonshot's coding-specialized fork of Kimi K2.6, with faster reasoning and a higher MCP tool-use score than Claude Opus 4.8.
- Decart Ships Oasis 3 — First API-Accessible Interactive World Model for Physical AI Streams Three Synchronized 768×512 Camera Views at 22 FPS With <200ms End-to-End Latency on NVIDIA HGX B200, Priced at $0.02 per Second of Simulation
Decart's Oasis 3 lets robotics and AV teams stream three synchronized photorealistic camera views in real time from a text prompt, action-conditioned, via API.
- Amap Open-Sources ABot-Earth 0.5 — Alibaba's 3D Native World Model Generates Kilometer-Scale 3D Gaussian-Splatting City Scenes From a Single Satellite Image or Text Prompt in About 10 Minutes per km² on a Consumer GPU
Alibaba's Amap turns one satellite image into an interactive 3D city in roughly 10 minutes per square kilometer.
- Google Ships DiffusionGemma — Apache-2.0 26B/3.8B-Active Mixture-of-Experts That Denoises 256 Tokens in Parallel via Discrete Block Diffusion, Hits 1,000+ Tokens/Sec on H100 and 700+ on RTX 5090 While Posting 77.6% MMLU Pro, 73.2% GPQA Diamond, and 69.1% LiveCodeBench v6
Google opens a 26B-parameter Gemma that denoises 256 tokens at once instead of generating them one by one.
- Kuaishou Open-Sources Keye-VL-2.0-30B-A3B — Apache-2.0 Mixture-of-Experts Vision-Language Model Activates 3B Parameters Per Token, Lands 256K Context With DeepSeek Sparse Attention for Lossless Long-Video Reasoning, Beats Qwen3-VL-235B on LongVideoBench at 74.1, and Tops 70.1 mIoU on QVHighlights-TimeLens
A 30B/3B-active MoE that pushes long-video understanding past dense 235B baselines, all under Apache-2.0.
- Cohere Ships North Mini Code 1.0 — Apache-2.0 Sparse 30B/3B-Active Mixture-of-Experts Coding Model With a 256K Context, 64K Max Output, 83.2% Pass@1 on SWE-Bench Verified, 63% on Terminal-Bench v2, and 2.8× the Output Throughput of Devstral Small 2
Cohere's first model aimed squarely at developers — a 30B/3B-active MoE coding model under Apache 2.0.
- Anthropic Ships Claude Fable 5 and Claude Mythos 5 — New Flagship Tops Hebbia's Finance Benchmark and Cognition's FrontierCode Eval at $10/$50 per Million Tokens, With Mythos 5 Restricted to Project Glasswing Partners and Select Biology Researchers
Fable 5 ships to everyone today; Mythos 5 stays gated to Glasswing partners and biology researchers.
- Google DeepMind Ships Gemini 3.5 Live Translate — Audio Model Streams Near Real-Time Speech-to-Speech in 70+ Languages With Preserved Intonation, SynthID Watermarks, and Public Preview on Gemini Live API, AI Studio, Meet, and Google Translate
First production speech-to-speech model that translates continuously instead of taking turns.
- Xiaomi Ships MiMo-V2.5-Pro-UltraSpeed — FP4-Quantized 1.02T MoE Hits ~1,200 Tokens/Sec on Stock 8-GPU Nodes via Block-Diffusion 'DFlash' Speculative Decoding, MIT-Licensed FP4-DFlash Checkpoint Lands on Hugging Face
An FP4 backbone plus a block-diffusion drafter pushes Xiaomi's 1T MiMo MoE past 1,000 tokens per second on a stock 8-GPU node.
- NVIDIA Nemotron 3.5 ASR Streaming 0.6B — Open-Weights Multilingual Speech Model Sustains ~17× More Concurrent Streams on H100, Covers 40 Locales From One Checkpoint
A 600M-param open ASR model that streams 40 locales from one checkpoint and squeezes ~17× more concurrent voice sessions onto an H100.
- Google Ships Gemma 4 QAT Checkpoints — New Mobile Quantization Format Brings Gemma 4 E2B Under 1GB of RAM, Q4_0 Weights Land for E2B, E4B, 12B, and the 26B MoE on Hugging Face
Gemma 4 shrinks to under 1GB on phones via a custom 2-bit mobile quantization format trained with QAT.
- OpenAI Updates GPT-Rosalind With GPT-5.5 Reasoning — New LifeSciBench Tops Internal Evals, Two Codex Life-Sciences Plugins Land, and Novo Nordisk Joins the Trusted-Access Roster
OpenAI's life-sciences specialist gets a GPT-5.5 brain, two Codex plugins for bio workflows, and Novo Nordisk on the partner list.
- Magenta RealTime 2 — Google's Open-Weights Live Music Model Lands With ~200ms Control Latency, Native Apple Silicon C++/MLX Inference, and Both 2.4B and 230M Variants on CC BY 4.0
An open-weights live music model that responds to MIDI, text, and audio in ~200ms and runs entirely on your MacBook.
- Ideogram 4.0 — 9.3B Open-Weight Text-to-Image Foundation Model Lands With Apache-2.0 Inference Code, Native 2K Resolution, and Structured JSON Prompting That Tops DesignArena's Open-Weight Leaderboard
Ideogram's first open-weight image model — 9.3B params, native 2K output, JSON-controlled layouts, and SOTA text rendering for design.
- Gemma 4 12B — Google's First Unified, Encoder-Free Multimodal Decoder Streams Audio, Video, and Image Patches Straight Into an 11.95B Apache-2.0 LLM With 256K Context
A 12B open multimodal model that lets raw audio and video patches skip the encoder and hit the LLM directly.
- H Company Open-Sources Holo3.1 — Computer-Use VLM Family at 0.8B, 4B, 9B, and 35B-A3B Hits 74.2% OSWorld, Jumps AndroidWorld From 67% to 79.3%, and Ships FP8, NVFP4, and Q4 GGUF Checkpoints Under Apache 2.0
An Apache 2.0 computer-use VLM family that runs locally on consumer GPUs and now drives mobile as well as desktop.
- NVIDIA Nemotron 3 Ultra — 550B Mamba-Transformer Mixture-of-Experts With 55B Active Parameters Tops US Open-Weights Leaderboard at 48 on Artificial Analysis Intelligence Index, Ships June 4 on Hugging Face, OpenRouter, ModelScope, and build.nvidia.com
NVIDIA's biggest open-weights model: a 550B Mamba-Transformer MoE that beats every other US open model on intelligence and inference speed.
- WindBorne WeatherMesh-6 — AI Weather Model Cuts Ensemble-Mean RMSE Up to 38% vs ECMWF, Produces Hourly 3 km Forecasts From a 128-Member Ensemble in Latent Space
An AI weather model that beats ECMWF's flagship physics forecast by up to 38% on ensemble RMSE — and refreshes hourly.
- Microsoft Aion 1.0 — On-Device Windows AI Lineup Debuts at Build 2026 With Aion 1.0 Instruct in Edge Insider Today and a 14B Aion 1.0 Plan Reasoning + Tool-Calling Model Shipping In-Box
Microsoft ships a Windows-native AI model line: a small Instruct SLM for everyday text tasks and a 14B agentic Plan model that runs in-box.
- MiniMax M3 — Open-Weight Frontier Coding, 1M-Token Context, and Native Multimodality on a New Sparse-Attention Architecture
M3 packages frontier coding, a million-token context, and native multimodality into one open-weight model.
- Microsoft Launches Seven New MAI Models at Build 2026 — MAI-Thinking-1 Reasoning, MAI-Code-1-Flash 5B for Copilot, MAI-Image-2.5 With a Flash Variant, MAI-Voice-2 Across 15 Languages, MAI-Transcribe-1.5 Across 43
Microsoft AI's biggest first-party launch yet: a reasoner, a coding flash model, image + voice updates, and a speech-to-text engine claimed to be 5x faster than competing models.
- NVIDIA Cosmos 3 — Open Mixture-of-Transformers Omni-Model for Physical AI Lands at Computex With Nano (16B) and Super (65B) Variants, OpenMDW 1.1 Weights, and a New Coalition With Runway, Black Forest Labs, and Skild AI
An open frontier omni-model that can reason about, simulate, and act in the physical world.
- JetBrains Mellum2 — 12B Mixture-of-Experts With 2.5B Active Parameters Opens Up Under Apache 2.0, Ships Base, Instruct, and Thinking Variants Trained on ~10.6T Tokens
A fast, open 12B MoE designed for routing, RAG, and sub-agents inside coding workflows.
- Grok Build 0.1 Lands on the xAI API in Public Beta — 256K Context, $1/$2 per 1M Tokens, 100+ Tokens/Sec for Agentic Coding
xAI's coding-focused model goes from CLI-only to a public-beta API at $1 in, $2 out per million tokens.
- Qwen Ships Qwen-VLA — Unified Vision-Language-Action Generalist Built on Qwen3.5-4B With a 1.15B DiT Flow-Matching Action Decoder, 97.9% on LIBERO and 76.9% Real-World ALOHA
Qwen's vision-language stack now talks to robots end-to-end.
- Liquid AI Ships LFM2.5-8B-A1B — On-Device MoE With 1B Active Parameters, 38T Training Tokens, 128K Context, and Explicit Chain-of-Thought
8B-total / 1B-active mixture-of-experts trained on 38 trillion tokens, tuned for on-device agentic work with explicit chain-of-thought.
- Stepfun Releases Step 3.7 Flash — 198B Vision-Language MoE With 11B Active Parameters, 56.3% SWE-Bench Pro, 256K Context, and Three Reasoning Levels
198B-parameter vision-language MoE tuned for high-frequency agentic workloads — Apache 2.0, 256K context, and three reasoning levels you can dial up or down.
- NVIDIA LocateAnything-3B — Parallel Box Decoding Vision-Language Grounder Hits 12.7 Boxes/Sec on H100, ~10× Faster Than Qwen3-VL, Trained on 138M Queries Across 785M Boxes
LocateAnything decodes full bounding boxes in one shot, getting NVIDIA a vision-language grounder that is roughly 10× faster than Qwen3-VL.
- Claude Opus 4.8
Anthropic's flagship gets sharper agentic judgment, honest error-flagging, and a parallel-subagent dynamic-workflow tool.
- LLaVA-OneVision-2 — Open 8B Vision-Language Model Reads Video as a Codec Bit-Cost Stream, Hits 74.9 JumpScore mAP vs Qwen3-VL-8B's 30.1
An open 8B vision-language model that reads video as a compression stream, not a stack of sampled frames.
- ElevenLabs Music v2 — New Music Model Switches Genres Mid-Track, Adds Section-Level Inpainting, and Cuts API Prices Up to 50%
A music model that swaps genres mid-track and regenerates just one section of a song.
- OpenBMB MiniCPM5-1B — 1.08B On-Device Model Reaches Open-Source SOTA in Its Size Class, Shipping GGUF and 4-Bit MLX Builds
A 1.08B on-device language model that reaches open-source SOTA in its size class.
- DeepSeek Makes Its 75% V4-Pro API Discount Permanent — Input Stays at $0.435 per Million Tokens, Output at $0.87, a Quarter of the Original Sticker Rate
DeepSeek makes its 75%-off V4-Pro API pricing permanent instead of letting the promo expire.
- Tencent Hy-MT2 — Open-Weight Multilingual Translation Family (1.8B / 7B / 30B-A3B MoE) Covers 33 Languages; 1.8B Quantizes to 440MB at 1.25-bit
An open-weight family of Tencent translation models spanning 1.8B to a 30B MoE, covering 33 languages.
- Cohere Command A+ — 218B Sparse MoE With 25B Active Params, Apache 2.0, Runs Agentic and Multimodal Workloads on Two H100s
An open-weight 218B MoE that runs agentic, multimodal, multilingual workloads on as few as two H100 GPUs.
- OpenAI Model Disproves Erdős's 1946 Unit-Distance Conjecture — First Time AI Has Autonomously Solved a Prominent Open Problem Central to a Field of Mathematics
OpenAI says an internal reasoning model autonomously disproved Paul Erdős's 1946 unit-distance conjecture, finding a new construction that beats the previously-believed best bound.
- Project Genie Plugs Into Street View — Google's World Model Generates Interactive Stylised Scenes Anchored to Real US Locations, Rolling Out to AI Ultra Subscribers Globally
Google's Genie world model now takes a real Street View address as its scene seed — pick a styled lens, walk around a generated version of that exact block.
- Stable Audio 3.0 — Stability AI Ships a Four-Model Audio Family With 6:20 Music Generation and Open Weights for Three Sizes
A four-model audio family from Stability AI with three open-weight checkpoints and full 6:20 song generation in the larger sizes.
- Qwen3.7-Max — Alibaba's New Agentic Flagship Runs 35-Hour Autonomous Tasks With 1,000+ Tool Calls
Alibaba's new flagship Qwen3.7-Max is built to run agents for tens of hours and thousands of tool calls without stalling.
- Gemini Omni — Google's 'Create Anything From Any Input' Model Ships With Conversational Video Generation
A single Gemini model that turns text, images, or audio into edited video on request.
- Gemini 3.5 Flash — Google's I/O 2026 Model Rivals Flagship Frontier Models on Coding and Agentic Tasks at 4x the Speed
Google's new Flash-tier model targets agentic coding at flagship quality and 4x the throughput.
- Cursor Composer 2.5 — Coding Model Matches Opus 4.7 and GPT-5.5 on SWE-Bench Multilingual at a Fraction of the Cost
Cursor's in-house coding model now matches frontier models on coding benchmarks at a fraction of the price.
- SANA-WM — NVIDIA's 2.6B Open-Source World Model Generates 720p One-Minute Video With 6-DoF Camera Control
An efficient open-source world model that generates minute-long, camera-controllable 720p video.
- SU-01 — Shanghai AI Lab's 31B Open-Weight Reasoner Hits Gold-Medal Scores on IMO 2025 and USAMO 2026
A 31B open-weight model that reaches gold-medal olympiad scores from a documented post-training recipe.
- Interfaze — Hybrid CNN/DNN + LLM Architecture Beats Gemini-3-Flash, Claude-Sonnet-4.6, GPT-5.4-Mini, Grok-4.3 Across 9 OCR, Vision, STT, and Structured-Output Benchmarks
A hybrid model that bolts CNNs and small task-specific networks under an LLM front-end and routes deterministic developer jobs to whichever specialized model is best.
- Cactus Compute Needle — 26M-Parameter Function-Calling Model Distilled From Gemini 3.1, MIT-Licensed, 6000 tok/s on Cactus Runtime
A 26M-parameter open model distilled from Gemini 3.1 that does nothing but call tools — small enough to run on phones, watches, and glasses.
- OpenBMB MiniCPM-V 4.6 — 1.3B Vision-Language Model Runs on iOS, Android, and HarmonyOS, Beating Qwen 3.5-0.8B at 19× Fewer Tokens
Pocket-sized vision-language model — 1.3B params, runs on phones, beats larger models per token spent.
- Thinking Machines TML-Interaction-Small — 276B-A12B Native Interaction Model Listens, Talks, and Watches Concurrently at 0.40s Turn-Taking Latency
A 276B mixture-of-experts that holds a real-time conversation while a separate background model does the slow reasoning.
- Ai2 EMO — 14B MoE That Keeps Near-Full Accuracy When Run on Just 12.5% of Its Experts
An MoE where you can keep just 16 of 128 experts and still have a working model.
- HiDream-O1-Image — 8B Pixel-Level Unified Transformer Open-Sources With #8 Spot on Artificial Analysis Text-to-Image Arena
HiDream open-sources an 8B pixel-level transformer for image generation that beats larger DiTs and lands #8 on the Artificial Analysis arena.
- Tether QVAC MedPsy — On-Device Medical LLM Beats Google MedGemma 27B at Less Than One-Sixteenth the Size
A 4B medical model that fits in 2.6 GB and beats Google's 27B MedGemma on hard clinical reasoning — Apache 2.0, runs entirely offline.
- OpenAI GPT-5.5-Cyber — Security-Tuned Model in Limited Preview for Vetted Defenders
OpenAI ships GPT-5.5-Cyber: a security-tuned preview model gated behind a Trusted Access program for verified defenders.
- xAI Ships Grok Imagine Quality Mode to the API — Photorealistic Image Generation Starting at $0.01/Image
Grok's photorealistic image model — the engine behind 300M+ Grok generations — is now an API for any developer.
- OpenAI GPT-Realtime-2 — Voice Model With GPT-5-Class Reasoning, Plus Live Translate and Whisper Streaming in the API
OpenAI ships three new Realtime API models — a reasoning-grade voice model, live 70→13 language translation, and streaming Whisper transcription.
- Zyphra ZAYA1-8B — AMD-Trained MoE Reasoning Model With <1B Active Parameters
Zyphra trained an 8.4B sparse MoE with under a billion active params on AMD MI300X — and it matches open models 10–100x larger on math and code.
- Phi-4-Reasoning-Vision-15B — Microsoft's Compact Multimodal Reasoner: 88.2% ScreenSpot v2, Open-Weight
Microsoft's 15B open-weight model combines visual reasoning with math and GUI understanding — and knows when NOT to think.
- Genesis AI GENE-26.5 — Foundation Model + Human-Scale Hand for Cooking, Piano, and Rubik's Cubes
An AI 'brain' for robot hands trained on human-scale data, demoed cooking a 20-step meal and solving a Rubik's cube.
- MolmoAct 2 — Ai2's Fully Open Bimanual Robotics Model With 720h Open Dataset
An open VLA model from Ai2 that runs two-armed robots in the real world — weights, code and 720h of bimanual data on day one.
- GPT-5.5 Instant — New ChatGPT Default Cuts Hallucinations 52.5% on Medicine, Law, Finance Prompts
OpenAI's new default ChatGPT model: more factual, more concise, less emoji noise.
- Gemma 4 MTP Drafters — Open-Weight Speculative Decoding Boosts Inference Up To 3x
Tiny companion models pair with Gemma 4 to deliver up to 3x faster inference at identical output quality.
- Grok 4.3 — xAI's New Reasoning Flagship at $1.25/$2.50 per 1M Tokens, 1M-Token Context
xAI's new flagship reasoning model lands with a 1M-token window, image input, and lower per-token pricing than Grok 4.20.
- ERNIE 5.1 Preview — Baidu's New Flagship Hits #1 Chinese, #13 Global on LMArena Text
Baidu's ERNIE 5.1 Preview is the only Chinese model in LMArena's global top 15 and tops legal/government tasks worldwide.
- Tencent Hy-MT1.5 1.25-bit — 440MB On-Device Translator for 33 Languages Beats 72B Models
A 440MB translation model that runs offline on a phone and beats 72B-parameter open models and commercial APIs on Flores-200.
- OpenAI Restricts GPT-5.5 Cyber to 'Trusted Defenders' — Mirroring Anthropic's Mythos Access Model
OpenAI's frontier cyber model goes behind a vetted-defender gate, weeks after Sam Altman publicly mocked Anthropic for doing the same with Mythos.
- IBM Granite 4.1 — 8B Instruct Matches Granite 4.0's 32B MoE on Tool Calling and Instruction Following
IBM's open-weight enterprise model family steps up: smaller dense models that match the previous generation's 32B mixture-of-experts.
- Mistral Medium 3.5 — 128B Open-Weight Flagship Hits 77.6% on SWE-Bench Verified
Mistral's new flagship: 128B dense, 256k context, open weights under modified MIT, and a 77.6 SWE-Bench Verified score.
- Talkie-1930 — 13B Open-Weight LLM Trained Only on Pre-1931 English Text
A 13B LLM with a hard 1930 cutoff: no World War II, no transistors, no internet, trained on OCR'd books and newspapers.
- Gemini Robotics-ER 1.6 — Instrument Reading Jumps from 23% to 93% with Agentic Vision
Gemini's robotics reasoning model now reads physical gauges with 93% accuracy — up from 23% in the previous version.
- Waypoint-1.5 — Interactive 3D World Model at 720p/60fps on Consumer GPUs
A generative world model that renders playable first-person environments at 60fps on a consumer RTX 3090 — no game engine.
- OpenAI Privacy Filter — Apache 2.0 PII Detection Model for On-Prem Data Pipelines
A small, fast Apache 2.0 model from OpenAI for detecting and masking PII in text — designed to run entirely on-premises.
- OmniVoice — Zero-Shot TTS for 600+ Languages at 40× Real-Time
Zero-shot TTS covering 600+ languages at 40× real-time speed — Apache 2.0, from the team behind Kaldi and k2.
- Kimi K2.6 — Moonshot AI's 1T-Parameter Agentic MoE Opens Under Modified MIT
Moonshot AI open-sources a 1-trillion-parameter multimodal model built for orchestrating swarms of up to 300 parallel sub-agents.
- Xiaomi MiMo-V2.5 Series: Multimodal Agent Model with 42% Token Efficiency Edge
Xiaomi's MiMo-V2.5-Pro matches GPT-5.4 on SWE-bench Pro while using 42% fewer tokens, with native vision, audio, and video in a single 1T MoE model.
- Claude Mythos Preview + Project Glasswing: Anthropic's Restricted AI for Zero-Day Discovery
Anthropic's Mythos Preview autonomously discovers and exploits zero-days across every major OS and browser — the first frontier model Anthropic deemed too dangerous for public release.
- GPT-5.5 — OpenAI's Fully Retrained Flagship: 82.7% Terminal-Bench, 1M Context
GPT-5.5 is OpenAI's first fully retrained base model since GPT-4.5: 82.7% on Terminal-Bench 2.0, 13 points ahead of Claude Opus 4.7.
- DeepSeek V4 — 1.6T Open-Weights MoE Tops LiveCodeBench with MIT License
DeepSeek's new open-weights flagship: two MIT-licensed MoE models with 1M-token context and top-tier coding performance released today.
- Tencent Hy3-Preview — 295B Open-Source MoE with 74.4% SWE-bench
Tencent's first open-source flagship under ex-OpenAI researcher Yao Shunyu — 74.4% SWE-bench in a 295B MoE.
- Qwen3.6-27B — Flagship-Level Coding in a 27B Dense Open-Weights Model
A 27B dense model scoring 77.2% SWE-bench Verified — Apache 2.0, multimodal, runnable on consumer hardware.
- Claude Opus 4.7 — 87.6% SWE-bench, 3.75MP Vision, Task Budgets, at Same $5/$25 Pricing
Anthropic's most capable publicly available model: double-digit coding gains, 3× vision resolution, and a new effort system — same price as the model it replaces.
- NVIDIA Ising — Open AI Models for Quantum Processor Calibration and Error Correction
NVIDIA's Ising family applies AI to two core quantum computing bottlenecks: calibrating noisy qubits and decoding quantum errors in real time.
- GPT Image 2 — OpenAI's Reasoning-Augmented Image Model with 2K Resolution
OpenAI's new image model thinks before it generates — reasoning and optionally searching the web before producing images.
- DenseOn & LateOn — New BEIR Records at 149M Parameters, Fully Open
Two 149M-parameter retrieval models from LightOn AI set new BEIR records — the smallest ColBERT and dense models to reach these scores.
- Ternary Bonsai — 1.58-Bit 8B Runs at 82 tok/s on M4 Pro in 1.75 GB
1.58-bit ternary weights fit an 8B model in 1.75 GB and run fast enough for interactive use on iPhones and MacBooks.
- Qwen3.6-Max-Preview — Alibaba's 1T-Parameter Coding Flagship, #5 on SWE-bench Pro
Alibaba's new 1T-parameter MoE flagship scores #5 on SWE-bench Pro Public and leads six coding benchmarks over Qwen3.6-Plus.
- GPT-Rosalind — OpenAI's first domain-specific frontier reasoning model, for life sciences
OpenAI's first purpose-built domain model — frontier reasoning tuned on molecules, proteins, genes, and experimental workflows.
- NVIDIA Nemotron OCR v2 — Unified Multilingual OCR, 28x Faster Than PaddleOCR
A single unified OCR model from NVIDIA that reads six languages at 34.7 pages per second — no per-language models needed.
- HY-World 2.0 — Tencent Open-Sources 3D World Model: Image or Text to Navigable 3D Scene
Tencent's open 3D world model turns a single image or text prompt into a fully navigable, editable 3D scene — real geometry, not video.
- Qwen3.6-35B-A3B — 35B MoE Coding Model, 3B Active Params, SWE-bench 73.4%
Alibaba's 35B MoE model uses only 3B active params, scores 73.4% SWE-bench Verified, and runs locally on a MacBook Pro.
- Gemini 3.1 Flash TTS Preview — Google's Inline Audio-Tag Text-to-Speech
Google's new TTS API model lets you control speech expression by embedding tags like [whispers] directly in text.
- ERNIE-Image — Baidu's Open-Weight Text-to-Image Diffusion Transformer
Baidu's open-weight 8B diffusion transformer for text-to-image, with a Turbo variant that generates in 8 steps.
- Kronos — Foundation Model for Financial K-Line Sequences
The first foundation model that treats financial candlestick data as a specialized language.
- Muse Spark
Meta's first model after the Alexandr Wang reshuffle — multimodal reasoning that matches Llama 4 Maverick with an order of magnitude less compute.
- GLM-5.1
A 754B open-weight Mixture-of-Experts model from China that beats GPT-5.4 and Claude Opus 4.6 on SWE-Bench Pro — under an MIT license.
- Kimi Code K2.6 — Moonshot's Upgraded Coding Agent Model
Moonshot's Kimi Code gets a major model upgrade with deeper reasoning and cleaner agent planning — subscribers say it feels like Opus.
- Audio Flamingo Next — Open Audio-Language Model with 30-Minute Temporal Reasoning
NVIDIA and UMD open-source an audio-language model that timestamps its reasoning through 30-minute recordings and beats Gemini 2.5 Pro on MMAU-Pro.
- WildDet3D — Promptable monocular 3D detection in the wild
Single-image 3D detection that takes text, box or point prompts and works zero-shot across indoor, street and nature scenes.
- MiniMax M2.7 — open-weights self-evolving agent model
MiniMax's frontier agentic model is now downloadable — a 229B MoE that scores like GPT-5.3-Codex on SWE-Bench Pro.
- EXAONE 4.5 — LG's first open-weight vision-language model
LG's EXAONE family finally goes multimodal — a 33B VLM with 262K context, document-understanding focus, and full inference-stack support on day one.
- LPM 1.0 — Large Performance Model for video-based character performance
A 17B video model that makes a still character image act out a full conversation in real time — listening, speaking, reacting, in character.
- PersonaPlex — NVIDIA's real-time speech-to-speech model with persona control
A real-time speech-in/speech-out model from NVIDIA you can prompt into a specific persona and voice, with CPU offload for low-VRAM machines.
- LFM2.5-VL-450M — Liquid AI's Tiny Vision-Language Model for Edge
A 450M-parameter vision-language model that runs on a phone, understands 8 languages, and can locate objects in images.
- Trinity-Large-Thinking — Arcee's 400B Open Reasoning Agent Model
A 400B sparse model that activates only 13B parameters per token and costs 96% less than Opus for agent tasks.
- HappyHorse-1.0 — Alibaba's Video Generation Model Tops Arena Rankings
Alibaba's stealth video model topped blind rankings under a pseudonym, then revealed itself as the new arena champion.
- Harrier — Microsoft's Open-Source Multilingual Embedding Model Family
Microsoft's Bing team open-sources a decoder-only embedding family that tops the multilingual MTEB-v2 leaderboard.
- Gemma 4 — open models for reasoning and agents
Four open models from Google under Apache 2.0 that bring frontier-tier reasoning and native tool-use to consumer GPUs and phones.
- Qwen 3.6-Plus — agentic coding flagship
Alibaba's new flagship language model with a million-token context window and agentic coding that rivals Opus 4.6 at nearly double the speed.
- Qwen 3.5-Omni — native omnimodal model
A single model that natively sees, hears, reads, and speaks — with 113-language speech recognition and real-time voice cloning from a 10-second sample.
- MiMo-V2-Pro — Xiaomi's trillion-parameter agentic model
A trillion-parameter model from Xiaomi that appeared anonymously on OpenRouter, got mistaken for DeepSeek V4, and turned out to be a genuine frontier contender at $1/M input tokens.
- GPT-5.4 mini and nano
OpenAI's smallest GPT-5.4 variants — mini is free-tier fast, nano is API-only cheap, both punch above their weight.
- GPT-5.4 — OpenAI's frontier model with native computer use
OpenAI's frontier model with native desktop automation that beats human experts on OSWorld — the first model to do so — plus a million-token context window.
- LTX-2.3 — open-source 4K video+audio generation
The first open-source model that generates native 4K video with synchronized audio in one forward pass — under Apache 2.0.
- Gemini 3.1 Flash-Lite — cost-efficient multimodal model
Google's cheapest and fastest Gemini 3 model — 2.5x faster first-token delivery than Flash, $0.25 per million input tokens, and a million-token context window.
- Grok 4.20 Beta 2 — multi-agent reasoning with rapid learning
xAI's multi-agent reasoning model that routes queries to four parallel thinkers, updates itself weekly, and holds a 2-million-token context window.
- Qwen 3.5 Small — on-device multimodal models
Four natively multimodal models from 0.8B to 9B parameters — small enough for phones, capable enough to beat last-gen 30B models.
- HY-Embodied-0.5
Open-weights embodied foundation models from Tencent — built to give robot VLA stacks a real spatial-temporal brain.