AI/TLDR — every new AI model, tool, repo & paper
The latest AI releases, refreshed every 2 hours and explained in plain English.
What AI shipped today?
In the last 24 hours AI/TLDR tracked 15 new AI releases, including Moebius (a 0.22B image inpainting model that matches the 11.9B FLUX.1-Fill-Dev), agent-eval (a Hugging Face harness that benchmarks coding agents on your own library), and Beyond LoRA (a Hugging Face benchmark showing OFT and BEFT can beat the default). AI/TLDR is an AI release tracker that follows new AI models, open-source tools, papers, datasets and benchmarks, refreshed every 2 hours from verified primary sources and explained in plain English.
- Moebius — 0.22B image inpainting matches FLUX.1-Fill-Dev's 11.9B
Moebius is a 0.22B image inpainting model from HUST and VIVO AI Lab that matches the 11.9B FLUX.1-Fill-Dev across six benchmarks while running over 15x faster, with code and weights now on GitHub.
- agent-eval — Hugging Face harness benchmarks coding agents on your own library
Hugging Face shipped agent-eval, an open harness that measures how well coding agents like Kimi-K2.6 and GLM-5.1 use a library — not just task completion, but token cost, time, and error rate across bare, clone, and skill access tiers.
- Beyond LoRA — Hugging Face benchmark shows OFT and BEFT can beat the default
Hugging Face benchmarked five PEFT methods on identical tasks and found OFT beats LoRA on image fine-tuning while BEFT and Lily trade memory for accuracy on math. PEFT also gained an adapter-to-LoRA converter for vLLM compatibility.
- Cursor 3.8 — /automate skill plus new GitHub and Slack automation triggers
Cursor 3.8 launches Cursor Automations: describe a task in plain English with /automate and it runs on its own when a GitHub event or Slack emoji reaction fires. The release adds five GitHub triggers, and cloud agents get computer use by default.
- Noam Shazeer to OpenAI — Gemini co-lead becomes Lead for Architecture Research
Noam Shazeer, Google VP and Gemini co-lead, is leaving Google to join OpenAI as Lead for Architecture Research, two years after Google paid $2.7B to bring him back from Character.AI.
- In the Weights — ex-OpenAI tool scores whether AI models remember your name
Joey Flynn and Thomas Dimson, both ex-OpenAI, launched a free site that types a name into multiple LLMs in parallel and returns a 0–996 'strength score' for how well models recall the person from training data alone.
- DeepMind AI Control Roadmap — defense-in-depth for misaligned AI agents
Google DeepMind publishes the AI Control Roadmap: a defense-in-depth framework that treats internal AI agents as insider threats, with trusted supervisor models monitoring their actions in real time.
- Two Minute Papers: 'Scientists Found A Better Language For AI Agents'
Two Minute Papers covers RecursiveMAS, a UIUC/Stanford/NVIDIA/MIT framework that lets multi-agent systems trade latent thoughts instead of text and reports 2.4x faster inference with 75.6% fewer tokens.
- MolmoMotion — Ai2's language-guided 3D motion forecasting models
Ai2 released MolmoMotion, two open models that predict where points on objects will move in 3D space from a video frame plus a text instruction. The drop bundles a 1.16M-video training set and the PointMotionBench eval, and lifts a robot pick-and-place baseline from 56.0% to 76.3%.
- MosaicLeaks — ServiceNow benchmark for research-agent privacy leaks
ServiceNow released MosaicLeaks, a 1,001-chain benchmark that measures how much private context a research agent leaks into its web queries, plus PA-DR, an RL recipe that drops leakage from 51.7% to 9.9% on Qwen3-4B with no loss of task success.
- Agentic Resource Discovery — HF, Microsoft, Google open spec for tool lookup
Hugging Face, Microsoft, Google, and GoDaddy launched Agentic Resource Discovery, an open spec that lets agents find MCP servers, skills, and A2A endpoints at runtime. ARD defines an ai-catalog.json manifest and a POST /search registry API, with a reference Discover Tool on HF.
- Sam Witteveen: VibeThinker 3B — taking on giant models
Sam Witteveen reviews Weibo's VibeThinker-3B, the new 3B-parameter reasoning model that scores 80.2% on LiveCodeBench v6. Witteveen walks through how a 3B open-weights model competes with much larger frontier models on code and math reasoning.
- MCP Enterprise-Managed Authorization — zero-touch OAuth for Claude, VS Code, Linear
Enterprise-Managed Authorization, an MCP extension, is now stable. Admins provision MCP servers once through Okta and users get every connector on first login with no per-app OAuth. Claude, VS Code, Linear, Figma, Asana, Atlassian and Supabase ship support; Ramp is live with 2,000 users.
- OpenAI Codex Record & Replay — demonstrate a macOS workflow, get a reusable skill
OpenAI Codex App 26.616 adds Record & Replay on macOS. Run a workflow once and Codex saves it as an editable, reusable skill that replays through Computer Use, browser tools, and plugins. Requires Computer Use; not in the EEA, UK, or Switzerland at launch.
- Simon Willison: GLM-5.2 is probably the most powerful text-only open weights LLM
Simon Willison calls Z.ai's GLM-5.2 today's strongest open-weights text LLM: top of Artificial Analysis Intelligence Index v4.1 at 51, second on Code Arena WebDev behind Claude Fable 5, and ~$1.40/$4.40 per 1M tokens on OpenRouter vs GPT-5.5's $5/$30.
- Wes Roth: Google's 'POST AGI' paper — DeepMind's AGI-to-ASI roadmap
Wes Roth breaks down 'From AGI to ASI', a Google DeepMind paper co-signed by Shane Legg, Marcus Hutter and Thore Graepel that maps four routes from human-level AI to superintelligence.
- LifeSciBench — OpenAI's 750-task benchmark for life-science research
OpenAI's LifeSciBench grades AI models on 750 expert-authored life-science tasks using rubrics. The strongest model, GPT-Rosalind, passes only 36.1%, with attached data files cited as the main bottleneck.
- ChatGPT Scheduled Tasks Hub — sidebar page replaces Pulse
ChatGPT now has a dedicated Scheduled page in the sidebar where users can view, pause, edit, or delete recurring prompts and monitoring tasks. The launch replaces Pulse, which OpenAI retires in 14 days.
- Wolfram Language 15 — Mathematica adds built-in AI Assistant and Agent Tools
Wolfram Language 15 ships a built-in AI Assistant inside every notebook and a new Wolfram Agent Tools framework that lets Claude Code and other agent environments call Mathematica functions and read or write notebooks.
- Superpowers v6.0 — Jesse Vincent's agentic-skills framework, ~2x faster reviews
Superpowers v6.0 reworks Jesse Vincent's open-source coding-agent skills framework so Claude Code and Codex finish reviews in about half the time and use roughly 50% fewer tokens, with one reviewer per task and three new harness integrations.
- Alex Ellis: 'Local Qwen Isn't a Worse Opus — It's a Different Tool'
Alex Ellis argues local Qwen models are not stripped-down Opus stand-ins but a different tool, useful for bounded private work like telemetry analysis and code review. He reports that even after spending $12,000 on an RTX 6000 Pro, the model still looped and hallucinated on open-ended tasks.
- Wes Roth: 'Here's REALLY WHY Fable 5 Got Banned'
Wes Roth follows up his initial Claude Fable 5 ban reaction with a deeper-dive explainer video on the US national-security directive that pulled Anthropic's flagship model and the Mythos 5 preview off the market.
- Midjourney Medical — full-body ultrasonic CT scanner, 60-second scan, SF spa in 2027
Midjourney Medical is a new Midjourney division building an Ultrasonic CT full-body scanner. The 8,960-transducer device uses Butterfly Network's ultrasound-on-chip silicon to capture a whole-body image in about 60 seconds with no radiation. The first Midjourney Spa opens in San Francisco in 2027.
- Fireship: 'I read every major CS paper of the last 100 years'
Fireship races through ten foundational computer-science papers, half of them on the path that led to today's LLMs: Rosenblatt's perceptron (1958), Minsky and Papert's perceptron critique (1969), the backprop paper (1986), AlexNet (2012), 'Attention Is All You Need' (2017), and Brown's GPT-3 paper (2020).
- CADAM v0.3.0 — open-source text-to-CAD web app from YC startup Adam
CADAM is an open-source web app that turns text or image prompts into parametric 3D CAD models. Built by YC W25 startup Adam, it runs in the browser, uses Claude to write OpenSCAD code, and exports STL, SCAD, and DXF under GPL-3.0.
- US Pauses DeepSeek Blacklist — 100+ Chinese AI firms spared Entity List
The Trump administration is holding off adding DeepSeek, memory chipmaker CXMT, and over 100 other Chinese firms to the Commerce Department's Entity List, despite an interagency committee approving the listings last year. No new entries have been added since October 2025.
- Qwen-Robot Suite — Alibaba's three foundation models for robots
Alibaba's Qwen team ships Qwen-Robot Suite, three open foundation models for embodied AI: Qwen-RobotManip for manipulation, Qwen-RobotNav for navigation, and Qwen-RobotWorld as a video world model.
- VibeThinker-3B — Weibo's 3B reasoning model hits 80.2% on LiveCodeBench v6
VibeThinker-3B is a 3-billion-parameter dense reasoning model from Sina Weibo's AI lab that posts 94.3 on AIME26 and 80.2 Pass@1 on LiveCodeBench v6, with MIT-licensed weights on Hugging Face and code on GitHub.
- Grok Imagine Video 1.5 — xAI's image-to-video model goes GA at $0.14/sec 720p
Grok Imagine Video 1.5 is generally available on the xAI Imagine API, grok.com/imagine, and the Grok iOS and Android apps. xAI prices 720p output at $0.14 per second and says a 6-second 720p clip renders in about 25 seconds, down from 40+ in the prior model.
- OpenAI Deployment Simulation — predict misbehavior before release
OpenAI Deployment Simulation replays real past user conversations through a candidate model before launch to forecast misbehavior rates. OpenAI tested it on 1.3M conversations across models from GPT-5 Thinking to GPT-5.4 and reports a median 1.5x multiplicative error.
- cuTile Rust v0.2.0 — NVIDIA Labs ships NVFP4 GPU kernels in safe Rust
NVIDIA Labs ships cuTile Rust v0.2.0, a safe tile-based GPU kernel DSL for Rust with NVFP4 packing and block-scaled GEMM on B200. A companion paper, Fearless Concurrency on the GPU, reports 7 TB/s element-wise and 2 PFlop/s GEMM throughput.
- Sam Witteveen: GLM 5.2 — the top new open-weights model
Sam Witteveen walks through Z.ai's GLM 5.2, the new top open-weights model on the Artificial Analysis Intelligence Index. Witteveen covers the MIT license, 1M-token context, coding benchmarks, and how GLM 5.2 stacks against closed frontier models.
- 1littlecoder: 'GLM 5.2 Is the New AI Code King'
1littlecoder runs Z.ai's new GLM 5.2 through agentic coding tasks and argues it edges out the current open-weight leaders on real work, not just benchmarks.
- SpaceX to Buy Cursor for $60B — stock deal days after blockbuster IPO
SpaceX agreed to acquire Anysphere, maker of the Cursor AI coding editor, in a $60B all-stock deal. The acquisition is meant to feed Cursor into SpaceX's AI work tied to xAI, and is expected to close in Q3 2026.
- Meta Launches 'AI Mode' on Facebook — answers built from public posts
Meta added an 'AI Mode' to Facebook search that answers natural-language questions with summaries pulled from public posts, Groups, and Reels across its platforms.
- Android 17 Ships — Gemini Omni video editing, Lyria 3 music, AudioLM
Google released Android 17 to Pixel devices alongside a Pixel Drop that expands Gemini: Gemini Omni edits videos inside conversations, Lyria 3 generates music from text or images, and AudioLM adds speech-to-translation on Pixel 10a.
- Vicki Boykis: 'Running Local Models Is Good Now'
ML engineer Vicki Boykis argues local models finally cleared the practical bar — she runs agentic coding workflows on Gemma 4 and friends, hitting about 75% of frontier cloud accuracy without API dependencies.
- Two Minute Papers: 'They Looked Inside Claude's AI's Mind. It Got Weird'
Károly Zsolnai-Fehér's new Two Minute Papers video walks through the latest mechanistic interpretability work on Claude — what Anthropic's researchers found by probing the model's internal features.
- DreamX-World 1.0 — Alibaba AMAP open-sources an interactive world model
Alibaba's AMAP research team releases a 5B Apache-2.0 video world model with camera navigation, scene revisit, and event control across photoreal, game-style, and stylized domains. Code, paper, and two checkpoints are out.
- FastContext — Microsoft's Explore subagent cuts coding-agent tokens by 60%
Microsoft and SJTU release FastContext: a small repository-exploration subagent that does parallel code search and hands focused context to a larger coding model. MIT-licensed code and 4B SFT and RL checkpoints are out.
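
Several entries above quote per-second prices and before/after percentages. As a rough aid for reading them, the Python sketch below works three of those figures through: the cost of a 6-second Grok Imagine Video 1.5 clip at 720p, the relative drop in leakage reported for PA-DR, and the percentage-point gain on the MolmoMotion pick-and-place baseline. The numbers are copied from the summaries; the helper names are hypothetical, and the clip cost assumes billing is strictly per second of output with no other fees.

```python
# Worked arithmetic on figures quoted in the feed above.
# Helper names are hypothetical; inputs are taken from the summaries.


def clip_cost(price_per_second: float, seconds: float) -> float:
    """Cost of one clip, assuming billing is purely per second of output."""
    return price_per_second * seconds


def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after` (0.5 means halved)."""
    return (before - after) / before


def point_gain(before: float, after: float) -> float:
    """Absolute change in percentage points."""
    return after - before


# Grok Imagine Video 1.5: $0.14 per second at 720p, 6-second clip.
grok_clip = clip_cost(0.14, 6)

# MosaicLeaks / PA-DR: leakage falls from 51.7% to 9.9% on Qwen3-4B.
leak_drop = relative_reduction(51.7, 9.9)

# MolmoMotion: pick-and-place success rises from 56.0% to 76.3%.
pick_place_gain = point_gain(56.0, 76.3)

if __name__ == "__main__":
    print(f"6 s Grok clip at 720p: ${grok_clip:.2f}")
    print(f"PA-DR relative leakage reduction: {leak_drop:.1%}")
    print(f"MolmoMotion pick-and-place gain: {pick_place_gain:.1f} points")
```

Relative reduction and percentage-point gain are kept as separate helpers because the summaries mix both kinds of claim, and the two are easy to confuse when comparing entries.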