AI/TLDR — every new AI model, tool, repo & paper
The latest AI releases, refreshed every 2 hours and explained in plain English.
What AI shipped today?
In the last 24 hours AI/TLDR tracked 11 new AI releases, including: GitHub Copilot CLI GA — tabbed terminal with MCP, skills, and plugins; GitHub Copilot app gets BYOK — Anthropic, Ollama, and LM Studio supported; and Mistral Connectors — admin controls, multi-account, and a new debugger. AI/TLDR is an AI release tracker that follows new AI models, open-source tools, papers, datasets, and benchmarks. It is refreshed every 2 hours from verified primary sources and explained in plain English.
- GitHub Copilot CLI GA — tabbed terminal with MCP, skills, and plugins
GitHub Copilot CLI's new terminal UI is generally available, adding Session/Issues/PRs/Gists tabs, in-session /mcp, /skills, /plugin, and /settings commands, and theme-aware accessibility with screen-reader support.
- GitHub Copilot app gets BYOK — Anthropic, Ollama, and LM Studio supported
The GitHub Copilot desktop app now supports bring-your-own-key for Azure OpenAI, Anthropic, Microsoft Foundry, Foundry Local, LM Studio, Ollama, and any OpenAI-compatible endpoint, with keys stored in the OS keychain.
- Mistral Connectors — admin controls, multi-account, and a new debugger
Mistral expands its Connectors stack with per-workspace admin controls (GA), API keys scoped to specific connectors (GA), multi-account auth (GA), an 11-step Connectors Debugger (preview), and connector support inside Vibe Code and Workflows.
- OpenAI Jalapeño — first custom inference chip, built with Broadcom
OpenAI Jalapeño is OpenAI's first custom inference chip, co-designed with Broadcom and built by Celestica. The ASIC targets LLM inference at substantially better performance per watt; engineering samples already run GPT-5.3-Codex-Spark.
- Gemini 3.5 Flash gets Computer Use — native browser, mobile, and desktop agents
Google DeepMind built Computer Use directly into Gemini 3.5 Flash, so the main Flash model can now drive a browser, an Android phone, or a desktop on its own through the Gemini API and Enterprise Agent Platform.
- Krea 2 — open-weight 12B image model with 2-second Turbo variant
Krea 2 is a 12B Diffusion Transformer text-to-image model released as open weights in two variants: Raw for fine-tuning and Turbo, which generates 2K images in about 2 seconds.
- OpenMontage — AGPL agentic video studio for AI coding assistants
OpenMontage is an open-source agentic video production system that turns Claude Code, Codex, or any AI coding assistant into a full video studio with 12 pipelines, 52 tools, and 500+ skills.
- Qwen-AgentWorld — language world models that simulate seven agent domains
Qwen-AgentWorld is a pair of open-weight world models (35B-A3B and 397B-A17B) that simulate seven agent environments — MCP, search, terminal, software engineering, Android, web, and OS — through chain-of-thought reasoning.
- OpenAI Daybreak — GPT-5.5-Cyber and Patch the Planet go live
OpenAI expands its Daybreak security program with the full release of GPT-5.5-Cyber, an updated Codex Security plugin, a partner program with CrowdStrike, Sophos, and Fortinet, and Patch the Planet, an open-source fix-funding effort with Trail of Bits.
- Nathan Lambert: GLM-5.2 — the step change for open agents
Nathan Lambert argues GLM-5.2 is the first open-weight model that feels right in coding harnesses as a general agent, matching closed leaders like Claude Opus 4.8 about seven months after they shipped, in what he calls a DeepSeek R1-style threshold moment.
- David Rosenthal: 'AI's Affordability Crisis' — the 70x subsidy that can't hold
David Rosenthal pulls together SemiAnalysis and Ed Zitron numbers to argue AI tokens are sold far below cost, with subsidies of up to 40x at Anthropic and up to 70x at OpenAI, and that real billing would turn a $200 ChatGPT plan into a $14,000 bill.
- Latent Space: 'Red-Teaming after Mythos' — Gray Swan on AI security
Latent Space hosts Zico Kolter (OpenAI board, CMU) and Matt Fredrikson (Gray Swan CEO) to argue AI security is not 'cybersecurity with AI' — Gray Swan's Shade red-teaming model now beats human attackers at breaking frontier LLMs.
- Claude Tag — Anthropic's @Claude Slack agent for shared teamwork
Claude Tag is a Slack agent from Anthropic that anyone in a channel summons by tagging @Claude. The shared bot breaks a request into stages, runs the work with tools an admin scoped per channel, and posts results back to the thread.
- Fireship: 'Midjourney wants to delete 30% of all death…'
Fireship reacts to Midjourney Medical's pitch — a 60-second full-body ultrasonic CT scan planned for a San Francisco spa — and the company's bold claim that AI-driven early diagnostics could prevent a large share of premature deaths.
- Simon Willison: 'Prompt Injection as Role Confusion'
Simon Willison highlights a new paper by Charles Ye, Jasmine Cui, and Dylan Hadfield-Menell arguing prompt injection is really 'role confusion' — language models lean on style cues, not content, to tell trusted text from user input.
- Armin Ronacher: 'The Coming Loop' — why even skeptics end up looping
Armin Ronacher argues 'harness loops' — outer systems that re-run AI agents past their natural stopping point — work well for code porting and benchmark runs, but breed defensive, dependency-creating code when pointed at real codebases.
- Mistral OCR 4 — 170-language document model with bounding boxes and confidence scores
Mistral OCR 4 extracts text plus per-block bounding boxes, type labels, and confidence scores across 170 languages, scoring 85.20 on OlmOCRBench and 93.07 on OmniDocBench at $4 per 1,000 pages.
- Wes Roth: 'Cursor JUST beat EVERYONE…'
Wes Roth's new video argues Cursor has pulled ahead of rival AI coding agents, walking through the Cursor Compile 26 opening keynote and the Composer 2.5 in-house coding model.
- Baidu Unlimited-OCR — 3B vision model parses long documents in one pass
Baidu's Unlimited-OCR is a 3B vision-language model that introduces Reference Sliding Window Attention to keep a constant KV cache, letting one forward pass transcribe dozens of document pages within a 32K context. Code and weights ship under MIT.
- PP-OCRv6 — PaddlePaddle ships 50-language OCR family from 1.5M to 34.5M params
PP-OCRv6 is the next PaddleOCR family with Tiny (1.5M), Small (7.7M), and Medium (34.5M) tiers covering 50 languages. The Medium tier lifts detection Hmean to 86.2% and recognition accuracy to 83.2%, gains of 4.6 and 5.1 points over PP-OCRv5_server.
- Oak — version control built for AI coding agents
Oak is a new version control system for AI coding agents that mounts repos lazily, runs branch-per-task, and benchmarks up to 95% faster than Git on snapshots, large binaries, and dirty trees.
- Anthropic-Cybersecurity-Skills v1.3.0 — 817 security skills across 6 frameworks
Mahipal Jangra's open Anthropic-Cybersecurity-Skills library jumps from 762 to 817 agent skills in v1.3.0, adding AI Security, Supply Chain, and Hardware/Firmware domains plus MITRE F3 as a sixth framework mapping.
- Simon Willison — porting Moebius image inpainting to the browser via Claude Code
Simon Willison shows how he used Claude Opus 4.8 to port the 0.22B Moebius image-inpainting model from PyTorch/CUDA to a browser-only WebGPU + ONNX demo, with the agent doing the framework conversion, weight upload, and UI work.
- Claude Code 2.1.186 — MCP login CLI plus auto-reply to bash commands
Claude Code v2.1.186 adds claude mcp login/logout for CLI-based MCP server auth, makes ! bash commands auto-prompt Claude to respond, and fixes 20+ background-agent and post-sleep streaming bugs.
- OpenAI Codex — SSD-burning SQLite log bug patched after 640 TB/year reports
OpenAI Codex CLI shipped two patches that cut about 85% of its SQLite log writes. Users had measured 37 TB written in 21 days, on track for 640 TB a year and full-drive SSD wear in months.
- Two Minute Papers: 'DeepSeek Just Solved AI's Billion Dollar Problem'
Two Minute Papers walks through the DualPath paper, which attacks the KV-cache I/O bottleneck behind agentic LLM serving costs and reports up to 1.96x higher online throughput.
- Sakana Fugu — multi-agent orchestration model that matches Fable 5 on quality
Sakana AI launched Fugu and Fugu Ultra, multi-agent orchestration models delivered through one OpenAI-compatible API. Fugu Ultra coordinates a pool of expert agents and is reported to match Fable 5 on coding, reasoning, science, and agentic benchmarks.
- Hermes Agent v0.17.0 — iMessage, WhatsApp, and async subagents from Nous
Hermes Agent v0.17.0 'The Reach Release' adds iMessage support via Photon Spectrum (no Mac relay), an official WhatsApp Business Cloud adapter, Raft agent-network integration, and background subagents that return handles. The release totals 1,475 commits from 245 contributors.
- Palmier Pro v0.3.5 — open-source macOS video editor for AI agents
Palmier Pro v0.3.5 adds transcript-based cutting, ripple-insert trims with linked audio, folder imports, and a Claude Opus 4.8 upgrade to the Swift-native macOS video editor that exposes every timeline action to AI agents over MCP.
- Cloudflare Temporary Accounts — AI agents deploy live Workers in seconds, no signup
Cloudflare's new wrangler deploy --temporary command lets an AI agent provision a live Workers account in seconds without signup. The account stays usable for 60 minutes, after which a human can claim it permanently or let it auto-expire.
- Grok on Databricks — xAI models land in Agent Bricks via SpaceX deal
Grok on Databricks makes Grok 4.3 and Grok Build 0.1 natively callable from Databricks Agent Bricks, so enterprise teams can wire xAI models into governed Lakehouse data without external pipelines.
- John Jumper to Anthropic — Nobel laureate AlphaFold creator leaves DeepMind
John Jumper, the 2024 Nobel chemistry laureate and AlphaFold co-creator, posted on X that he is leaving Google DeepMind after nearly nine years to join Anthropic. He is the second senior Google AI departure this week after Noam Shazeer.
- Nathan Lambert: 'Banning Open Source AI Would Be A Mistake'
Nathan Lambert and Kevin Xu argue banning open-source AI would hurt US security, education, and competition. Their Interconnects post responds to an executive order to review AI models and a block on foreign access to Anthropic models.
- Moebius — 0.22B image inpainting matches FLUX.1-Fill-Dev's 11.9B
Moebius is a 0.22B image inpainting model from HUST and VIVO AI Lab that matches the 11.9B FLUX.1-Fill-Dev across six benchmarks while running over 15x faster, with code and weights now on GitHub.
- agent-eval — Hugging Face harness benchmarks coding agents on your own library
Hugging Face shipped agent-eval, an open harness that measures how well coding agents like Kimi-K2.6 and GLM-5.1 use a library — not just task completion, but token cost, time, and error rate across bare, clone, and skill access tiers.
- Beyond LoRA — Hugging Face benchmark shows OFT and BEFT can beat the default
Hugging Face benchmarked five PEFT methods on identical tasks and found OFT beats LoRA on image fine-tuning while BEFT and Lily trade memory for accuracy on math. PEFT also gained an adapter-to-LoRA converter for vLLM compatibility.
- Cursor 3.8 — /automate skill plus new GitHub and Slack automation triggers
Cursor 3.8 launches Cursor Automations: describe a task in plain English with /automate and it runs on its own when a GitHub event or Slack emoji reaction fires. The release also adds five GitHub triggers, and cloud agents get computer use by default.
- Noam Shazeer to OpenAI — Gemini co-lead becomes Lead for Architecture Research
Noam Shazeer, Google VP and Gemini co-lead, is leaving Google to join OpenAI as Lead for Architecture Research, two years after Google paid $2.7B to bring him back from Character.AI.
- In the Weights — ex-OpenAI tool scores whether AI models remember your name
Joey Flynn and Thomas Dimson, both ex-OpenAI, launched a free site that types a name into multiple LLMs in parallel and returns a 0–996 'strength score' for how well models recall the person from training data alone.
- DeepMind AI Control Roadmap — defense-in-depth for misaligned AI agents
Google DeepMind publishes the AI Control Roadmap: a defense-in-depth framework that treats internal AI agents as insider threats, with trusted supervisor models monitoring their actions in real time.
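
Two of the items above rest on simple arithmetic that is easy to check: the OpenAI Codex entry (37 TB written in 21 days, projected to about 640 TB a year) and the David Rosenthal entry (a $200 plan at a 70x multiple becoming a $14,000 bill). The following is a minimal sketch of those two calculations, assuming a 365-day year and a constant write rate. The function names are hypothetical and chosen only for illustration; they do not come from either source.

```python
def annualized_tb(tb_written: float, days: float) -> float:
    """Extrapolate a measured write volume to a 365-day year (assumes a constant rate)."""
    return tb_written / days * 365


def unsubsidized_price(plan_price: float, multiple: float) -> float:
    """Scale a plan price by the claimed cost multiple."""
    return plan_price * multiple


# Figures quoted in the items above.
codex_rate = annualized_tb(37, 21)          # 37 TB measured over 21 days
chatgpt_bill = unsubsidized_price(200, 70)  # $200 plan at the 70x upper bound

print(f"Codex log writes: about {codex_rate:.0f} TB/year")  # about 643 TB/year
print(f"$200 plan at 70x: ${chatgpt_bill:,.0f}")            # $14,000
```

The extrapolated write volume comes out near 643 TB, which is consistent with the rounded 640 TB figure in the headline.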