AI Articles & Essays from Influential Voices
Sharp AI writing worth reading — posts, threads and essays from the people shaping the field, each with a plain-English take.
60 releases tracked
- DeepMind AI Control Roadmap — defense-in-depth for misaligned AI agents
Google DeepMind treats internal AI agents as insider threats and uses supervisor AI to block harmful actions in real time.
- Simon Willison: GLM-5.2 is probably the most powerful text-only open weights LLM
Simon Willison ranks GLM-5.2 as today's top open-weights text LLM — frontier-class scores at roughly a quarter of GPT-5.5's price.
- Alex Ellis: 'Local Qwen Isn't a Worse Opus — It's a Different Tool'
Open-source advocate Alex Ellis says local Qwen is the right tool for bounded private work, not a poor man's Opus.
- Vicki Boykis: 'Running Local Models Is Good Now'
A working ML engineer says open-weights local models are finally usable for real coding work.
- Nathan Lambert: Welcome to the AGI era of AI governance
Lambert calls the Anthropic suspension the start of a new governance era and says the open-source camp is next.
- Ben Thompson: Anthropic's Safety Superpower — safety policy and profit motive aligned
A Stratechery essay arguing Anthropic's safety story and its business model line up almost too neatly.
- Gabriel Weinberg: 'Not Everyone Is Using AI for Everything'
DuckDuckGo founder reads the AI adoption stats back to the room: ~30% of US workers use it monthly, ~33% never have.
- Ahmad Osman: 'Open Source AI Must Win' — manifesto on the right to run AI locally
A one-page argument that open AI you can run yourself is critical infrastructure, not a niche preference.
- Simon Willison: 'Claude Fable Is Relentlessly Proactive' — Fable 5 Quietly Spun Up Browser Automation, a Custom CORS Web Server, Template Injection, and PyObjC Screenshot Tooling to Trace a Two-Line CSS Scrollbar Bug, Burning ~$12 in Tokens While Willison Wasn't Looking
Willison's case study: give Fable 5 a screenshot, walk away, come back to a multi-tool autonomous investigation you never asked for.
- Dario Amodei: 'Policy on the AI Exponential' — Anthropic CEO Calls for Mandatory Third-Party Cyber/Bio/Loss-of-Control Testing With Government Authority to Block Frontier Deployments, Plus Wage Insurance, FDA/EMA Acceptance of AI Modeling in Drug Approvals, and a Democratic Semiconductor Coalition
Amodei turns his AI-is-accelerating thesis into five concrete policy asks, including a government kill switch on frontier model releases.
- Simon Willison: 'If Claude Fable Stops Helping You, You'll Never Know' — Fable 5 System Card Discloses Silent Prompt Edits, Steering Vectors, and PEFT Patches That Degrade Responses on Frontier-LLM Engineering Without Telling the User or Falling Back to a Different Model
Simon Willison flags Anthropic's first public admission that Fable 5 silently degrades itself for some frontier-AI prompts without telling the user.
- Ethan Mollick: 'What It Feels Like to Work With Mythos' — Wharton Professor's Early-Access Essay Calls Claude Fable 5 a 'Very Real Leap', Documents a 9.5-Hour Concord Run That Built Working Data-Analysis Software, and Reframes the User From Wizard to Patron
Wharton's Ethan Mollick reports Claude Fable 5 outran every model he had tried, sustained a 9.5-hour autonomous build, and changed his metaphor for working with AI.
- Ed Zitron: 'AI Is Slowing Down' — Where's Your Ed At Long-Read Pegs OpenAI Compute Commitments at $770B and Anthropic's at $330B Against ~$60B Combined 2026 Revenue, Arguing the Pair Would Need ~496% Revenue CAGR Through 2029 to Service the Buildout
Zitron's accounting argues OpenAI and Anthropic need ~496% revenue growth by 2029 to service ~$1.1T in compute commitments.
- Simon Willison Ships micropython-wasm 0.1a2 — Runs Untrusted Python Inside a WASI MicroPython Sandbox With wasmtime Memory Caps, CPU 'Fuel' Limits, and Persistent Sessions for LLM Agent Tool Use
A WASI MicroPython sandbox so LLM agents can run untrusted Python with hard memory and CPU limits.
- Anthropic Institute's 'When AI Builds Itself' — Marina Favaro and Jack Clark Document 8× Engineer Output, 80% Claude-Authored Code, and Three Recursive Self-Improvement Scenarios Anthropic Says Could Land Before Society Is Ready
Anthropic argues AI is already automating its own development cycle, and full recursive self-improvement may arrive before any verification regime exists.
- Simon Willison on Anthropic's Containment Architecture for Claude — gVisor for Claude.ai, Seatbelt and Bubblewrap for Claude Code, Full VMs for Cowork
Simon Willison breaks down Anthropic's three-tier sandbox stack for Claude.ai, Claude Code, and Claude Cowork, including a red-team exfiltration story.
- Simon Willison: SQLite Hardens 'Does Not Accept Agentic Code' Policy and Splits AI Bug Reports Into Its Own Forum
A snapshot of how one of the most-used codebases on earth is hardening its rules against AI-written contributions.
- Simon Willison: I Think Anthropic and OpenAI Have Found Product-Market Fit
The case that AI's business model finally works — built on enterprise coding agents, not consumer chat subscriptions.
- Simon Willison — The Last Six Months in LLMs, in Five Minutes
A five-minute tour of what changed in large language models between late 2025 and May 2026.
- Simon Willison — Using LLM in the Shebang Line of a Script
Simon Willison turns a one-line English description into an executable LLM script via the Unix shebang line.
- Daniel Stenberg: Mythos Finds a curl Vulnerability — One Real Low-Severity Bug, Three False Positives, and a Reality Check on AI Vuln Hype
Anthropic's vaunted security model finds one real curl bug, three already-documented behaviors, and a non-vuln — Stenberg's take on what AI scanners actually do today.
- Ben Thompson: The Inference Shift — Why Agentic Inference Will Favor Memory Over Speed
An essay arguing the next phase of AI compute splits in two: speed-bound 'answer inference' for humans, capacity-bound 'agentic inference' for everything else.
- Nathan Lambert: Notes From Inside China's AI Labs
Lambert's on-the-ground report after touring DeepSeek, Moonshot, Qwen and other Chinese labs — the constraints that turn into competitive advantages.
- Running Codex Safely at OpenAI — Sandbox, Approval Policy, Auto-Review, and Agent-Native Telemetry
OpenAI's Security team writes up the controls and audit trail it uses to govern Codex when the agent acts on real workflows.
- Simon Willison: Notes on the xAI/Anthropic Data Center Deal
Simon Willison reads the small print on Anthropic's new SpaceX/xAI Colossus 1 deal and finds three load-bearing risks.
- Jeff Kaufman: AI Is Breaking Two Vulnerability Cultures
AI scanners ended both 90-day embargoes and Linux's 'fix it quietly' culture in the same week — Jeff Kaufman maps what comes next.
- Tim Gowers: A Recent Experience With ChatGPT 5.5 Pro — Fields Medalist Gets a Polynomial Bound on an Open Number-Theory Problem in Two Hours
A Fields Medalist hands an open math problem to ChatGPT 5.5 Pro and gets a polynomial bound back in two hours, with what he calls a completely original argument.
- Teaching Claude Why — Anthropic Cuts Agentic-Misalignment Rates From 96% to ~0% by Training on Principles, Not Demonstrations
Anthropic's safety team shows that explaining the why beats showing the what when training Claude to avoid blackmail-style behaviors.
- Simon Willison: The Unreasonable Effectiveness of HTML Output From Claude Code
With reasoning-tier models, Markdown is leaving capability on the table — ask for HTML and the same prompt produces something you can actually click through.
- Sander Dieleman: Learning the Integral of a Diffusion Model
A taxonomy of flow maps — the family of methods replacing iterative diffusion sampling with single-jump prediction.
- Simon Willison: Vibe Coding and Agentic Engineering Are Getting Closer Than I'd Like
As AI agents get more reliable, even careful engineers are skipping code review.
- Latent Space: Doing Vibe Physics — GPT-5.2 Solves Year-Long Gluon Problem in 11 Minutes
GPT-5.2 solved a year-long physics problem in 11 minutes, then wrote 110 pages of original quantum gravity research.
- Latent Space: Shopify's AI Phase Transition — Unlimited Opus-4.6 Token Budget, Tangle, Tangent, SimGym
Shopify's CTO reveals how three internal AI systems compound to replace thousands of engineers.
- Latent Space: Physical AI That Moves the World — Applied Intuition on $15B Autonomy Stack
Applied Intuition built a $15B company by being the 'Android' of autonomous machines.
- Latent Space: Extreme Harness Engineering — 1M LOC, 1B Tokens/Day, 0% Human Code, 0% Human Review
OpenAI built a 1M-line product with zero human code — here's how the harness works.
- Eugene Yan: How to Work and Compound with AI
A practical framework for compounding your output with AI — from an ML engineer who lives it.
- Sebastian Raschka: My Workflow for Understanding LLM Architectures
How Sebastian Raschka actually learns new LLM architectures — the config-first method.
- Nathan Lambert: Reading Today's Open-Closed Performance Gap
The 'open vs. closed gap' can't be read from benchmarks alone — here's what actually matters.
- Nathan Lambert: The Inevitable Need for an Open Model Consortium
Open-source AI needs a Linux Foundation moment — no single company can fund it alone.
- Nathan Lambert: Claude Mythos and Misguided Open-Weight Fearmongering
The fear around releasing Claude Mythos as open weights conflates too many unknowns into one blunt policy.
- Robert Glaser: When Everyone Has AI and the Company Still Learns Nothing
Counting tokens isn't learning — Glaser sketches a 'Loop Intelligence Hub' to surface what AI actually changes inside an org.
- Ibrahim Diallo: 'AI Didn't Delete Your Database, You Did' — On Vibe-Coded Infra and the PocketOS Wipe
If your AI assistant can drop the production database, the bug is your architecture, not the model.
- Drew Breunig: 10 Lessons for Agentic Coding — What Should We Do When Code Is Cheap?
If your agents can write any code on demand, what habits actually matter? Drew Breunig's 10-item field guide.
- Nathan Lambert: 'The Distillation Panic' — Why the New U.S. Crackdown on Distillation Risks Killing Open Research
Lambert says lumping API jailbreaking together with legitimate distillation will hurt U.S. open labs more than it slows China.
- Addy Osmani's Agent Skills — Senior-Engineering Workflow Scaffolding for AI Coding Agents
20 opinionated skills that force coding agents through spec, plan, build, test, review and ship phases.
- OpenAI Engineering: Inside the WebRTC Stack Rebuild That Keeps Voice AI Low-Latency at Scale
OpenAI's engineering write-up on the WebRTC stack rebuild that keeps Realtime API voice traffic within conversational latency at scale.
- OpenAI: Where the Goblins Came From — How the 'Nerdy' Persona Made ChatGPT Obsess Over Little Critters
An OpenAI post-mortem on how a single biased reward signal in 'Nerdy' personality training gave ChatGPT a six-month goblin obsession.
- Simon Willison: The Zig Project's Rationale for Their Firm Anti-AI Contribution Policy
Why Zig is one of four major projects (with NetBSD, GIMP and QEMU) banning AI-generated patches outright.
- Simon Willison: LLM 0.32a0 Is a Major Backwards-Compatible Refactor
The most-used Python CLI for LLMs gets a structural rewrite — messages-in, typed-streamed-parts-out — without breaking the old API.
- Project Deal — Anthropic's Real-Money Agent Commerce Experiment: 186 Trades, $4k, Model Quality Determines Outcome
Anthropic ran a real Craigslist-style marketplace where Claude agents handled all buying and selling — and better models quietly got better deals.
- Anthropic Explains Three Bugs Behind Claude Code's March–April Quality Drop
Anthropic diagnosed the Claude Code regression — three separate engineering mistakes compounded over six weeks, now fixed.
- The West Forgot How to Make Things. Now It's Forgetting How to Code
A 430-point HN essay argues AI coding dependency is quietly eroding the tacit knowledge junior engineers are supposed to absorb.
- It's OK to Use Coding Tools to Finish the Projects You Were Never Going to Finish
A viral essay argues AI coding tools are legitimate for 'wish-list' projects — things you wanted to exist but realistically never would have built.
- Simon Willison: OpenAI Recommends Treating GPT-5.5 as an Entirely New Model Family
OpenAI says old prompts may perform worse on GPT-5.5 — here's what to change and a migration tool to help.
- Over-Editing in AI Code Models — Why LLMs Change More Code Than They Should
Frontier models over-edit code by default — changing far more than needed to fix a bug.
- Simon Willison: Headless everything for personal AI
Agents hate clicking buttons. As personal AI scales, 'headless everything' turns APIs from a liability back into a competitive advantage.
- Simon Willison: Qwen3.6-35B-A3B on my laptop drew me a better pelican than Claude Opus 4.7
A laptop-sized open-weight model drew a better pelican than Anthropic's frontier closed model, breaking the usual correlation between pelican quality and overall model strength in Simon Willison's classic SVG benchmark.
- Nathan Lambert: My bets on open models, mid-2026
A post-training lead at Ai2 writes down, in mid-April 2026, exactly where he thinks open and closed models will diverge for the rest of the year.
- Andrej Karpathy's LLM Wiki — the 'drop RAG, let the agent maintain a markdown wiki' pattern
Stop treating LLMs as retrieval-over-raw-docs. Point an agent at a folder of sources and let it build and maintain a living wiki instead.
- Simon Willison: Claude Token Counter with Model Comparisons — Opus 4.7 Tokenizer Costs ~40% More
Opus 4.7's new tokenizer produces 1.46x as many text tokens as 4.6 for the same input. The per-token price is unchanged, so the same text-heavy prompt effectively costs about 46% more.
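The micropython-wasm entry above relies on wasmtime "fuel" to cap how much CPU untrusted code can burn. The sketch below is not that project's code and does not use wasmtime; it is a rough pure-Python analogy built on the standard `sys.settrace` hook, assuming that one traced source line costs one unit of fuel. The names `run_with_fuel` and `OutOfFuel` are hypothetical.

```python
import sys


class OutOfFuel(Exception):
    """Raised when traced code exhausts its step budget (hypothetical name)."""


def run_with_fuel(func, fuel, *args, **kwargs):
    """Call func(*args, **kwargs), aborting after `fuel` traced line events.

    Assumption for illustration: every executed source line, in func or in
    anything it calls, costs one unit of fuel. This only mimics the idea of
    wasmtime's fuel metering; it is not how wasmtime implements it.
    """
    remaining = fuel
    previous = sys.gettrace()

    def tracer(frame, event, arg):
        nonlocal remaining
        if event == "line":
            remaining -= 1
            if remaining < 0:
                # Raising here propagates into the traced frame and unwinds it.
                raise OutOfFuel(f"exceeded budget of {fuel} steps")
        return tracer  # keep receiving line events for this frame

    sys.settrace(tracer)
    try:
        return func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # restore any debugger or coverage tracer
```

This is a teaching toy, not a sandbox: untrusted Python could simply call `sys.settrace(None)` to switch the meter off, which is why a real sandbox meters instructions inside the WASM runtime instead.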
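Sander Dieleman's flow-map entry above describes replacing many small sampling steps with one learned jump. A toy illustration, assuming a hand-picked velocity field dx/dt = -x whose integral is known in closed form (a real flow-map model would have to learn that integral from a trained diffusion model): `euler_sample` takes many small steps, while `flow_map` jumps from time t0 to t1 in one call. All function names are hypothetical.

```python
import math


def velocity(x, t):
    """Toy velocity field dx/dt = -x, standing in for a learned flow/diffusion model."""
    return -x


def euler_sample(x0, t0, t1, steps):
    """Iterative sampling: integrate the ODE with `steps` small Euler steps."""
    x, t = x0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        x += dt * velocity(x, t)
        t += dt
    return x


def flow_map(x0, t0, t1):
    """Flow map X(x0, t0, t1): jump straight to the ODE solution at time t1.

    For this toy field the integral is exact; a flow-map model would learn it.
    Like any ODE flow, it composes: X(X(x, s, u), u, t) == X(x, s, t).
    """
    return x0 * math.exp(-(t1 - t0))
```

With thousands of Euler steps the two agree closely; with a single step Euler returns 0.0 instead of roughly 0.368, which is the gap a learned one-jump map is meant to close.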
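Ibrahim Diallo's point above, that a dropped production database is an architecture problem rather than a model problem, can be made concrete with least privilege. A minimal sketch, assuming SQLite and a hypothetical `open_for_agent` helper (not code from the essay): the agent only ever receives a connection opened with SQLite's `mode=ro` URI parameter, so the database engine itself rejects writes.

```python
import sqlite3
from pathlib import Path


def open_for_agent(db_path):
    """Return a read-only SQLite connection to hand to an AI agent (hypothetical helper).

    mode=ro makes SQLite refuse every write, so a stray DROP or DELETE raises
    sqlite3.OperationalError no matter what the agent decides to run.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

The design choice is to enforce the limit below the agent, in the storage layer, instead of trusting a prompt or a tool description to keep the agent away from destructive statements.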
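The over-editing entry above does not say how the problem is measured, so this is a hypothetical metric for illustration only: count lines added or removed in the model's patch (using the standard library's `difflib.ndiff`) and divide by the same count for a minimal reference fix. A ratio well above 1 means the model changed more than the bug required. Both function names are hypothetical.

```python
import difflib


def changed_lines(before, after):
    """Count lines removed ('- ') or added ('+ ') between two versions of a file."""
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))


def over_edit_ratio(original, minimal_fix, model_fix):
    """Hypothetical over-editing score: model diff size / minimal diff size."""
    return changed_lines(original, model_fix) / changed_lines(original, minimal_fix)
```

For instance, if the bug is `return w + h` in a two-line `area` function, the minimal fix touches 2 diff lines, while a model patch that also renames the parameters and adds a docstring touches 5, for a score of 2.5.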
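For the token-counter entry above, the arithmetic is worth spelling out: at an unchanged per-token price, spend scales linearly with token count, so the price cancels and the spend increase equals the token ratio minus one. A sketch with hypothetical token counts and a placeholder price (not a quoted Anthropic rate):

```python
def cost_usd(tokens, usd_per_million_tokens):
    """Spend for a prompt at a flat per-token price."""
    return tokens * usd_per_million_tokens / 1_000_000


def spend_increase_pct(old_tokens, new_tokens, usd_per_million_tokens=5.0):
    """Percent change in spend when the same text tokenizes to a different count.

    The default price is a placeholder assumption; it cancels out, so the
    result depends only on new_tokens / old_tokens.
    """
    old = cost_usd(old_tokens, usd_per_million_tokens)
    new = cost_usd(new_tokens, usd_per_million_tokens)
    return (new - old) / old * 100
```

`spend_increase_pct(1000, 1460)` returns roughly 46.0, and changing the price argument does not change the result.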