Google DeepMind · 2026-06-18 · major
DeepMind AI Control Roadmap — defense-in-depth for misaligned AI agents
Google DeepMind publishes the AI Control Roadmap: a defense-in-depth framework that treats internal AI agents as insider threats, with trusted supervisor models monitoring their actions in real time.
Google DeepMind treats internal AI agents as insider threats and uses supervisor AI to block harmful actions in real time.
Key specs
| Coding trajectories analyzed | ~1M |
|---|---|
Quick facts
| Lab | Google DeepMind |
|---|---|
| Authors | Rohin Shah, Four Flynn, Mary Phuong, Erik Jenner, Laurent Simon, Lewis Ho, Sebastian Farquhar, Scott Coull |
| Type | Research blog + 2 PDFs |
| Scope | Internal AI agents at Google |
| Threat model | MITRE ATT&CK adapted for AI |
| Evidence base | ~1M coding agent trajectories |
What is it?
The AI Control Roadmap is a defense-in-depth security framework Google DeepMind built for its own internal AI agents. The roadmap assumes capable agents may be imperfectly aligned and that alignment alone is not enough, so it adds a system-level security layer around them. A companion policy paper, Three Layers of Agent Security, generalizes the same ideas for outside policymakers.
How does it work?
The AI Control Roadmap has three parts. The first is threat modeling adapted from MITRE ATT&CK, with protocols mapped to AI capability milestones across detection evasion and harm potential. The second is mitigation deployment: a trusted supervisor AI watches agent reasoning and either flags actions asynchronously for human review or blocks the riskiest ones synchronously before they run. The third is measurement: coverage, recall, and response-time metrics, calibrated against an analysis of about one million internal coding agent trajectories.
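The split between asynchronous flagging and synchronous blocking can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `risk_score` field, the two thresholds, and the `route` and `run` helpers are illustrative names, not anything DeepMind has published. It also assumes that an asynchronously flagged action still executes while it waits for human review.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    FLAG_ASYNC = "flag_async"   # queued for human review; the action still runs
    BLOCK_SYNC = "block_sync"   # stopped before it executes


@dataclass
class AgentAction:
    description: str
    risk_score: float  # hypothetical supervisor output in [0, 1]


# Hypothetical thresholds; the roadmap summary gives no numeric values.
FLAG_THRESHOLD = 0.3
BLOCK_THRESHOLD = 0.8


def route(action: AgentAction) -> Decision:
    """Map a supervisor risk score to one of three decisions."""
    if action.risk_score >= BLOCK_THRESHOLD:
        return Decision.BLOCK_SYNC
    if action.risk_score >= FLAG_THRESHOLD:
        return Decision.FLAG_ASYNC
    return Decision.ALLOW


def run(actions):
    """Return (executed actions, actions queued for human review)."""
    executed = []
    review_queue = []
    for action in actions:
        decision = route(action)
        if decision is Decision.BLOCK_SYNC:
            continue  # synchronous block: the action never runs
        if decision is Decision.FLAG_ASYNC:
            review_queue.append(action)  # reviewed later, off the hot path
        executed.append(action)
    return executed, review_queue
```

In this sketch only the highest-risk actions pay the latency cost of a synchronous check, while mid-risk actions proceed and are reviewed afterwards.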
Why does it matter?
The AI Control Roadmap is one of the first concrete playbooks for living with imperfectly aligned, capable AI agents inside a frontier lab. By framing the problem as zero-trust insider security rather than only alignment, DeepMind makes it easier for other labs and enterprises to copy the layered approach as they deploy real agents into production codebases and tools.
Who is it for?
AI safety teams, security engineers, and policymakers working on agent deployment.
Frequently asked questions
- What is the DeepMind AI Control Roadmap?
- The AI Control Roadmap is a Google DeepMind framework for managing AI agents the lab cannot yet fully trust. DeepMind treats capable internal agents as potential insider threats and uses trusted supervisor AI systems to monitor their reasoning, detect harmful actions, and block them in real time before they execute.
- What does the framework actually cover?
- The AI Control Roadmap covers three areas: threat modeling using MITRE ATT&CK adapted for AI; mitigation deployment through detection and prevention (supervisor monitoring, plus synchronous blocking of high-risk actions); and performance metrics for coverage, recall, and response time.
- What did DeepMind learn from one million coding agent trajectories?
- From around one million coding agent trajectories, DeepMind reports that most flagged behaviors come from agent misinterpretation or overeagerness rather than adversarial intent. That evidence informs how the AI Control Roadmap maps protocols to capability milestones across detection evasion and harm potential.
- What is the Three Layers of Agent Security paper?
- Three Layers of Agent Security is a companion DeepMind PDF written for policymakers. It frames agent security at three levels: individual agents, multi-agent systems, and ecosystem resilience. The paper explains the same defense-in-depth ideas as the AI Control Roadmap without the internal-engineering detail.
- Why does this matter for the wider AI industry?
- The AI Control Roadmap shifts agent safety from alignment-only to system-level security, similar to zero-trust cybersecurity. If trusted supervisor models can block harmful actions even when the underlying agent is imperfectly aligned, other labs and enterprises deploying agents can adopt a similar layered playbook.
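The coverage, recall, and response-time metrics named in the answers above can be made concrete with a small Python sketch. The definitions here are assumptions for illustration, not DeepMind's published formulas: coverage is taken as the share of trajectories the supervisor observed, recall as the share of harmful trajectories it flagged, and response time as the mean delay on flagged trajectories. The `Trajectory` fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trajectory:
    monitored: bool   # did the supervisor observe this trajectory?
    harmful: bool     # ground-truth label, e.g. from later human review
    flagged: bool     # did the supervisor flag or block it?
    response_seconds: Optional[float] = None  # delay until response, if any


def coverage(trajs):
    """Share of all trajectories that were monitored (assumed definition)."""
    if not trajs:
        return None
    return sum(t.monitored for t in trajs) / len(trajs)


def recall(trajs):
    """Share of harmful trajectories that were flagged (assumed definition)."""
    harmful = [t for t in trajs if t.harmful]
    if not harmful:
        return None
    return sum(t.flagged for t in harmful) / len(harmful)


def mean_response_time(trajs):
    """Mean response delay over flagged trajectories that recorded one."""
    times = [
        t.response_seconds
        for t in trajs
        if t.flagged and t.response_seconds is not None
    ]
    return sum(times) / len(times) if times else None
```

Each function returns `None` rather than raising when there is nothing to measure, so an empty batch is not mistaken for a perfect or failing score.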
Try it
Read the AI Control Roadmap PDF linked from the post.