What is the DeepMind AI Control Roadmap?

The AI Control Roadmap is a Google DeepMind framework for managing AI agents the lab cannot yet fully trust. DeepMind treats capable internal agents as potential insider threats and uses trusted supervisor AI systems to monitor their reasoning, detect harmful actions, and block them in real time before they execute.

What does the framework actually cover?

The AI Control Roadmap covers three areas: threat modeling using MITRE ATT&CK adapted for AI, mitigation deployment via detection and prevention (supervisor monitoring and synchronous blocking on high-risk actions), and performance metrics for coverage, recall, and response time.
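As a rough illustration of the three metrics named above, the sketch below computes coverage, recall, and response time over a handful of labeled agent runs. The `Trajectory` record, its field names, and the sample data are assumptions made for this example; the source does not say how DeepMind defines or computes these metrics.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Trajectory:
    """One agent run, labeled after the fact. All fields are hypothetical."""
    monitored: bool                           # did a supervisor observe this run?
    harmful: bool                             # ground-truth label from later review
    flagged: bool                             # did the supervisor flag or block it?
    response_seconds: Optional[float] = None  # time from action to flag, if flagged


def coverage(runs: list[Trajectory]) -> float:
    """Share of all runs that a supervisor observed."""
    return sum(r.monitored for r in runs) / len(runs)


def recall(runs: list[Trajectory]) -> float:
    """Share of harmful runs that the supervisor flagged."""
    harmful = [r for r in runs if r.harmful]
    return sum(r.flagged for r in harmful) / len(harmful)


def median_response_time(runs: list[Trajectory]) -> float:
    """Median time-to-flag over the runs that were flagged."""
    times = [r.response_seconds for r in runs
             if r.flagged and r.response_seconds is not None]
    return median(times)


# Made-up sample data, for illustration only.
runs = [
    Trajectory(monitored=True, harmful=True, flagged=True, response_seconds=0.5),
    Trajectory(monitored=True, harmful=True, flagged=False),
    Trajectory(monitored=True, harmful=False, flagged=True, response_seconds=1.5),
    Trajectory(monitored=False, harmful=False, flagged=False),
]

print(coverage(runs), recall(runs), median_response_time(runs))
```

The functions assume non-empty inputs; a real pipeline would need to handle the case of no harmful or no flagged runs.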

What did DeepMind learn from one million coding agent trajectories?

From an analysis of around one million coding agent trajectories, DeepMind reports that most flagged behaviors stem from the agent misinterpreting its task or being overeager, not from adversarial intent. That evidence informs how the AI Control Roadmap maps control protocols to capability milestones along two axes: detection evasion and harm potential.
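One way to picture a mapping from capability milestones to protocols is a small rule over the two axes. This is a sketch under stated assumptions: the three-level scale, the protocol names, and the thresholds are all invented for illustration and are not taken from the roadmap.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical coarse scale for a capability axis."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2


def choose_protocol(detection_evasion: Level, harm_potential: Level) -> str:
    """Pick a control protocol from the two axes (illustrative rule only)."""
    worst = max(detection_evasion, harm_potential)
    if worst == Level.HIGH:
        return "synchronous_block"   # stop the action before it runs
    if worst == Level.MEDIUM:
        return "async_review"        # let it run, queue it for human review
    return "log_only"                # hypothetical baseline: record and move on


print(choose_protocol(Level.LOW, Level.MEDIUM))
```

Taking the worse of the two axes is a deliberately conservative choice for the sketch; an actual mapping could weight the axes differently.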

What is the Three Layers of Agent Security paper?

Three Layers of Agent Security is a companion DeepMind PDF written for policymakers. It frames agent security at three levels: individual agents, multi-agent systems, and ecosystem resilience. The paper explains the same defense-in-depth ideas as the AI Control Roadmap without the internal-engineering detail.

Why does this matter for the wider AI industry?

The AI Control Roadmap reframes agent safety from an alignment-only problem to a system-level security problem, similar to zero-trust cybersecurity. If trusted supervisor models can block harmful actions even when the underlying agent is imperfectly aligned, other labs and enterprises deploying agents can adopt a similar layered playbook.

Google DeepMind · 2026-06-18 · major

DeepMind AI Control Roadmap — defense-in-depth for misaligned AI agents

Google DeepMind publishes the AI Control Roadmap: a defense-in-depth framework that treats internal AI agents as insider threats, with trusted supervisor models monitoring their actions in real time.

[Image: hero from the Google DeepMind blog post "Securing the future of AI agents"]

Google DeepMind treats internal AI agents as insider threats and uses supervisor AI to block harmful actions in real time.

Key specs

Coding trajectories analyzed	~1M

Quick facts

Lab	Google DeepMind
Authors	Rohin Shah, Four Flynn, Mary Phuong, Erik Jenner, Laurent Simon, Lewis Ho, Sebastian Farquhar, Scott Coull
Type	Research blog + 2 PDFs
Scope	Internal AI agents at Google
Threat model	MITRE ATT&CK adapted for AI
Evidence base	~1M coding agent trajectories

What is it?

The AI Control Roadmap is a defense-in-depth security framework Google DeepMind built for its own internal AI agents. The roadmap assumes that capable agents may be imperfectly aligned and that alignment alone is not enough, so it adds a system-level security layer around them. A companion policy paper, Three Layers of Agent Security, generalizes the same ideas for outside policymakers.

How does it work?

The AI Control Roadmap has three parts. The first is threat modeling, adapted from MITRE ATT&CK and mapped to AI capability milestones across detection evasion and harm potential. The second is mitigation deployment: a trusted supervisor AI watches agent reasoning and either flags actions asynchronously for human review or blocks the riskiest ones synchronously before they run. The third is measurement: coverage, recall, and response-time metrics, calibrated against an analysis of about one million internal coding agent trajectories.
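The split between asynchronous flagging and synchronous blocking can be sketched as a small gate in front of each agent action. Everything here is hypothetical: the `Action` and `Supervisor` types, the `high_risk` flag, the score threshold, and the toy keyword scorer stand in for whatever DeepMind's supervisor models actually do, which the source does not detail.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    description: str
    high_risk: bool  # hypothetical label, e.g. assigned during threat modeling


@dataclass
class Supervisor:
    # Hypothetical scorer: returns a suspicion score in [0, 1] for an action.
    score: Callable[[Action], float]
    threshold: float = 0.5
    review_queue: list[Action] = field(default_factory=list)

    def allow(self, action: Action) -> bool:
        """Return True if the action may run, False if it is blocked."""
        if self.score(action) < self.threshold:
            return True                    # nothing suspicious, let it run
        if action.high_risk:
            return False                   # synchronous block before execution
        self.review_queue.append(action)   # asynchronous flag for human review
        return True


# Toy scorer for the example only; a real supervisor would be a trusted model.
sup = Supervisor(score=lambda a: 0.9 if "delete" in a.description else 0.1)

print(sup.allow(Action("run unit tests", high_risk=False)))
print(sup.allow(Action("delete build cache", high_risk=False)))
print(sup.allow(Action("delete production database", high_risk=True)))
print(len(sup.review_queue))
```

In this sketch only suspicious, high-risk actions pay the latency cost of a blocking check; lower-risk ones proceed and are reviewed after the fact.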

Why does it matter?

The AI Control Roadmap is one of the first concrete playbooks for living with imperfectly aligned, capable AI agents inside a frontier lab. By framing the problem as zero-trust insider security rather than only alignment, DeepMind makes it easier for other labs and enterprises to copy the layered approach as they deploy real agents into production codebases and tools.

Who is it for?

AI safety teams, security engineers, and policymakers working on agent deployment.

Try it

Read the AI Control Roadmap PDF linked from the post.