An Approach to Technical AGI Safety & Security

Explore Google DeepMind's comprehensive framework for addressing safety and security concerns related to Artificial General Intelligence (AGI).

Comprehensive Framework

Key Areas of the AGI Safety & Security Approach

This interactive guide presents a technical approach to ensuring AGI systems are developed safely and securely, focusing on four risk areas and corresponding mitigation strategies.

Core Assumptions

Review the five fundamental assumptions about AGI development that underpin the safety approach.

Risk Areas

Explore the four key risk areas: misuse, misalignment, mistakes, and structural risks, with misuse and misalignment treated as the most critical for preventing severe harm.

Misuse Mitigation

Learn about approaches to prevent malicious actors from accessing and exploiting dangerous AI capabilities.

Misalignment Mitigation

Understand strategies to ensure AI systems remain aligned with human values and intentions, even as capabilities advance.

Executive Summary

Artificial General Intelligence (AGI) refers to AI systems that match or exceed human-level performance across a wide range of cognitive tasks. AGI promises transformative benefits, such as raising living standards worldwide and accelerating scientific discovery, but it also presents significant risks. The Google DeepMind report presents an "anytime" framework, meaning one whose mitigations can be put in place whenever they become necessary, to address risks of severe harm from AGI. It focuses primarily on misuse and misalignment while also acknowledging mistakes and structural risks.

For misuse, the strategy is to proactively identify dangerous capabilities and to prevent threat actors from accessing them through robust security, access restrictions, monitoring, and model-level safety mitigations. For misalignment, two lines of defense are outlined: first, training aligned models through amplified oversight and robust training; second, applying system-level mitigations such as monitoring and access control so that harm is limited even if a model is misaligned.
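
As a rough illustration of this layered approach, the sketch below combines a dangerous-capability evaluation gate before deployment with access restrictions and independent output monitoring at serving time. All names here (EvalResult, capability_gate, serve_request, model_fn, monitor_fn) and the threshold logic are hypothetical, a minimal sketch of the defense-in-depth idea rather than the report's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class EvalResult:
    """Hypothetical result of a dangerous-capability evaluation."""
    capability: str
    exceeds_threshold: bool

def capability_gate(results: list[EvalResult]) -> bool:
    """Misuse mitigation, first step: only allow deployment if no dangerous-
    capability evaluation crosses its pre-committed threshold."""
    return not any(r.exceeds_threshold for r in results)

def serve_request(
    user_id: str,
    prompt: str,
    authorized_users: set[str],
    model_fn: Callable[[str], str],
    monitor_fn: Callable[[str, str], bool],
) -> Optional[str]:
    """Defense in depth at serving time:
    - access restrictions keep unvetted actors away from dangerous capabilities,
    - model-level safety mitigations are assumed to live inside model_fn,
    - an independent monitor screens outputs (the second line of defense
      against misalignment) before anything is released."""
    if user_id not in authorized_users:
        return None  # access restriction: request never reaches the model
    response = model_fn(prompt)
    if not monitor_fn(prompt, response):
        return None  # withhold the output and escalate to human review
    return response
```

In practice each layer would be far richer (weight security, red-teaming, logging, audits); the point of the sketch is only that no single layer needs to be perfect for the system as a whole to block severe harm.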

The report's approach is informed by key assumptions about AGI development: that the current paradigm continues, that capabilities may exceed human level, that timelines are uncertain, that AI may accelerate its own development, and that progress is approximately continuous. Supporting techniques such as interpretability, uncertainty estimation, and safer design patterns help build effective AGI safety cases.
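
To make one of these supporting techniques concrete, here is a minimal sketch of uncertainty estimation via ensemble disagreement, used to decide when a system should defer to human oversight. The variance heuristic, the 0.05 threshold, and the function names are illustrative assumptions, not the report's specific method.

```python
import statistics
from typing import Callable, Sequence

def ensemble_uncertainty(predictors: Sequence[Callable[[str], float]], x: str) -> float:
    """Estimate uncertainty as the variance of scores produced by
    independently trained predictors on the same input."""
    scores = [predict(x) for predict in predictors]
    return statistics.pvariance(scores)

def act_or_defer(predictors: Sequence[Callable[[str], float]], x: str,
                 threshold: float = 0.05) -> str:
    """Defer to human oversight when the predictors disagree too much;
    the threshold value is an arbitrary placeholder."""
    return "defer_to_human" if ensemble_uncertainty(predictors, x) > threshold else "act"
```

Signals like this can feed into a safety case: rather than claiming the model is always right, the argument is that when it might be wrong, the system notices and hands control back to people.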