occurs when a human deliberately uses the AI system to cause harm, against the developer's wishes. The approach described focuses on preventing bad actors from accessing dangerous capabilities.
Figure 2: Overview of the described approach to mitigating misuse
The approach aims to block bad actors' access to dangerous capabilities through various layers of protection.
Understanding Misuse Risk
AI could exacerbate harm from misuse by:
- Increased possibility of causing harm: AI could place the power of significant weapons expertise and a large workforce in the hands of its users, expanding the pool of individuals with the capability to cause severe harms.
- Decreased detectability: AI could assist with evading surveillance, reducing the probability that a bad actor is caught.
- Disrupting defenses: As a novel and quickly developing technology, AI models disturb the existing misuse equilibrium, requiring time for society to build appropriate controls.
- Automation at scale: AI systems may concentrate power in the hands of individuals who control it, enabling a single bad actor to cause harm at unprecedented scale.
Mitigation Approaches
Capability-Based Risk Assessment
Before implementing costly mitigations, it is first assessed whether the AI model has capabilities that could enable severe harm.
Model Deployment Mitigations
When models possess dangerous capabilities, techniques are applied to prevent misuse:
Monitoring
Monitoring systems detect when a threat actor attempts to inappropriately access dangerous capabilities, and respond to prevent them from causing harm.
Access Restrictions
Access to models with dangerous capabilities can be restricted to vetted user groups and use cases, reducing the surface area that an actor can attempt to exploit.
Securing Model Weights
Security measures aim to prevent bad actors from stealing AI model weights, which could allow them to bypass deployment mitigations.
Red-Teaming Mitigations
Red-teaming evaluates whether the described mitigations are sufficient by attempting to bypass them.
Safety Cases
The mitigations described above are critical components for addressing misuse risks, but they don't exist in isolation. To make deployment decisions, a structured way is needed to determine if these mitigations collectively provide sufficient protection against risks. provide a framework for assessing whether our mitigation strategies are comprehensive enough to justify deployment decisions. They help transform individual mitigations into coherent evidence that a system poses acceptable risk.
Safety cases for misuse typically fall into two categories: (demonstrating the model lacks dangerous capabilities) and (demonstrating mitigations are robust against sophisticated attacks).