Structured adversarial probing of an AI model's capabilities and behaviour before deployment, designed to elicit failures that ordinary evaluation would miss.
Definition and scope
Red-team evaluation originated in cybersecurity (penetration testing) and was adapted to AI. It gained prominence through the July 2023 White House voluntary commitments, which included internal and external red-teaming, and the Generative Red Team challenge at DEF CON 31 (August 2023). EU AI Act Art. 55(1)(a) requires providers of general-purpose AI models with systemic risk to conduct and document adversarial testing. US EO 14110 §4.2(a)(i) required developers of dual-use foundation models above the compute threshold to report red-team results; the order was revoked in January 2025 by EO 14148, and EO 14179 then directed a review of actions taken under it. G7 Hiroshima Code §1 calls for 'adversarial testing prior to and throughout deployment.' Anthropic, OpenAI, and Google DeepMind each maintain internal red-team programs with public methodology disclosures.

Governance disputes centre on: (1) WHO must red-team (provider, independent third party, government); (2) WHAT capabilities are in scope (CBRN uplift, autonomous replication, election manipulation, etc.); (3) WHO sees the results (provider only, regulator under confidentiality, public); (4) WHAT triggers re-evaluation after deployment.
Used by these instruments
- EU AI Act · EU
- Executive Order 14110 on Safe, Secure, Trustworthy AI · US
- G7 Hiroshima AI Process Code of Conduct · G7
- UK Pro-Innovation Approach to AI Regulation (White Paper) · UK
- Anthropic Responsible Scaling Policy (RSP) v2 · US
- OpenAI Preparedness Framework · US
- Google DeepMind Frontier Safety Framework · US
- Meta Frontier AI Framework · US
- UK-US AI Safety Institute Memorandum of Understanding · global
- White House Voluntary AI Commitments · US
- Singapore Model AI Governance Framework for Generative AI · SG
- Japan METI AI Guidelines for Business · JP
Related concepts
- Frontier-Tier AI: A categorical classification of AI models above certain capability or compute thresholds, indicating…
- AI Safety Level 3 (ASL-3): A capability-based risk tier in Anthropic's Responsible Scaling Policy denoting models with the pote…
- Systemic Risk (AI): A regulatory designation indicating that a general-purpose AI model poses risks of significant scale…
- Designated Systemic-Risk Model: A general-purpose AI model that has been formally designated by the EU AI Office under Article 51(1)…
Editorial note
Distinguish from 'evaluation' (general benchmark-style measurement) and 'audit' (post-hoc third-party review). Red-teaming, as defined here, is specifically adversarial in intent and carried out before deployment.