Pre-Deployment Red-Team vs Post-Deployment Audit

Policy Window Editorial Board

Debate

Should AI capability and safety evaluations happen primarily before deployment (red-teaming that gates release) or primarily after it (post-deployment audit and incident response)?

Why does this debate matter for AI governance?

EU AI Act Art. 55 emphasises pre-deployment systemic-risk evaluation. US EO 14110 §4.2(a)(i) required both pre- and post-deployment evaluation. Industry responsible-scaling policies (RSPs) from Anthropic, OpenAI, and Google DeepMind all gate deployment on pre-deployment evals. But post-deployment monitoring catches incidents that pre-deployment evals miss: jailbreaks discovered after release, surprises in capability elicitation, and harms arising in downstream applications. The dispute is therefore operational: where should the evaluation budget go?

What are the competing positions?

This debate catalogues 3 competing positions.

Catalogued in editorial order; not ranked. Each position carries its own primary sources.

Position 1

Pre-deployment red-teaming is the load-bearing safeguard

Catastrophic-risk capabilities (CBRN uplift, autonomous replication) must be identified BEFORE deployment because post-deployment correction is too slow: by the time an incident is detected, the harm may already be irreversible. Pre-deployment red-teaming and capability-evaluation gating is the only mechanism with the timing property that catastrophic risks require.

Primary sources

EU AI Act Art. 55(1)(a) (adversarial testing) (2024)

Position 2

Post-deployment audit and incident response are sufficient and adaptive

Pre-deployment red-teaming has fundamental limitations: limited time and budget, a ceiling on evaluator skill, distribution shift from evaluation to deployment, and an inability to anticipate downstream applications. A robust post-deployment audit and incident-response regime (analogous to pharmacovigilance) catches more failures in practice and adapts to the failure modes that actually arise in deployment.

Primary sources

Raji, I., et al., 'Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing' (2020)

Position 3

Both: pre-deployment for catastrophic risks, post-deployment for non-catastrophic harms

Tiered approach: catastrophic-risk capability evaluations must happen pre-deployment (the timing-property argument applies), while non-catastrophic harms (bias, fairness, application-specific failures) are better addressed through post-deployment audit and incident response.

Proponents

G7 Hiroshima Process actors
Frontier Foundation Model Eval Consortium

Primary sources

G7 Hiroshima AI Process Code of Conduct §1 + §6 (2023)

Which instruments, topics, and concepts does this debate touch?

It connects to 5 instruments, 4 topics, and 4 concepts in the catalog.

What is the editorial assessment?

The tiered position is becoming the de facto point of policy convergence since the AI Seoul Summit (May 2024), though tier thresholds remain disputed.