Pre-Deployment Red-Team vs Post-Deployment Audit
pre-vs-post-deployment-eval · AI-governance meta-debate
Should AI capability + safety evaluations happen primarily before deployment (red-team gating release), or primarily after (post-deployment audit + incident response)?
Why it matters
EU AI Act Art. 55 emphasises pre-deployment systemic-risk evaluation. US EO 14110 §4.2(a)(i) required both. Industry frontier-safety frameworks (Anthropic's Responsible Scaling Policy, OpenAI's Preparedness Framework, Google DeepMind's Frontier Safety Framework) all gate deployment on pre-deployment evals. But post-deployment monitoring catches incidents that pre-deployment evals miss (jailbreaks discovered post-release, capability-elicitation surprises, downstream-application harms). The dispute is operational: where does the evaluation budget go?
Positions (3)
Catalogued in editorial order; not ranked. Each position carries its own primary sources.
Position 1
Pre-deployment red-team is the load-bearing safeguard
Catastrophic-risk capabilities (CBRN uplift, autonomous replication) must be identified before deployment because post-deployment correction is too slow: once such a capability is released, the harm may occur before an audit or incident response can act. Pre-deployment red-team + capability-evaluation gating is the only mechanism with the timing property that catastrophic risks require.
Primary sources
Position 2
Post-deployment audit + incident-response is sufficient + adaptive
Pre-deployment red-team has fundamental limitations: limited time + budget, an evaluator skill ceiling, distribution shift from eval to deployment, and an inability to anticipate downstream applications. A robust post-deployment audit + incident-response regime (analogous to pharmacovigilance) catches more in practice and adapts to the failure modes observed in actual deployment.
Proponents
Position 3
Both, with pre-deployment for catastrophic + post-deployment for non-catastrophic
Tiered approach: catastrophic-risk capability evaluations must happen pre-deployment (timing-property argument applies); non-catastrophic harms (bias, fairness, application-specific failures) are better addressed via post-deployment audit + incident response.
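The tiered routing described above can be sketched as a small lookup. This is a minimal illustration only: the category names, the `Stage` enum, and the `route` function are hypothetical, and the tier assignments simply restate the examples in the text. It does not reflect any regulator's or lab's actual thresholds, which remain disputed.

```python
from enum import Enum


class Stage(Enum):
    PRE_DEPLOYMENT = "pre-deployment red-team gate"
    POST_DEPLOYMENT = "post-deployment audit + incident response"


# Hypothetical tier assignments, taken from the examples named in the text.
CATASTROPHIC = {"cbrn_uplift", "autonomous_replication"}
NON_CATASTROPHIC = {"bias", "fairness", "application_specific_failure"}


def route(risk_category: str) -> Stage:
    """Return the evaluation stage the tiered position assigns to a risk."""
    if risk_category in CATASTROPHIC:
        # Timing-property argument: must be caught before release.
        return Stage.PRE_DEPLOYMENT
    if risk_category in NON_CATASTROPHIC:
        return Stage.POST_DEPLOYMENT
    # Unlisted categories are exactly where the tier-threshold dispute lies,
    # so this sketch refuses to guess rather than pick a default.
    raise ValueError(f"no agreed tier for risk category: {risk_category!r}")


if __name__ == "__main__":
    for category in ("cbrn_uplift", "bias"):
        print(category, "->", route(category).value)
```

Raising on unlisted categories is a deliberate choice in this sketch: where a given risk falls is the contested question, so a silent default would hide the disagreement.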
Proponents
- G7 Hiroshima Process actors
- Frontier Foundation Model Eval Consortium
Primary sources
Instruments shaped by this debate
Topics this debate touches
Editorial note
The tiered position has been emerging as the de facto point of policy convergence since the AI Seoul Summit (May 2024), though tier thresholds remain disputed.