Open problem 3
The Epistemic Burden-of-Proof Problem
- current AI
- frontier AI
- AGI
Under deep uncertainty, who bears the burden of proof: developers to show sufficient safety, or regulators to show sufficient danger?
Why it’s foundational
AI governance is full of implicit evidentiary standards. “Safe enough,” “dangerous capability,” “systemic risk,” “effective mitigation,” and “intolerable risk” are not self-interpreting.
Why it’s difficult
The most serious risks may be rare, anticipatory, adversarial, or unprecedented. Waiting for confirmed harm may be irresponsible; acting before harm may be politically illegitimate or economically costly.
Hidden assumptions
The dominant discourse often assumes that better evaluations will solve the proof problem. That is doubtful. Evaluations can inform judgement, but they cannot by themselves determine acceptable risk.
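A small Bayesian sketch shows why an evaluation result alone cannot settle the question. Everything here is an illustrative assumption rather than a figure from this agenda: the prior, the evaluation's sensitivity and specificity, the cost values, and the hypothetical function names `posterior_dangerous` and `should_deploy`.

```python
def posterior_dangerous(prior, sensitivity, specificity):
    """P(system is dangerous | evaluation found nothing), by Bayes' rule."""
    # A dangerous system slips past the evaluation with probability 1 - sensitivity.
    p_pass_given_dangerous = 1.0 - sensitivity
    # A safe system passes with probability equal to the specificity.
    p_pass_given_safe = specificity
    numerator = p_pass_given_dangerous * prior
    return numerator / (numerator + p_pass_given_safe * (1.0 - prior))


def should_deploy(p_dangerous, harm_cost, delay_cost):
    """Deploy only if expected harm is smaller than the cost of withholding."""
    return p_dangerous * harm_cost < delay_cost


# Illustrative assumptions only: 5% prior, 80% sensitivity, 95% specificity.
p = posterior_dangerous(prior=0.05, sensitivity=0.80, specificity=0.95)

# Same evidence, two different value judgements about how bad the harm would be.
deploy_if_harm_modest = should_deploy(p, harm_cost=50.0, delay_cost=1.0)
deploy_if_harm_catastrophic = should_deploy(p, harm_cost=1000.0, delay_cost=1.0)
```

Under these assumed numbers, a passed evaluation lowers the probability of danger from 5% to roughly 1.1%, yet the deployment decision flips depending on the assumed harm cost. The evaluation supplies the probability; the threshold for acceptable risk has to come from somewhere else.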
Competing positions
- Precautionary principle
- Pro-innovation/proactionary principle
- Cost-benefit analysis
- National-security risk management
- Safety-case regulation
- Liability-after-harm
- Moratorium until affirmative safety
What could make progress
Decision theory for irreversible AI risks; calibrated forecasting tournaments; historical comparison with aviation, nuclear, pharmaceuticals, cybersecurity, and finance; structured “safety case” standards; empirical work on false positives and false negatives in frontier evaluations.
What it would change
Resolving this problem would determine whether frontier AI governance relies on pre-market approval, post-market monitoring, strict liability, mandatory incident reporting, safety cases, or permissive deployment with ex post remedies.
Sub-agenda
- What evidentiary standard should apply to catastrophic AI risk?
- How should regulators treat uncertainty about low-probability, high-severity harms?
- What is the right burden of proof for open-weight frontier releases?
- Can safety cases be made adversarially robust?
- How should governance distinguish absence of evidence from evidence of absence?
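The last question above can be made concrete with a standard statistical bound: observing zero failures in a finite number of trials limits the failure rate but does not show that it is zero. The sketch below assumes independent, identically distributed, non-adversarial trials, which is a strong assumption for frontier evaluations; the function name `upper_bound_failure_rate` is hypothetical.

```python
def upper_bound_failure_rate(n_trials, confidence=0.95):
    """Largest per-trial failure rate consistent with zero failures in n_trials."""
    # With failure rate p, the chance of seeing no failures is (1 - p) ** n_trials.
    # Solve (1 - p) ** n_trials = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / n_trials)


# Zero observed failures still leaves a nonzero upper bound on the true rate.
bound_100 = upper_bound_failure_rate(100)
bound_10000 = upper_bound_failure_rate(10_000)
```

With 100 clean trials the 95% upper bound is still close to 3%, and with 10,000 clean trials it is about 0.03%. Whether either bound counts as "evidence of absence" depends on the severity of the harm, which is a governance judgement rather than a statistical one.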
Priority (editor scoring)
Every serious policy choice depends on what counts as enough evidence.
- Importance: 5/5
- Neglected: 4/5
- Difficulty: 5/5
- Actionable: 5/5
- Robust: 5/5
- Nat’l+int’l: 5/5
Where the catalog bears on this
No current catalog instrument resolves this puzzle — which is the point: it is a foundational question the existing rules leave open. Browse the coverage catalog for what the instruments do and don’t say.
Editorial content — a human-authored agenda question, rendered verbatim. No part of this analysis is AI-generated (see the charter).