The Epistemic Burden-of-Proof Problem

Policy Window Editorial Board


Open problem 3


Under deep uncertainty, who bears the burden of proof: developers to show sufficient safety, or regulators to show sufficient danger?

Why is this problem foundational?

Why it’s foundational

AI governance is full of implicit evidentiary standards. “Safe enough,” “dangerous capability,” “systemic risk,” “effective mitigation,” and “intolerable risk” are not self-interpreting: each presupposes an unstated judgement about how much evidence, and of what kind, is enough.

Why it’s difficult

The most serious risks may be rare, anticipatory, adversarial, or unprecedented. Waiting for confirmed harm may be irresponsible; acting before harm may be politically illegitimate or economically costly.

Hidden assumptions

The dominant discourse often assumes that better evaluations will solve the proof problem. That is doubtful. Evaluations can inform judgement, but they cannot by themselves determine acceptable risk.

What approaches have been proposed?

Seven competing positions are catalogued.

Competing positions

Precautionary principle
Pro-innovation/proactionary principle
Cost-benefit analysis
National-security risk management
Safety-case regulation
Liability-after-harm
Moratorium until affirmative safety

What could make progress

Decision theory for irreversible AI risks; calibrated forecasting tournaments; historical comparison with aviation, nuclear, pharmaceuticals, cybersecurity, and finance; structured “safety case” standards; empirical work on false positives and false negatives in frontier evaluations.

What it would change

Resolving it would determine whether frontier AI is governed through pre-market approval, post-market monitoring, strict liability, mandatory incident reporting, safety cases, or permissive deployment with ex post remedies.

What sub-questions need research?

The research agenda breaks this into five sub-questions.

Sub-agenda

What evidentiary standard should apply to catastrophic AI risk?
How should regulators treat uncertainty about low-probability, high-severity harms?
What is the right burden of proof for open-weight frontier releases?
Can safety cases be made adversarially robust?
How should governance distinguish absence of evidence from evidence of absence?

Priority (editor scoring)

Every serious policy choice depends on what counts as enough evidence.

Importance: 5/5
Neglected: 4/5
Difficulty: 5/5
Actionable: 5/5
Robust: 5/5
National and international: 5/5

Where does the catalog bear on this problem?

No instrument in the current catalog resolves this puzzle, and that is the point: it is a foundational question the existing rules leave open. The coverage catalog records what the instruments do and do not say.

Editorial content — a human-authored agenda question, rendered verbatim. No part of this analysis is AI-generated (see the charter).