Open problem 5
The Threshold and Tripwire Problem
- frontier AI
- transformative AI
- AGI
Which thresholds should trigger reporting, licensing, external evaluation, compute controls, restricted deployment, model-weight security requirements, or a pause?
Why it’s foundational
Without thresholds, governance is discretionary and reactive. With bad thresholds, governance is gameable, obsolete, or dangerously overbroad.
Why it’s difficult
Compute thresholds are measurable but imperfect proxies for risk. Capability thresholds are more relevant but harder to measure. Behavioural thresholds may appear only after deployment. Risk thresholds require normative judgement.
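To make the proxy problem concrete, here is a minimal sketch of a compute-threshold check. It assumes the common rule of thumb that training a dense transformer costs roughly 6 FLOP per parameter per training token, and it uses 1e25 FLOP as an illustrative threshold (the order of magnitude at which the EU AI Act presumes systemic risk for general-purpose models). The function names and both training runs are hypothetical and do not describe real systems.

```python
# Illustrative threshold in FLOP; an assumption for this sketch.
THRESHOLD_FLOP = 1e25


def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate: about 6 FLOP per parameter per training token."""
    return 6.0 * n_params * n_tokens


def crosses_threshold(n_params: float, n_tokens: float,
                      threshold: float = THRESHOLD_FLOP) -> bool:
    """True if the estimated training compute is at or above the threshold."""
    return estimate_training_flop(n_params, n_tokens) >= threshold


# Two hypothetical training runs on the same amount of data.
run_a = {"n_params": 70e9, "n_tokens": 15e12}   # estimate: about 6.3e24 FLOP
run_b = {"n_params": 120e9, "n_tokens": 15e12}  # estimate: about 1.08e25 FLOP

for name, run in [("run_a", run_a), ("run_b", run_b)]:
    flop = estimate_training_flop(**run)
    print(f"{name}: {flop:.2e} FLOP, crosses threshold: {crosses_threshold(**run)}")
```

The check is easy to compute and audit, which is the appeal of compute thresholds. It also says nothing about what either model can do: post-training, scaffolding, and data quality are invisible to it, so a run that stays under the line is not thereby shown to be lower risk.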
Hidden assumptions
Many proposals assume thresholds can be set once and updated smoothly. In reality, thresholds are political commitments made under technical uncertainty. They create incentives to train just below the threshold, hide capabilities, manipulate evaluations, or shift development offshore.
Competing positions
- Compute-based thresholds
- Capability-based thresholds
- Risk-based thresholds
- Deployment-context thresholds
- Case-by-case regulator discretion
- Hard moratoria on specific capabilities
- No thresholds, only ex post liability
What could make progress
Empirical mapping between compute, architecture, data, post-training, scaffolding, and capabilities; legal work on threshold design; simulations of threshold gaming; audit pilots using confidential regulator access; comparative study of safety frameworks. The Seoul Frontier AI Safety Commitments already push developers toward severe-risk thresholds and processes for not developing or deploying systems if mitigations cannot keep risks below those thresholds; the unresolved problem is whether such thresholds can be made public, binding, legitimate, and enforceable.
What it would change
It would define the operational backbone of frontier AI regulation: who must notify whom, when, and what must stop if the threshold is crossed.
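One way to picture that operational backbone is as a table of tripwires, each stating what is measured, who must be notified, and what must stop. The sketch below is purely illustrative: the `Tripwire` type, the metric names, the threshold values, and the notified bodies are all hypothetical assumptions, not drawn from any existing framework.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Tripwire:
    name: str                   # the governance response this tripwire triggers
    metric: str                 # which measurement the tripwire watches
    threshold: float            # fires when the metric is at or above this value
    notify: str                 # who must be notified
    must_stop: Tuple[str, ...]  # activities that must halt once it fires


# All entries are hypothetical placeholders.
TRIPWIRES = [
    Tripwire("reporting", "training_flop", 1e25, "national regulator", ()),
    Tripwire("external_evaluation", "training_flop", 1e26,
             "national regulator", ("public deployment",)),
    Tripwire("pause", "dangerous_capability_score", 0.8,
             "national regulator and international body",
             ("further training", "public deployment")),
]


def fired(measurements: Dict[str, float]) -> List[Tripwire]:
    """Return every tripwire whose metric was measured at or above its threshold."""
    return [
        t for t in TRIPWIRES
        if measurements.get(t.metric, float("-inf")) >= t.threshold
    ]


for t in fired({"training_flop": 2e26, "dangerous_capability_score": 0.3}):
    print(f"{t.name}: notify {t.notify}; stop {list(t.must_stop) or 'nothing'}")
```

Writing the table out makes the open questions visible: every field in it (the metric, the number, the notified party, the list of halted activities) is a separate decision that someone must have the authority to make and to revise.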
Sub-agenda
- Which capabilities are legitimate “red lines”?
- How should thresholds account for scaffolding and tool use?
- Can compute thresholds remain useful as algorithms become more efficient?
- What governance response should follow each threshold?
- Who has authority to update thresholds?
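The question above about algorithmic efficiency can be put as simple arithmetic. The sketch below assumes, hypothetically, that the physical compute needed to reach a fixed capability level halves at a constant rate; the doubling time and the FLOP figures in the usage lines are placeholders, not empirical estimates.

```python
import math


def required_flop(original_flop: float, years: float,
                  doubling_time_years: float) -> float:
    """Physical compute needed for the same capability after `years`,
    assuming the requirement halves every `doubling_time_years`."""
    return original_flop / 2 ** (years / doubling_time_years)


def years_until_outpaced(original_flop: float, threshold_flop: float,
                         doubling_time_years: float) -> float:
    """Years until a capability that first needed `original_flop` can be
    reached at or below a fixed `threshold_flop` of physical compute."""
    if original_flop <= threshold_flop:
        return 0.0  # already reachable without crossing the threshold
    return doubling_time_years * math.log2(original_flop / threshold_flop)


# Hypothetical: a capability first seen at 4e25 FLOP, a fixed 1e25 FLOP
# threshold, and efficiency doubling once per year.
years = years_until_outpaced(4e25, 1e25, doubling_time_years=1.0)
print(f"Threshold stops catching this capability after about {years:.1f} years")
print(f"Compute then required: {required_flop(4e25, years, 1.0):.2e} FLOP")
```

Under these assumptions a fixed compute threshold has a shelf life set by the ratio between the threshold and the compute at which a capability first appears. That ties this question to the one about who has authority to update thresholds.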
Priority (editor scoring)
Thresholds convert principles into action, but are gameable and politically loaded.
- Importance: 5/5
- Neglected: 3/5
- Difficulty: 4/5
- Actionable: 5/5
- Robust: 4/5
- Nat’l+int’l: 5/5
Where the catalog bears on this
Instruments this puzzle names that are in the catalog:
Editorial content — a human-authored agenda question, rendered verbatim. No part of this analysis is AI-generated (see the charter).