Compute vs Behavioural Capability Thresholds
compute-vs-behavioral-threshold · AI-governance meta-debate
Should the regulatory trigger for 'frontier' / 'foundation' / 'systemic-risk' status be training-compute thresholds (objective + ex-ante observable), or behavioural-capability evaluation (more semantically meaningful but operationally costly)?
Why it matters
The choice determines which models the most stringent obligations attach to. EU AIA + CA-SB-1047 + US EO 14110 all use training compute as the primary trigger (10^25 FLOP in the EU AIA; 10^26 operations in CA-SB-1047 and EO 14110). Anthropic RSP + OpenAI Preparedness + DeepMind FSF all use behavioural-capability tiers. The divergence shapes capability-elicitation methodology, compliance cost, and how quickly algorithmic-efficiency improvements let equally capable models be trained below a fixed compute threshold.
Positions (3)
Catalogued in editorial order; not ranked. Each position carries its own primary sources.
Position 1
Compute thresholds are operationally tractable
Compute is objectively observable (training FLOP can be reported, hardware can be tracked); behavioural-eval results depend on elicitation methodology that varies across labs. A compute threshold can therefore be enforced consistently, while a behavioural threshold cannot. The cost of threshold drift (algorithmic-efficiency gains letting capable models train under the threshold) is acceptable over medium-term horizons.
Proponents
- European AI Office
- EU AI Act drafting team
Primary sources
- EU AI Act Annex XIII (2024)
Position 2
Behavioural-capability thresholds are semantically meaningful
Compute is at best a noisy proxy for capability; algorithmic efficiency improvements break the proxy. Behavioural-capability evaluation (CBRN uplift, autonomous replication, scheming, persuasion) directly measures the governance-relevant property. Operational cost is real but justified by accuracy.
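The proxy-breaking effect can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that algorithmic efficiency improves at a constant rate, so the training compute needed to reach a fixed capability level halves every `halving_months`. That rate, the function names, and the example figures are assumptions, not values taken from any instrument or source listed here.

```python
import math


def flop_needed(flop_today: float, months_elapsed: float, halving_months: float) -> float:
    """Training FLOP needed to reach the same capability after efficiency gains.

    Assumes (hypothetically) that the required compute halves every
    `halving_months` months at a constant rate.
    """
    return flop_today / 2 ** (months_elapsed / halving_months)


def months_until_below(flop_today: float, threshold_flop: float, halving_months: float) -> float:
    """Months until a fixed capability can be trained at or below the threshold."""
    if flop_today <= threshold_flop:
        return 0.0  # already trainable under the threshold
    return halving_months * math.log2(flop_today / threshold_flop)


if __name__ == "__main__":
    # Illustrative inputs only: a capability costing 4e25 FLOP today,
    # a 1e25 FLOP threshold, and a 12-month halving time.
    print(months_until_below(4e25, 1e25, 12.0))
```

Under these assumed inputs, a capability that costs 4e25 FLOP today could be trained under a 1e25 FLOP line after roughly two halving periods. A faster assumed halving time shortens that window proportionally, which is the drift concern in numerical form.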
Proponents
Primary sources
Position 3
Both — hybrid threshold
Use compute as a presumption trigger (catches the obvious cases without high enforcement cost) PLUS behavioural-capability designation power (catches efficiency-improved models below compute threshold). EU AIA Art. 51(1)(b) is one operationalisation; a hybrid CA-SB-1047 revision was discussed pre-veto.
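The two-stage logic of the hybrid position can be sketched as a small decision function. This is a minimal illustration, not a statement of how any regulator implements it: the `ModelRecord` type, its `designated` flag (standing in for an Art. 51(1)(b)-style designation decision), and the returned labels are all hypothetical. The default threshold uses the 10^25 FLOP presumption figure from the EU AIA.

```python
from dataclasses import dataclass

# EU AIA compute presumption: training compute greater than 10^25 FLOP.
COMPUTE_PRESUMPTION_FLOP = 1e25


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical record of the facts a regulator would hold about a model."""
    name: str
    training_flop: float
    designated: bool = False  # assumed flag: a capability-based designation decision


def regulatory_trigger(model: ModelRecord,
                       threshold_flop: float = COMPUTE_PRESUMPTION_FLOP) -> str:
    """Return which limb of the hybrid test, if any, brings the model in scope."""
    # Limb 1: cheap, ex-ante compute presumption catches the obvious cases.
    if model.training_flop > threshold_flop:
        return "compute presumption"
    # Limb 2: designation catches efficient models trained below the threshold.
    if model.designated:
        return "capability designation"
    return "out of scope"


if __name__ == "__main__":
    for m in (
        ModelRecord("large", 3e25),
        ModelRecord("efficient", 4e24, designated=True),
        ModelRecord("small", 1e23),
    ):
        print(m.name, "->", regulatory_trigger(m))
```

Checking compute first reflects the position's cost argument: the expensive behavioural assessment is only needed for models the compute limb does not already capture.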
Primary sources
Instruments shaped by this debate
Topics this debate touches
Editorial note
The EU AI Act is the only major instrument with a formal designation pathway (Art. 51(1)(b)) layered on top of a compute presumption, making it the closest implementation of the hybrid position. Industry frameworks remain purely behavioural; CA-SB-1047 was purely compute-based. Convergence toward hybrid is plausible but not yet observed.