Compute vs Behavioural Capability Thresholds

Policy Window Editorial Board

Debate

Should the regulatory trigger for 'frontier' / 'foundation' / 'systemic-risk' status be training-compute thresholds (objective + ex-ante observable), or behavioural-capability evaluation (more semantically meaningful but operationally costly)?

Why does this debate matter for AI governance?

The choice determines which models the most stringent obligations attach to. EU AIA, CA-SB-1047, and US EO 14110 all chose compute. Anthropic RSP, OpenAI Preparedness, and DeepMind FSF all use behavioural-capability tiers. The divergence shapes capability-elicitation methodology, compliance cost, and the rate at which algorithmic-efficiency improvements let models of a given capability slip below the threshold.

What are the competing positions?

This debate catalogues 3 competing positions.

Catalogued in editorial order; not ranked. Each position carries its own primary sources.

Position 1

Compute thresholds are operationally tractable

Compute is objectively observable (FLOPs are reported, hardware is tracked), whereas behavioural-eval results depend on elicitation methodology that varies across labs. A compute threshold can be enforced consistently; a behavioural threshold cannot. The cost of threshold drift, as algorithmic-efficiency gains let equally capable models train below the line, is acceptable over medium-term horizons.

Proponents

European AI Office
EU AI Act drafting team

Primary sources

EU AI Act Annex XIII (2024)

Position 2

Behavioural-capability thresholds are semantically meaningful

Compute is at best a noisy proxy for capability; algorithmic efficiency improvements break the proxy. Behavioural-capability evaluation (CBRN uplift, autonomous replication, scheming, persuasion) directly measures the governance-relevant property. Operational cost is real but justified by accuracy.
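The drift argument can be made concrete with a small sketch. It assumes, purely for illustration, that algorithmic efficiency doubles at a fixed interval; the `doubling_years` parameter and both function names are hypothetical and are not drawn from any of the instruments discussed here.

```python
import math


def flops_for_fixed_capability(baseline_flops: float, years: float,
                               doubling_years: float) -> float:
    """Training FLOPs needed after `years` to match the capability that
    `baseline_flops` buys today, ASSUMING efficiency doubles every
    `doubling_years` (an illustrative input, not an empirical estimate)."""
    if doubling_years <= 0:
        raise ValueError("doubling_years must be positive")
    return baseline_flops / 2 ** (years / doubling_years)


def years_until_below_threshold(capability_flops: float, threshold_flops: float,
                                doubling_years: float) -> float:
    """Years until a capability that costs `capability_flops` today could be
    trained under a fixed `threshold_flops`, under the same assumption."""
    if doubling_years <= 0:
        raise ValueError("doubling_years must be positive")
    if capability_flops <= threshold_flops:
        return 0.0  # already trainable below the threshold
    return doubling_years * math.log2(capability_flops / threshold_flops)
```

With a hypothetical one-year doubling time, a capability that needs 1e25 FLOPs today would need about 2.5e24 FLOPs after two years, so a fixed compute line would no longer capture it. How fast this happens depends entirely on the assumed doubling time.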

Proponents

Primary sources

Position 3

Both — hybrid threshold

Use compute as a presumption trigger (catches the obvious cases without high enforcement cost) plus a behavioural-capability designation power (catches efficiency-improved models that fall below the compute threshold). EU AIA Art. 51(1)(b) is one operationalisation; a hybrid CA-SB-1047 revision was discussed pre-veto.
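The two-path logic of the hybrid position can be sketched as a simple decision rule. This is a minimal illustration, not the text of any instrument: the `Model` type, the `designated` flag, and the `regulatory_trigger` function are hypothetical, and the default threshold of 1e25 FLOPs mirrors the EU AIA presumption figure only as a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Model:
    name: str
    training_flops: float
    # Hypothetical flag: set when a regulator designates the model on the
    # basis of behavioural-capability evidence.
    designated: bool = False


def regulatory_trigger(model: Model,
                       compute_threshold_flops: float = 1e25) -> Optional[str]:
    """Return which path (if any) brings the model into scope."""
    # Path 1: compute presumption, cheap to check and known ex ante.
    if model.training_flops > compute_threshold_flops:
        return "compute-presumption"
    # Path 2: explicit designation, catches efficient models below the line.
    if model.designated:
        return "behavioural-designation"
    return None
```

Checking compute first reflects the position's rationale: the presumption handles the obvious cases at low cost, and the designation path is reserved for models the compute line misses.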

Proponents

Primary sources

EU AI Act Art. 51(1)(b) — explicit-designation pathway distinct from compute-presumption (2024)

Which instruments, topics, and concepts does this debate touch?

This debate connects to 4 instruments, 3 topics, and 4 concepts in the catalog.

Instruments shaped by this debate

Topics this debate touches

What is the editorial assessment?

The EU AI Act is the only major instrument with a formal designation pathway (Art. 51(1)(b)) on top of a compute presumption, making it the closest implementation of the hybrid position. Industry frameworks remain purely behavioural; CA-SB-1047 was purely compute. Convergence toward hybrid is plausible but not yet observed.