FrontierMath
FRONTIER-MATH · Mathematical reasoning
FrontierMath is a mathematical reasoning benchmark published in 2024. It comprises hundreds of original problems, curated by research mathematicians, that require deep reasoning. It is used for held-out evaluation only. Contamination risk: low.
What this benchmark measures
The benchmark consists of hundreds of original math problems curated by research mathematicians, each requiring deep reasoning. The problems are held out and used for evaluation only.
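FrontierMath is an Epoch AI evaluation. At launch, top reasoning models scored 2-5%; OpenAI reported 25% for o3-preview using a custom harness, so that figure is not directly comparable to the launch results.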
Claimed scores
No claims have been recorded yet for this benchmark in the Policy Window catalog.
Interpretation guidance
Contamination risk: low
Benchmark items are unlikely to appear in training corpora, so scores are more likely to reflect underlying capability than memorization.
How to cite this benchmark
Use the primary methodology source for academic citations; reference the Policy Window article for the cross-model leaderboard.
- Primary methodology: https://epochai.org/frontiermath
- Wiki article: https://policywindow.org/wiki/frontiermath
Related benchmarks (mathematical reasoning)
- MATH (Hendrycks) · 2021 · medium contamination
- AIME 2024 · 2024 · low contamination