FrontierMath

FRONTIER-MATH · Mathematical reasoning

Live · 2024

FrontierMath is a mathematical reasoning benchmark, published in 2024, made up of hundreds of original problems curated by research mathematicians and requiring deep reasoning. It is used for held-out evaluation only. Contamination risk: low.

What this benchmark measures

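The benchmark measures performance on hundreds of original math problems, curated by research mathematicians, that require deep reasoning. The problems are used for held-out evaluation only.

FrontierMath is an Epoch AI evaluation. At launch, top reasoning models scored 2-5%; OpenAI's o3-preview was reported at 25% under a custom harness.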
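To give a sense of scale for the percentages above, the following minimal sketch converts each reported rate into an approximate solved-problem count and a 95% Wilson score interval. The problem count of 300 is a hypothetical stand-in, since the source says only "hundreds", and the helper name `wilson_interval` is illustrative rather than part of any FrontierMath tooling.

```python
import math


def wilson_interval(solved: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for the proportion solved/total."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = solved / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical problem count: the source says only "hundreds".
N_PROBLEMS = 300

# Rates taken from the text above; labels are illustrative.
REPORTED_RATES = [
    ("launch, low end", 0.02),
    ("launch, high end", 0.05),
    ("o3-preview, custom harness", 0.25),
]

for label, rate in REPORTED_RATES:
    solved = round(rate * N_PROBLEMS)
    low, high = wilson_interval(solved, N_PROBLEMS)
    print(f"{label}: {solved}/{N_PROBLEMS} solved, 95% interval {low:.1%} to {high:.1%}")
```

The interval reflects sampling uncertainty only. It does not account for differences between evaluation harnesses, so figures produced under different setups are not directly comparable.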

Claimed scores

No claims have been recorded yet for this benchmark in the Policy Window catalog.

Interpretation guidance

Contamination risk: low

Because the problems are original and held out, they are unlikely to appear in training corpora, so scores can be read as credible reflections of underlying capability.

How to cite this benchmark

Use the primary methodology source for academic citations; reference the Policy Window article for the cross-model leaderboard.

Related benchmarks (mathematical reasoning)

References

  1. FrontierMath methodology

Generated from the Policy Window catalog. Each claim cites the originating primary source.

Wiki articles regenerate when the underlying catalog updates. Tracked revisions arrive in a future iteration.