MATH (Hendrycks)
MATH · math benchmark · 2021
Source: https://policywindow.org/wiki/math-benchmark
Generated 2026-05-30T22:07:44 UTC
Summary
12,500 competition mathematics problems drawn from contests such as AMC and AIME. Each problem requires multi-step reasoning, and models are scored on final-answer accuracy.
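To make "final-answer accuracy" concrete, the following is a minimal sketch of how such a score could be computed, assuming each solution marks its final answer with `\boxed{...}` as MATH reference solutions do. The function names `extract_boxed` and `accuracy_percent` are hypothetical, and the comparison is a strict string match; real graders usually normalize equivalent forms (for example `\frac{1}{2}` and `0.5`) before comparing, which this sketch does not attempt.

```python
from typing import Optional, Sequence


def extract_boxed(solution: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in a solution, or None."""
    marker = r"\boxed{"
    start = solution.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are already inside the opening brace of \boxed{
    chars = []
    for ch in solution[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never balanced


def accuracy_percent(outputs: Sequence[str], references: Sequence[str]) -> float:
    """Percentage (0-100) of outputs whose boxed answer matches the reference."""
    if len(outputs) != len(references) or not outputs:
        raise ValueError("need two non-empty sequences of equal length")
    correct = 0
    for out, ref in zip(outputs, references):
        pred = extract_boxed(out)
        # Assumption: strict string equality, no LaTeX normalization.
        if pred is not None and pred == extract_boxed(ref):
            correct += 1
    return 100.0 * correct / len(outputs)


# Illustrative data only, not taken from the benchmark.
outputs = [
    r"Adding gives $\boxed{4}$.",
    r"So the probability is $\boxed{\frac{1}{2}}$.",
    "I could not finish this one.",
    r"Hence $x = \boxed{-3}$.",
]
references = [r"\boxed{4}", r"\boxed{0.5}", r"\boxed{7}", r"\boxed{-3}"]
score = accuracy_percent(outputs, references)
```

On this toy data the second item counts as wrong even though the two values are equal, which is why the choice of answer normalization can shift reported scores.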
At a glance
- Score range: 0–100% accuracy
- Contamination risk: medium
- Methodology URL: https://arxiv.org/abs/2103.03874
- Saturation status: saturated
Details
Frontier reasoning models score above 90%. AIME 2024 serves as a harder successor for math evaluation that is not yet saturated.
How to cite this article
APA
Policy Window. (2021). MATH (Hendrycks) [Wiki article — Benchmark]. https://policywindow.org/wiki/math-benchmark
Chicago
Policy Window. 2021. "MATH (Hendrycks)." Wiki article (Benchmark). https://policywindow.org/wiki/math-benchmark.
Harvard
Policy Window (2021) 'MATH (Hendrycks)', Wiki article — Benchmark, available at: https://policywindow.org/wiki/math-benchmark.
OSCOLA
Policy Window, 'MATH (Hendrycks)' (Wiki article — Benchmark, 2021) <https://policywindow.org/wiki/math-benchmark> accessed [date].
BibTeX
@misc{policywindow-math-benchmark,
  title = {MATH (Hendrycks)},
  author = {Policy Window},
  year = {2021},
  howpublished = {MATH (2021)},
  url = {https://policywindow.org/wiki/math-benchmark},
  note = {Primary source: https://arxiv.org/abs/2103.03874}
}