Humanity's Last Exam
HLE · Knowledge
Humanity's Last Exam (HLE) is a knowledge benchmark published in 2025. It consists of 3,000+ frontier-difficulty, expert-curated questions across all academic disciplines and is designed to remain unsaturated through 2026 and beyond. Contamination risk: low.
What this benchmark measures
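HLE tests expert-level knowledge using 3,000+ frontier-difficulty, expert-curated questions across all academic disciplines. It is designed to remain unsaturated through 2026 and beyond.
The benchmark is a collaboration between the Center for AI Safety and Scale AI. Frontier models scored 8-22% at launch. It is positioned as the successor to MMLU as the de facto ceiling for knowledge benchmarks.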
Claimed scores
| Model | Score | Claim type | Reported | Citation |
|---|---|---|---|---|
| gpt-5 | 22.1% accuracy | vendor card | 2025-08-07 | OpenAI release |
Interpretation guidance
Contamination risk: low
Benchmark items are unlikely to appear in training corpora, so scores are credible reflections of underlying capability.
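When comparing reported scores, it can help to attach a rough uncertainty range to a single accuracy figure. The sketch below computes a Wilson score interval for a reported accuracy. It is an illustration only, not part of the benchmark's methodology: the function name `wilson_interval` is hypothetical, the question count of 3,000 is an assumption taken from the "3,000+" figure above (the exact count is not stated here), and it assumes each question is scored independently as right or wrong.

```python
import math


def wilson_interval(accuracy, n_questions, z=1.96):
    """Return (low, high) Wilson score interval for a proportion.

    accuracy    -- reported accuracy as a fraction in [0, 1]
    n_questions -- number of scored questions (assumed, not sourced)
    z           -- normal quantile; 1.96 corresponds to about 95% coverage
    """
    if n_questions <= 0:
        raise ValueError("n_questions must be positive")
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")

    z2 = z * z
    denom = 1.0 + z2 / n_questions
    center = (accuracy + z2 / (2.0 * n_questions)) / denom
    half_width = (
        z
        * math.sqrt(
            accuracy * (1.0 - accuracy) / n_questions
            + z2 / (4.0 * n_questions * n_questions)
        )
        / denom
    )
    return center - half_width, center + half_width


# Example: the 22.1% vendor-card score from the table above,
# with an ASSUMED question count of 3,000.
low, high = wilson_interval(0.221, 3000)
print(f"approx. 95% interval: {low:.3f} to {high:.3f}")
```

Under these assumptions, differences between models that fall well inside such an interval are better treated as noise than as a ranking.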
How to cite this benchmark
Use the primary methodology source for academic citations; reference the Policy Window article for the cross-model leaderboard.
- Primary methodology: https://lastexam.ai/
- Wiki article: https://policywindow.org/wiki/humanitys-last-exam