Humanity's Last Exam

HLE · Knowledge

Live · 2025

Humanity's Last Exam (HLE) is a knowledge benchmark published in 2025. It consists of 3,000+ frontier-difficulty, expert-curated questions across all academic disciplines and is designed to remain unsaturated through 2026 and beyond. Contamination risk: low.

What this benchmark measures

The benchmark measures expert-level knowledge using 3,000+ frontier-difficulty, expert-curated questions across all academic disciplines. It is designed to remain unsaturated through 2026 and beyond.

It was developed as a collaboration between the Center for AI Safety and Scale AI. Frontier models scored 8-22% at launch. It replaces MMLU as the de facto knowledge ceiling.

Claimed scores

Model   Score            Claim type    Reported     Citation
gpt-5   22.1 % accuracy  vendor card   2025-08-07   OpenAI release
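
For readers who want to work with claimed scores programmatically, the row above can be modelled as a small record. This is a minimal sketch, not the catalog's actual schema: the class name `ClaimedScore`, its field names, and the helper `parse_score` are hypothetical and chosen only to mirror the table columns.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ClaimedScore:
    """One row of the claimed-scores table (hypothetical field names)."""
    model: str
    score_pct: float   # accuracy as a percentage, e.g. 22.1
    claim_type: str    # e.g. "vendor card"
    reported: date     # date the score was reported
    citation: str      # short label for the originating source


def parse_score(text: str) -> float:
    """Extract the numeric percentage from a string such as '22.1 % accuracy'."""
    number, _, _ = text.partition("%")
    return float(number.strip())


# The single row listed in the table, taken verbatim from the article.
row = ClaimedScore(
    model="gpt-5",
    score_pct=parse_score("22.1 % accuracy"),
    claim_type="vendor card",
    reported=date.fromisoformat("2025-08-07"),
    citation="OpenAI release",
)
```

Keeping the claim type and report date next to the number makes it harder to compare a vendor-reported score with an independently measured one by accident.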

Interpretation guidance

Contamination risk: low

Benchmark items are unlikely to appear in training corpora, so reported scores can be read as credible reflections of underlying capability.

How to cite this benchmark

Use the primary methodology source for academic citations; reference the Policy Window article for the cross-model leaderboard.

  • Primary methodology: https://lastexam.ai/
  • Wiki article: https://policywindow.org/wiki/humanitys-last-exam

References

  1. Humanity's Last Exam methodology
  2. gpt-5 — 22.1 % accuracy (OpenAI release, 2025-08-07)

Generated from the Policy Window catalog at . Each claim cites the originating primary source.

Wiki articles regenerate when the underlying catalog updates. Tracked revisions arrive in a future iteration.