MMLU
MMLU · general reasoning benchmark · 2020
Source: https://policywindow.org/wiki/mmlu
Generated 2026-05-30T22:09:01 UTC
Summary
Massive Multitask Language Understanding (MMLU) is a multiple-choice benchmark spanning 57 subjects across the humanities, STEM, the social sciences, and professional fields such as law.
At a glance
- Score range: 0–100% accuracy
- Contamination risk: high
- Methodology URL: https://arxiv.org/abs/2009.03300
- Saturation status: saturated
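As an illustration of how a 0–100% accuracy score of this kind can be computed, the sketch below aggregates per-question results into an overall (micro) average and a per-subject (macro) average. The record layout and the function name `mmlu_accuracy` are assumptions made for this example, not part of any official MMLU tooling. With four answer options per question, random guessing lands near 25%.

```python
from collections import defaultdict

def mmlu_accuracy(records):
    """Aggregate multiple-choice results into accuracy percentages.

    `records` is a hypothetical iterable of (subject, predicted, gold)
    tuples, where predicted and gold are answer letters such as "A".."D".
    Returns (micro_accuracy, macro_accuracy, per_subject_accuracy).
    """
    # subject -> [number correct, number attempted]
    counts = defaultdict(lambda: [0, 0])
    for subject, predicted, gold in records:
        counts[subject][0] += int(predicted == gold)
        counts[subject][1] += 1

    # Accuracy within each subject, as a percentage.
    per_subject = {s: 100.0 * c / n for s, (c, n) in counts.items()}

    # Micro average: every question counts equally.
    total_correct = sum(c for c, _ in counts.values())
    total = sum(n for _, n in counts.values())
    micro = 100.0 * total_correct / total

    # Macro average: every subject counts equally.
    macro = sum(per_subject.values()) / len(per_subject)
    return micro, macro, per_subject
```

The two averages diverge when subjects have different question counts, so a reported score should say which one it uses.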
Details
Saturated: top models score about 92% accuracy. Test-set leakage into training corpora is widely documented. MMLU-Pro is the harder successor.
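One common way to probe for test-set leakage is to measure word n-gram overlap between a benchmark question and a training document. The following is a minimal sketch of that idea, not the method used by any particular study; the function names, the tokenization, and the default window of 8 words are all assumptions chosen for illustration.

```python
import re

def ngrams(text, n):
    """Return the set of lowercase word n-grams in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_fraction(test_item, corpus_doc, n=8):
    """Fraction of the test item's n-grams that also occur in corpus_doc.

    Values near 1.0 suggest the item appears almost verbatim in the
    document; 0.0 means no shared n-gram of length n. Items shorter
    than n words have no n-grams and return 0.0.
    """
    item_grams = ngrams(test_item, n)
    if not item_grams:
        return 0.0
    shared = item_grams & ngrams(corpus_doc, n)
    return len(shared) / len(item_grams)
```

A high overlap flags an item for manual review rather than proving contamination, and paraphrased or translated leaks will not be caught by exact n-gram matching.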
How to cite this article
APA
Policy Window. (2020). MMLU [Wiki article — Benchmark]. https://policywindow.org/wiki/mmlu
Chicago
Policy Window. 2020. "MMLU." Wiki article (Benchmark). https://policywindow.org/wiki/mmlu.
Harvard
Policy Window (2020) 'MMLU', Wiki article — Benchmark, available at: https://policywindow.org/wiki/mmlu.
OSCOLA
Policy Window, 'MMLU' (Wiki article — Benchmark, 2020) <https://policywindow.org/wiki/mmlu> accessed [date].
BibTeX
@misc{policywindow-mmlu,
title = {MMLU},
author = {Policy Window},
year = {2020},
howpublished = {MMLU (2020)},
url = {https://policywindow.org/wiki/mmlu},
note = {Primary source: https://arxiv.org/abs/2009.03300}
}