HumanEval
HUMANEVAL · code benchmark · 2021
Source: https://policywindow.org/wiki/humaneval
Generated 2026-05-30T22:07:42 UTC
Summary
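HumanEval is a set of 164 hand-written Python programming problems. For each problem, the model is given a function signature and docstring and must generate a function that passes the provided unit tests.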
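As a rough illustration of what "passes provided unit tests" means in practice, the sketch below runs one completion against one test. The field names (prompt, test, entry point) follow the published dataset, but the `add` problem, the `passes_tests` helper, and the sample completion are made up for this example. A real harness would run the code in a sandbox with timeouts; this sketch does not.

```python
# Hypothetical HumanEval-style problem: the prompt is a signature plus
# docstring, the completion is the model's function body, and the test
# defines check(candidate) made of plain assertions.
PROMPT = '''def add(a: int, b: int) -> int:
    """Return the sum of a and b."""
'''
COMPLETION = "    return a + b\n"
TEST = '''def check(candidate):
    assert candidate(2, 3) == 5
    assert candidate(-1, 1) == 0
'''


def passes_tests(prompt: str, completion: str, test: str, entry_point: str) -> bool:
    """Return True if prompt + completion defines a function that passes the test.

    Assumption: no sandboxing or timeout. Never run untrusted model output
    this way outside an isolated environment.
    """
    namespace = {}
    try:
        # Define the candidate function and the check() function together.
        exec(prompt + completion + "\n" + test, namespace)
        # Any failed assertion (or other error) counts as a failed sample.
        namespace["check"](namespace[entry_point])
    except Exception:
        return False
    return True


if __name__ == "__main__":
    print(passes_tests(PROMPT, COMPLETION, TEST, "add"))
```

A problem counts as solved for a given sample only if every assertion in its test passes, so one failing case is enough to mark the sample as incorrect.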
At a glance
- Score range: 0–100 (pass@1, %)
- Contamination risk: high
- Methodology URL: https://arxiv.org/abs/2107.03374
- Saturation status: deprecated
Details
Saturated: top models score roughly 95%. SWE-bench has largely superseded it as a measure of real-world relevance.
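For readers unsure how a pass@1 percentage is produced, the sketch below implements the pass@k estimator described in the methodology paper linked above, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn for a problem and c is the number that pass. The function names and the `results` layout are assumptions made for this example, not part of any official harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem from n samples, c of which passed.

    Computes 1 - C(n - c, k) / C(n, k): the probability that at least one
    of k samples drawn without replacement from the n is correct.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed, giving 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over all problems, as a percentage in the 0-100 range.

    `results` is a hypothetical layout: one (n, c) pair per problem.
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two problems, one sample each: the first passed, the second failed.
    print(benchmark_score([(1, 1), (1, 0)], k=1))
```

With k = 1 the estimator reduces to c / n per problem, so a pass@1 score is simply the average fraction of passing samples across the 164 problems, scaled to a percentage.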
How to cite this article
APA
Policy Window. (2021). HumanEval [Wiki article — Benchmark]. https://policywindow.org/wiki/humaneval
Chicago
Policy Window. 2021. "HumanEval." Wiki article (Benchmark). https://policywindow.org/wiki/humaneval.
Harvard
Policy Window (2021) 'HumanEval', Wiki article — Benchmark, available at: https://policywindow.org/wiki/humaneval.
OSCOLA
Policy Window, 'HumanEval' (Wiki article — Benchmark, 2021) <https://policywindow.org/wiki/humaneval> accessed [date].
BibTeX
@misc{policywindow-humaneval,
title = {HumanEval},
author = {Policy Window},
year = {2021},
howpublished = {HUMANEVAL (2021)},
url = {https://policywindow.org/wiki/humaneval},
note = {Primary source: https://arxiv.org/abs/2107.03374}
}