HumanEval

HUMANEVAL · code benchmark · 2021

Source: https://policywindow.org/wiki/humaneval

Generated 2026-05-30T22:07:42 UTC

Summary

164 hand-written Python programming problems. For each problem, the model must generate a function that passes the provided unit tests.
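The task described above can be sketched as a small checker: run the generated function, then run the problem's unit tests against it. This is a minimal sketch, not the official harness. The helper name `passes_tests`, the toy `add` problem, and the `check(candidate)` test layout are illustrative assumptions, and the sketch has no sandboxing or timeout, which untrusted model output would need.

```python
def passes_tests(candidate_src: str, test_src: str, entry_point: str) -> bool:
    """Return True if the candidate function passes every assertion in test_src.

    Hypothetical helper for illustration only. It executes code with exec()
    and offers no isolation, so it must not be used on untrusted output.
    """
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)              # defines the candidate function
        exec(test_src, namespace)                   # defines check(candidate)
        namespace["check"](namespace[entry_point])  # raises AssertionError on failure
        return True
    except Exception:
        # Syntax errors, runtime errors, and failed assertions all count as a fail.
        return False


# Toy problem (not taken from the benchmark).
TESTS = (
    "def check(candidate):\n"
    "    assert candidate(1, 2) == 3\n"
    "    assert candidate(-1, 1) == 0\n"
)
GOOD = "def add(a, b):\n    return a + b\n"
BAD = "def add(a, b):\n    return a - b\n"

if __name__ == "__main__":
    print(passes_tests(GOOD, TESTS, "add"))
    print(passes_tests(BAD, TESTS, "add"))
```

A completion counts as correct only if every assertion holds; partial credit is not awarded in this sketch.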

At a glance

Score range: 0–100% (pass@1)
Contamination risk: high
Methodology URL: https://arxiv.org/abs/2107.03374
Saturation status: deprecated
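The pass@1 score listed above is the k = 1 case of the pass@k metric defined in the methodology paper: given n sampled completions for a problem, of which c pass the tests, the unbiased estimate is 1 - C(n-c, k) / C(n, k). The sketch below computes that estimate and averages it over problems. The function names and the sample counts are illustrative assumptions, not part of any official tooling.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled, c: completions that passed, k: attempt budget.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k completions failed,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Mean pass@k over problems, as a percentage on the 0-100 scale.

    results holds one (n, c) pair per problem (hypothetical input format).
    """
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Made-up sample counts for two problems, for illustration only.
    print(benchmark_score([(10, 3), (10, 10)], k=1))
```

For k = 1 the estimate reduces to c / n, the fraction of sampled completions that pass.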

Details

Saturated: top models score around 95%. Largely superseded by SWE-bench for real-world relevance.

How to cite this article

APA

Policy Window. (2021). HumanEval [Wiki article — Benchmark]. https://policywindow.org/wiki/humaneval

Chicago

Policy Window. 2021. "HumanEval." Wiki article (Benchmark). https://policywindow.org/wiki/humaneval.

Harvard

Policy Window (2021) 'HumanEval', Wiki article — Benchmark, available at: https://policywindow.org/wiki/humaneval.

OSCOLA

Policy Window, 'HumanEval' (Wiki article — Benchmark, 2021) <https://policywindow.org/wiki/humaneval> accessed [date].

BibTeX

@misc{policywindow-humaneval,
  title  = {HumanEval},
  author = {Policy Window},
  year   = {2021},
  howpublished = {HUMANEVAL (2021)},
  url    = {https://policywindow.org/wiki/humaneval},
  note   = {Primary source: https://arxiv.org/abs/2107.03374}
}