Editorial standards
Quality rubric & live scorecard
"Journalistic quality with academic research practices", made measurable. Every published article is scored deterministically from the catalog against the rubric below: baseline guarantees that PW makes, plus depth signals drawn from academic research practice. This page makes the standard auditable rather than merely asserted; the same catalog always yields the same scores. A machine-readable version is available at /wiki/editorial-standards.json.
Mean score by article type
- instruments · 88% · 33 articles
- topics · 90% · 23 articles
- concepts · 86% · 30 articles
- benchmarks · 83% · 10 articles
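The scorecard does not state how the per-type means are computed. A minimal sketch, assuming each article's score is its passed/applicable ratio and a type's mean is the unweighted average of those ratios. Pooling total passes over total applicable criteria would be an equally plausible reading. `mean_score_by_type` and its tuple input are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def mean_score_by_type(scores: list[tuple[str, int, int]]) -> dict[str, int]:
    """Average each article's passed/applicable ratio within its article type.

    `scores` holds hypothetical (article_type, passed, applicable) tuples.
    Returns whole-number percentages, as shown on the scorecard.
    """
    ratios: dict[str, list[Fraction]] = defaultdict(list)
    for article_type, passed, applicable in scores:
        # Fractions keep the averaging exact; rounding happens once, at the end.
        ratios[article_type].append(Fraction(passed, applicable))
    return {t: round(float(sum(rs) / len(rs)) * 100) for t, rs in ratios.items()}
```

For example, one benchmark at 5/6 and another at 6/6 average to 11/12, which displays as 92%.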
The rubric — catalog-wide pass rates
baseline: a guarantee PW makes; it should pass for nearly every article. depth: an academic-practice signal that varies across articles and shows where a piece is thin.
- depth · Editorially reviewed within 365 days · 23%
- depth · Evidence base spans ≥2 source types (not single-type / critique-only) · 97%
- baseline · Anchored to a primary source · 99%
- depth · Has an academic evidence base · 99%
- baseline · Opens with a substantive lede · 100%
- baseline · Drafting provenance disclosed (charter §7.9) · 100%
- depth · Consensus/contestation assessed (and locus named if contested) · 100%
Where the gaps are
The lowest-scoring published articles, worst first, each listed with the specific criteria it fails. This is the editorial work queue, surfaced honestly.
Fails: anchored to a primary source (no citation anchor); has an academic evidence base (0 sources); evidence base spans ≥2 source types (not single-type / critique-only) (0 types)
- AI in Criminal Justice · 5/7 · 71%
Fails: evidence base spans ≥2 source types (not single-type / critique-only) (1 type); editorially reviewed within 365 days (no review date)
- Development-Rights Framings · 5/7 · 71%
Fails: evidence base spans ≥2 source types (not single-type / critique-only) (1 type); editorially reviewed within 365 days (no review date)
- AIME 2024 · 5/6 · 83%
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
- ARC-AGI v2 · 5/6 · 83%
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
- Bletchley Declaration on AI Safety · 5/6 · 83%
Fails: editorially reviewed within 365 days (no review date)
- Brazil AI Bill (PL 2338/2023) · 5/6 · 83%
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
- FrontierMath · 5/6 · 83%
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
- GPQA Diamond · 5/6 · 83%
Fails: editorially reviewed within 365 days (no review date)
- HumanEval · 5/6 · 83%
Fails: editorially reviewed within 365 days (no review date)
- Humanity's Last Exam · 5/6 · 83%
Fails: editorially reviewed within 365 days (no review date)
Fails: editorially reviewed within 365 days (no review date)
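A queue like this one can be regenerated from per-article results. A minimal sketch, assuming each article carries its passed count, the number of criteria applicable to its type, and its failure labels. The mix of /6 and /7 denominators suggests that some criteria do not apply to every article type, though the page does not say so. `Scored`, `work_queue` and the 20-item limit are hypothetical. Ties are broken alphabetically by title, which matches the order of the 5/6 entries here but is an inference.

```python
from dataclasses import dataclass


@dataclass
class Scored:
    title: str
    passed: int
    applicable: int   # criteria that apply to this article's type
    fails: list[str]  # failed-criterion labels, reason in parentheses

    @property
    def ratio(self) -> float:
        return self.passed / self.applicable

    def line(self) -> str:
        # Mirrors the page's "Title · passed/applicable · pct%" format.
        return f"{self.title} · {self.passed}/{self.applicable} · {round(100 * self.ratio)}%"


def work_queue(articles: list[Scored], limit: int = 20) -> list[str]:
    """Lowest-scoring articles first; ties broken by title for a stable order."""
    ranked = sorted(articles, key=lambda a: (a.ratio, a.title))
    out: list[str] = []
    for a in ranked[:limit]:
        out.append(a.line())
        out.append("Fails: " + "; ".join(a.fails))
    return out
```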
The rubric is intentionally conservative: a failed depth criterion marks an article as not yet upgraded, not as wrong. Scores are derived from typed catalog fields (no LLM, no judgement call), so they are reproducible and contestable. Under charter §7.9, AI-assisted prose may close some of these gaps, subject to review by a named editor, with every claim still citing a primary source.
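The distinction between a failed depth criterion and a wrong article can be expressed as a small classification. This is a sketch under stated assumptions. The page only says that depth failures mean "not yet upgraded". Treating any baseline failure as a separate "baseline guarantee missed" status is this sketch's own assumption, and `Status` and `status` are hypothetical names.

```python
from enum import Enum


class Status(Enum):
    MEETS_RUBRIC = "meets rubric"
    NOT_YET_UPGRADED = "not yet upgraded"       # only depth criteria fail
    BASELINE_GAP = "baseline guarantee missed"  # assumption: any baseline failure


def status(failed_tiers: list[str]) -> Status:
    """Classify an article from the tiers ("baseline"/"depth") of its failed criteria."""
    if "baseline" in failed_tiers:
        return Status.BASELINE_GAP
    if failed_tiers:
        return Status.NOT_YET_UPGRADED
    return Status.MEETS_RUBRIC
```

Under this reading, an article that fails only the review-date criterion counts as "not yet upgraded". An article missing its citation anchor would instead be flagged as missing a guarantee.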