Skip to main content

Wiki · Reproducibility verdict policy

Reproducibility verdict policy

This policy is written beforePolicy Window ships any UI that issues public verdicts about whether a specific paper, study, or claim has been successfully reproduced. The assessment-fit workflow identified false-reproducibility accusations as a critical legal and reputational risk. The rules below are not aspirational — they are gated in code and pinned in the test suite.

1 · The headline rule

No public "failed-reproducibility" verdict on any specific paper, study, or claim without prior human replication review by a named reviewer with the relevant methodological expertise.

The risk this rule mitigates: an automated reproducibility system surfaces a verdict ("could not reproduce", "data unavailable", "environment failure") and that verdict gets published to a public page or feed. Readers and downstream cite-checkers cannot distinguish "the paper is wrong" from "the system failed to run the paper", and the original authors are smeared by inference. This is the failure mode that destroys research institutions overnight.

2 · What "human replication review" means

For a verdict to be publishable on a public surface, all of the following must be true:

  • A named human reviewer with relevant methodological expertise has personally attempted to replicate the result.
  • The reviewer's identity and qualifications are disclosed on the verdict page.
  • The original authors of the paper / study / claim have been contacted with the reviewer's findings and given at least 30 days to respond before publication.
  • Author response (or non-response, after 30 days) is published alongside the verdict.
  • The verdict distinguishes technical failure (data unavailable, code unavailable, environment broken) from substantive failure (data + code ran, result does not reproduce). These are different findings and the page must label them as such.
  • The verdict is reviewed by at least one second editor for adversarial critique before publication.

3 · Pre-conditions before any verdict UI ships

The current editorial board is in formation (1 of 6 slots filled). Until the following are true, no reproducibility verdict UI will ship at all:

  • Editorial board has ≥3 named editors with disclosed COI (see /wiki/editorial-board).
  • A reviewer-recruitment process for replication-attempt reviewers exists in writing and has been published.
  • A legal review of false-light / defamation exposure is complete for the jurisdictions where verdicts could be challenged.
  • The structural enforcement in §4 below is in place and pinned in the test suite.

4 · Structural enforcement

The codebase is forbidden from including a briefing template named reproducibility_verdict (or any near-synonym) in the auto-publication path. The drift catcher src/__tests__/reproducibility-verdict-no-auto-publish-pin.test.ts enforces this: if a future contributor adds such a template to the briefing-templates module, the test fails and CI blocks merge.

When (if) verdict UI is built, the implementation must: (a) be gated behind an admin-only role, (b) require explicit named-reviewer attribution before save, (c) require a 30-day author-notification window encoded as a state-machine transition before publication, (d) emit withAudit events for every state change, (e) be reviewed against this policy before merge.

5 · What we will publish about reproducibility

The above rules apply to verdicts about specific papers / studies / claims. The following reproducibility activities are not gated by them, because they do not involve public verdicts on third-party work:

  • Catalog-determinism reproducibility. The claim that the same catalog input produces the same article output every render is structural and verifiable (see /wiki/meta). This is what the "Reproducible" column of the Three Rs framework refers to in our context.
  • Coverage Games inter-rater agreement.A quarterly stratified-sample exercise where multiple classifiers independently judge a small set of coverage cells. The findings are published in aggregate (e.g. "75% agreement") and the disagreement matrix is documented in docs/coverage-games-*.md. This measures Policy Window's own editorial replicability, not third-party paper reproducibility.
  • Source-URL liveness CI. A weekly job that checks whether the primary-source URLs cited in catalog rows still resolve. A 404 is reported as a maintenance issue (the source moved or was removed), not a reproducibility verdict on the underlying paper.
  • Methodological commentary. Articles that discuss reproducibility as a field (e.g. the Three Rs framework, Coverage Games protocol, social-science replication crises) are editorial content under the standard catalog rules and do not need the §2 reviewer apparatus.

6 · The /wiki/meta "100% Reproducible" copy

The /wiki/meta dashboard uses the Nature April 2026 Three Rs vocabulary (Reproducible / Replicable / Robust). The headline "Reproducible: 100%" refers to catalog-determinism reproducibility— same catalog input yields same article output every render — not to research-claim reproducibility. The page distinguishes these two senses explicitly to avoid overclaiming the trust signal. If you read it without the distinction, the wording is misleading; we know, and the distinction is now load-bearing on the page itself.