Coverage Games — replicability protocol

Coverage Games are quarterly events in which multiple classifiers independently rate a stratified sample of coverage cells and then compare their calls. The inter-rater agreement rate is the operational measure for the "Replicable" leg of the Three Rs framework (see /wiki/methodology §0). The format is modelled on the Institute for Replication's replication-games protocol.
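As a rough illustration of the agreement measurement, the sketch below computes raw percent agreement between two classifiers' type calls. It also computes Cohen's kappa, a chance-corrected statistic that this page does not say the events use; it is included only as a common companion figure. The function names and the type labels ("statute", "guidance", "case") are hypothetical and not taken from the catalog.

```python
from collections import Counter


def percent_agreement(calls_a, calls_b):
    """Share of cells on which two classifiers made the same type call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two equal-length, non-empty lists of calls")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


def cohens_kappa(calls_a, calls_b):
    """Chance-corrected agreement for two raters (assumption: not part of the stated protocol)."""
    n = len(calls_a)
    p_observed = percent_agreement(calls_a, calls_b)
    freq_a = Counter(calls_a)
    freq_b = Counter(calls_b)
    # Expected agreement if each rater assigned labels independently
    # at their own observed label frequencies.
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical calls on four cells; illustrative only, not event data.
human_calls = ["statute", "statute", "guidance", "case"]
llm_calls = ["statute", "guidance", "guidance", "case"]
print(percent_agreement(human_calls, llm_calls))  # 0.75
```

Raw agreement is easy to read, but with a small sample and few type labels some of it can arise by chance, which is why a chance-corrected figure is often reported alongside it.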

Event history

  • Q2 2026

    2026-05-28

    Complete

    First formal event. 12-cell stratified sample. 1 human + 1 LLM classifier. 75% inter-rater agreement on type. Process shake-out before editorial board recruitment.

    Full record: /wiki/coverage-games/2026-q2

  • Q3 2026

    Scheduled September 2026

    Gated by board recruitment

    Target ≥50 cells with ≥3 named human classifiers. Gated by editorial-board recruitment (currently 1 of 6 slots filled).

  • Q4 2026

    Scheduled December 2026

    Gated by board recruitment

    Target ≥100 cells with the editorial board fully formed. First event whose result is expected to be peer-review defensible.

Protocol summary

  1. Sampling: stratified sample of the coverage matrix (mix of high-confidence, medium-confidence, and untagged cells; spans multiple topic kinds and jurisdictions).
  2. Blind classification: each classifier reads the primary source and produces a type + confidence call before seeing the existing catalog state.
  3. Calibration: blind calls are compared against the catalog. Disagreements get a written resolution rationale.
  4. Public record: the full event, including participants, COI disclosures, blind calls, resolutions, metrics, and limitations, is published as a wiki article (e.g., /wiki/coverage-games/2026-q2).
  5. Follow-ups: the catalog is updated based on resolutions; rubric ambiguities surfaced by disagreements are added to a forthcoming rubric document.
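
The sampling and calibration steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation in docs/coverage-games-process.md: the cell fields (`id`, `confidence`), the function names, and the type labels are all hypothetical, and the sketch stratifies by confidence level only, whereas the protocol also spans topic kinds and jurisdictions.

```python
import random
from collections import defaultdict


def stratified_sample(cells, per_stratum, seed=0):
    """Draw up to `per_stratum` cells from each confidence stratum.

    Assumption: each cell is a dict with a "confidence" key whose value
    is "high", "medium", or "untagged".
    """
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    by_stratum = defaultdict(list)
    for cell in cells:
        by_stratum[cell["confidence"]].append(cell)
    sample = []
    for stratum in sorted(by_stratum):
        group = by_stratum[stratum]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


def calibrate(blind_calls, catalog):
    """Compare blind type calls with the catalog and list cells needing follow-up.

    Assumption: a catalog value of None means the cell is untagged, so the
    blind call is reported as "new" rather than as a disagreement.
    """
    follow_ups = []
    for cell_id, blind_type in blind_calls.items():
        catalog_type = catalog.get(cell_id)
        if catalog_type is None:
            status = "new"
        elif catalog_type != blind_type:
            status = "disagreement"  # needs a written resolution rationale
        else:
            continue  # blind call matches the catalog
        follow_ups.append(
            {"cell": cell_id, "blind": blind_type, "catalog": catalog_type, "status": status}
        )
    return follow_ups
```

Keeping the blind calls in a separate structure from the catalog until `calibrate` runs mirrors the requirement that classifiers commit to a call before seeing the existing catalog state.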

Full protocol document: docs/coverage-games-process.md in the public repository.