Coverage Games — replicability protocol (in formation)

Current state, honestly named. The Q2 2026 event was a process shake-out: 1 human (the founding editor) + 1 LLM classifier, n=12 cells. This is not yet peer-review-defensible inter-rater methodology. Genuine replicability (≥3 named human classifiers per round on ≥50 cells) is gated on editorial-board recruitment (currently 1 of 6 slots filled). When the agreement rate from Q2 is cited, it should be labelled as an "editor + LLM alignment check" rather than as a replicability finding.

Once the editorial board reaches ≥3 named editors, the quarterly events become genuine inter-rater agreement runs: multiple classifiers independently rate a stratified sample of coverage cells, then compare calls. The agreement rate is the operational measurement for the "Replicable" leg of the Three Rs framework on /wiki/methodology §0. The format is modelled on the Institute for Replication's replication-games protocol.
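
As a rough illustration of how an agreement rate across several classifiers could be computed, the sketch below pools every pairwise comparison of type calls. It assumes each classifier's calls are stored as a mapping from cell ID to type label. The data layout, the function name pairwise_agreement, and the labels are hypothetical and not taken from the protocol document, which may define the metric differently.

```python
from itertools import combinations


def pairwise_agreement(calls):
    """Pooled pairwise percent agreement on type calls.

    `calls` maps classifier name -> {cell_id: type_label} (assumed layout).
    Only cells rated by both classifiers in a pair are compared.
    """
    matches = 0
    comparisons = 0
    for a, b in combinations(sorted(calls), 2):
        shared = calls[a].keys() & calls[b].keys()
        for cell in shared:
            comparisons += 1
            if calls[a][cell] == calls[b][cell]:
                matches += 1
    if comparisons == 0:
        raise ValueError("no cell was rated by two or more classifiers")
    return matches / comparisons


# Illustrative data only: made-up cell IDs and type labels.
example_calls = {
    "rater_a": {"c1": "statute", "c2": "guidance", "c3": "case-law", "c4": "statute"},
    "rater_b": {"c1": "statute", "c2": "guidance", "c3": "statute", "c4": "statute"},
    "rater_c": {"c1": "statute", "c2": "case-law", "c3": "case-law", "c4": "statute"},
}
rate = pairwise_agreement(example_calls)  # 8 matching pairs out of 12 comparisons
```

Raw percent agreement does not correct for agreement expected by chance. A chance-corrected statistic such as Cohen's or Fleiss' kappa may be preferable once the sample is large enough, though the source does not say which statistic the protocol adopts.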

Event history

Q2 2026
2026-05-28
Complete
First formal event. 12-cell stratified sample. 1 human + 1 LLM classifier. 75% agreement on type, to be read as an editor + LLM alignment check rather than an inter-rater replicability finding. Process shake-out before editorial-board recruitment.
Full record: /wiki/coverage-games/2026-q2

Q3 2026
Scheduled September 2026
Gated by board recruitment
Target: ≥50 cells with ≥3 named human classifiers (editorial board currently at 1 of 6 slots filled).

Q4 2026
Scheduled December 2026
Gated by board recruitment
Target: ≥100 cells with the editorial board fully formed. First event whose result is expected to be peer-review defensible.

Protocol summary

Sampling: stratified sample of the coverage matrix (mix of high-confidence, medium-confidence, and untagged cells; spans multiple topic kinds and jurisdictions).
Blind classification: each classifier reads the primary source and produces a type + confidence call before seeing the existing catalog state.
Calibration: blind calls are compared against the catalog. Disagreements get a written resolution rationale.
Public record: the full event (participants, COI disclosures, blind calls, resolutions, metrics, limitations) is published as a wiki article, e.g., /wiki/coverage-games/2026-q2.
Follow-ups: the catalog is updated based on resolutions; rubric ambiguities surfaced by disagreements are added to a forthcoming rubric document.
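
The sampling and calibration steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation in docs/coverage-games-process.md: the cell records, the "confidence" field, and the helper names stratified_sample and disagreements are all hypothetical. It stratifies by confidence tag only, whereas the protocol also spans topic kinds and jurisdictions.

```python
import random
from collections import defaultdict


def stratified_sample(cells, per_stratum, seed=0):
    """Draw up to `per_stratum` cells from each confidence stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    strata = defaultdict(list)
    for cell in cells:
        strata[cell["confidence"]].append(cell)
    sample = []
    for name in sorted(strata):
        group = strata[name]
        k = min(per_stratum, len(group))
        sample.extend(rng.sample(group, k))
    return sample


def disagreements(blind_calls, catalog):
    """Cell IDs whose blind type call differs from the catalog's type."""
    return sorted(
        cell_id
        for cell_id, call in blind_calls.items()
        if catalog.get(cell_id) != call
    )


# Illustrative data only: made-up cells, tags, and type labels.
cells = [
    {"id": f"cell-{i}", "confidence": tag}
    for i, tag in enumerate(
        ["high", "high", "high", "medium", "medium", "untagged"]
    )
]
sample = stratified_sample(cells, per_stratum=2, seed=42)

catalog = {"cell-0": "statute", "cell-3": "guidance"}
blind_calls = {"cell-0": "statute", "cell-3": "case-law"}
to_resolve = disagreements(blind_calls, catalog)
```

Each ID in to_resolve would then receive a written resolution rationale. Recording the seed alongside the sample is one way to let a later reader redraw the same cells.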

Full protocol document: docs/coverage-games-process.md in the public repository.
