Methodology

How the AI Governance Wiki is produced, reviewed, and kept honest.

Every assertion below is a structural commitment, not an aspiration. Each section names a checkable mechanism.

0 · The Three Rs — how this wiki maps onto research-reliability vocabulary

Nature's April 2026 collection on reliable research in the social and behavioural sciences canonises three durability criteria for any research claim:

Reproducible

Same analysis on the same data should produce the same result.

Policy Window: render-deterministic by construction. Articles render deterministically from typed catalog rows — no random sampling, no fitted model; the same catalog input always yields the same article output. (Prose may be AI-assisted under charter §7.9 — named-editor reviewed, provenance-labelled, every claim primary-source cited.) (Cell-level research reproducibility — whether another team would derive the same coverage cell from the same primary sources — is the Replicable axis, measured by Coverage Games.) Download the catalog at /wiki/catalog/json or /wiki/catalog/csv; re-run the render; identical output.

Replicable

An independent classifier reading the same primary sources should reach the same coverage cell.

Policy Window: tested via quarterly “Coverage Games” events (modelled on the Institute for Replication's Replication Games). Three to five independent editors classify a sample of cells from primary sources; the disagreement matrix is published. See /wiki/meta for the latest replicability audit.
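
A minimal sketch of how a disagreement matrix for one Coverage Games round could be tabulated, assuming each editor submits one label per sampled cell. The label set and the names `CoverageLabel`, `Ballot`, `disagreementMatrix`, and `pairwiseAgreement` are hypothetical; the wiki does not specify them here.

```typescript
// Hypothetical label vocabulary; the real cell labels are not stated on this page.
type CoverageLabel = "covered" | "partial" | "silent";

// One editor's classifications: cellId -> label (cells may be skipped).
type Ballot = Partial<Record<string, CoverageLabel>>;

// For every pair of editors and every cell both classified, count how often
// each unordered pair of labels co-occurred. Key format: "labelA|labelB" (sorted).
function disagreementMatrix(ballots: Ballot[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (let i = 0; i < ballots.length; i++) {
    for (let j = i + 1; j < ballots.length; j++) {
      for (const cellId of Object.keys(ballots[i])) {
        const a = ballots[i][cellId];
        const b = ballots[j][cellId];
        if (a === undefined || b === undefined) continue;
        const key = [a, b].sort().join("|");
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Share of pairwise comparisons in which both editors chose the same label.
function pairwiseAgreement(counts: Map<string, number>): number {
  let agree = 0;
  let total = 0;
  counts.forEach((n, key) => {
    const [a, b] = key.split("|");
    if (a === b) agree += n;
    total += n;
  });
  return total === 0 ? NaN : agree / total;
}
```

Raw pairwise agreement is only one possible summary; a chance-corrected statistic could be layered on the same counts.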

Robust

Alternative analytical assumptions should not flip the conclusion.

Policy Window: per-cell confidence tier (high/medium/low) marks cells where a stricter classification rubric would plausibly produce a different label. Surfaced as a halo on the coverage matrix and in the meta dashboard.

The remaining sections of this page describe the operational mechanisms behind each commitment.

1 · Catalog-derived, with disclosed AI involvement under charter §7.9–§7.11

All 110 articles render from typed catalog constants in src/lib/international-governance/instruments.ts, src/lib/capability-evals/benchmarks.ts, and src/lib/wiki/concepts.ts. The templates are deterministic: given the catalog row, the article renders exactly the same content every time.

Article prose may be AI-assisted under charter §7.9 — reviewed and approved by a named editor and labelled with its drafting provenance (the ✦ label / data-drafting-mode) — or, for the abstract / explainer prose tiers on article templates, AI-authored, AI-reviewed, and published without human review under charter §7.10 (conspicuously labelled “no human review”, citation-gated, and kill-switch-reversible). Every factual claim still cites a primary source either way.

Catalog data and coverage classifications otherwise remain human-gated, with two disclosed exceptions. Under charter §7.11, a small, conspicuously-labelled set of AI-curated instruments may be added, and their coverage classified, by AI without prior human review, at reduced confidence and kill-switch-reversible. Under charter §7.12 (2026-06-16), the AGI Social Scientist becomes the primary catalog author; this was first exercised on 2026-06-16 with CA-SB-243 (Companion Chatbots), the first §7.12 AI-authored published row. Until the engine expands and the human authoring paths retire (delete-last), the rest of the catalog remains human-gated except the §7.11/§7.12 AI-authored rows. (The topic proposer remains a separate LLM call that only suggests new topics for human review, never article content.)
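
The determinism claim can be checked mechanically. The sketch below assumes a pure template function and simply renders each row several times, comparing the outputs. `CatalogRow`, `renderArticle`, and `isRenderDeterministic` are hypothetical names, and the `slug` and `title` fields are illustrative; only `sourceUrl` and `sourceCitation` are named on this page.

```typescript
// Hypothetical row shape; the real typed constants live in the files named above.
interface CatalogRow {
  slug: string;
  title: string;
  sourceUrl: string;
  sourceCitation: string;
}

// A pure template: no clock, no randomness, no I/O. Output depends on the row only.
function renderArticle(row: CatalogRow): string {
  return [
    `${row.title}`,
    ``,
    `Source: ${row.sourceCitation} (${row.sourceUrl})`,
  ].join("\n");
}

// Render every row `runs` times and confirm each repeat matches the first output.
function isRenderDeterministic(rows: CatalogRow[], runs: number = 3): boolean {
  return rows.every((row) => {
    const first = renderArticle(row);
    for (let i = 1; i < runs; i++) {
      if (renderArticle(row) !== first) return false;
    }
    return true;
  });
}
```

A template that read the current time or a random seed would fail this check, which is the property the section commits to.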

2 · Every claim cites a primary source at the provision level where available

Each catalog row carries a sourceUrl (machine-readable canonical URL) and a sourceCitation (human-readable identifier — e.g. Regulation (EU) 2024/1689). Both are surfaced on every article. Coverage-matrix cells (45 instruments × 23 topics) each carry their own per-cell citation that names the relevant provision (e.g. "Art. 5(1)(h)") where available; for rows where the citation chain still resolves to the regulation as a whole, finer pinpoints are an editorial backfill commitment. Where a primary source is ambiguous, the article links to the authoritative secondary source instead — labelled as such.

3 · Editorial review (the "Last verified" chip)

Each article header shows a freshness chip: green if reviewed within 90 days, amber within 180, red beyond 180, neutral if review is pending. Editors mark articles as reviewed by updating the catalog row's lastReviewedAt field. The chip is honest about review state: when an article hasn't been reviewed yet, the chip says so, rather than mislabelling a catalog-generation timestamp as human verification.
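
The chip thresholds translate into a small pure function. This is a sketch under stated assumptions: `lastReviewedAt` is taken to be an ISO 8601 string, the 90- and 180-day boundaries are treated as inclusive, and `freshnessChip` and `ChipColour` are hypothetical names.

```typescript
type ChipColour = "green" | "amber" | "red" | "neutral";

const DAY_MS = 24 * 60 * 60 * 1000;

// Map a review timestamp to a chip colour. A missing or unparseable
// timestamp yields "neutral" (review pending), never a colour.
function freshnessChip(lastReviewedAt: string | null | undefined, now: Date): ChipColour {
  if (!lastReviewedAt) return "neutral";
  const reviewed = Date.parse(lastReviewedAt);
  if (Number.isNaN(reviewed)) return "neutral";
  const ageDays = (now.getTime() - reviewed) / DAY_MS;
  if (ageDays <= 90) return "green"; // assumption: boundary is inclusive
  if (ageDays <= 180) return "amber"; // assumption: boundary is inclusive
  return "red";
}
```

Passing `now` as a parameter, rather than reading the clock inside the function, keeps the chip testable and consistent with the deterministic-render commitment.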

4 · Topic-determination framework

The catalog topics are determined by three reinforcing routes; every topic in the catalog can name which route surfaced it.

Editorial seed — original topics chosen against the EU AI Act + comparable instruments' primary text. The baseline coverage.
Audit-driven gap closure — periodic persona audits (research analyst, AI-safety, Global-South, sectoral regulator) surface topics the seed misses. Added with documented rationale.
Lacuna-driven proposer — an AI scan over recent regulator publications nominates 0–5 candidate new topics per run, gated by the five anti-hallucination checks in §5 below. Every candidate requires editor approval before joining the catalog.

Topic kind taxonomy

Each topic declares a kind so the coverage matrix compares like with like:

Capability classes — the system / model being regulated (foundation models, biometric ID, deepfakes, agentic systems, catastrophic risk).
Sectoral applications — the deployment domain (employment, healthcare, criminal justice, education).
Procedural obligations — cross-cutting duties (transparency, redress, compute reporting, training data, synthetic content provenance, open-weight release).
Political frames — contested doctrines (sovereign AI, tech sovereignty, development-rights framing).
Meta-domains — coordination + governance of the governance space itself (international coordination).

Composite Topic Salience Score (CTSS)

Within each kind, topics are ranked by the CTSS combining three signal classes:

Editorial (30%) — governance density + conflict density across instruments
External discourse (50%) — regulator activity (EU + US + UK + OECD), academic citation velocity (OpenAlex), search demand, inbound citations
Influence opportunity (20%) — policy lacunae (topics most instruments are silent on)
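
As a rough illustration of the weighting above, the sketch below computes a score as a weighted sum and ranks topics within each kind. It assumes each signal class has already been normalised to the range 0 to 1, which this page does not state; `TopicSignals`, `ctss`, and `rankWithinKind` are hypothetical names, and the topic kinds are abbreviated from the taxonomy above.

```typescript
type TopicKind = "capability" | "sectoral" | "procedural" | "political" | "meta";

// Assumption: each signal class is pre-normalised to [0, 1].
interface TopicSignals {
  slug: string;
  kind: TopicKind;
  editorial: number;
  externalDiscourse: number;
  influenceOpportunity: number;
}

// Weights as published in this section: 30% / 50% / 20%.
const WEIGHTS = { editorial: 0.3, externalDiscourse: 0.5, influenceOpportunity: 0.2 };

function ctss(t: TopicSignals): number {
  return (
    WEIGHTS.editorial * t.editorial +
    WEIGHTS.externalDiscourse * t.externalDiscourse +
    WEIGHTS.influenceOpportunity * t.influenceOpportunity
  );
}

// Group topics by kind, then sort each group by descending score.
function rankWithinKind(topics: TopicSignals[]): Map<TopicKind, string[]> {
  const byKind = new Map<TopicKind, TopicSignals[]>();
  for (const t of topics) {
    const bucket = byKind.get(t.kind) ?? [];
    bucket.push(t);
    byKind.set(t.kind, bucket);
  }
  const ranked = new Map<TopicKind, string[]>();
  byKind.forEach((bucket, kind) => {
    const sorted = [...bucket].sort((a, b) => ctss(b) - ctss(a));
    ranked.set(kind, sorted.map((t) => t.slug));
  });
  return ranked;
}
```

For example, signals of 0.8, 0.6, and 0.5 give 0.3 × 0.8 + 0.5 × 0.6 + 0.2 × 0.5 = 0.64.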

Re-derivation cadence

The CTSS measures the salience of existing topics, so a topic that isn't in the catalog yet scores zero by construction. To catch genuinely-new topics that emerge from the field, the editorial team runs an annual re-derivation pass: re-evaluate the seed against current regulator activity + academic citation patterns + reported topic gaps, and add / merge / split / deprecate / rename topics as warranted. The pass deliverable is a public diff in the catalog plus a short markdown record of the rationale per change. The framework, the process, and the deferred backlog live at docs/topic-redetermination-process.md in the repository.

5 · Anti-hallucination grounding (proposer)

The topic proposer (a separate LLM call, distinct from the AI-assisted drafting paths disclosed in §1) runs five reinforcing grounding checks before persisting any candidate:

Verbatim quote — the model must return a literal substring of a real source entry, verified by exact match.
Evidence titles — every claimed source title must match a real entry (exact OR Jaro-Winkler ≥ 0.90).
Description-specifics audit — flags ungrounded years, statute references, percentages, FLOPs.
Cross-jurisdiction corroboration — evidence must span ≥2 jurisdictions; single-source candidates are rejected.
Explicit abstention — the prompt explicitly allows returning zero candidates rather than padding to a quota.

Candidates that pass these checks still require an editor to approve them before joining the catalog. The grounding result (warnings, matched jurisdictions, verbatim quote, matched source entries) is persisted with each proposal so the editor can verify provenance independently.
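
A partial sketch of the grounding gate, covering three of the five checks: the verbatim quote, evidence titles (exact match only), and cross-jurisdiction corroboration, plus abstention as an empty result. The Jaro-Winkler fuzzy branch and the description-specifics audit are deliberately omitted. All type and function names are hypothetical, and treating every warning as a rejection is an assumption of this sketch.

```typescript
interface SourceEntry { title: string; jurisdiction: string; text: string; }
interface Candidate { topic: string; verbatimQuote: string; evidenceTitles: string[]; }
interface GroundingResult {
  accepted: boolean;
  warnings: string[];
  matchedJurisdictions: string[];
}

function groundCandidate(c: Candidate, sources: SourceEntry[]): GroundingResult {
  const warnings: string[] = [];

  // Check 1: the quote must be a literal substring of some real source entry.
  const quoteFound =
    c.verbatimQuote.length > 0 && sources.some((s) => s.text.includes(c.verbatimQuote));
  if (!quoteFound) warnings.push("verbatim quote not found in any source entry");

  // Check 2 (exact branch only): every claimed title must match a real entry.
  const unmatched = c.evidenceTitles.filter((t) => !sources.some((s) => s.title === t));
  if (unmatched.length > 0) warnings.push(`unmatched evidence titles: ${unmatched.join("; ")}`);

  // Check 4: matched evidence must span at least two jurisdictions.
  const matched = sources.filter((s) => c.evidenceTitles.indexOf(s.title) !== -1);
  const matchedJurisdictions = Array.from(new Set(matched.map((s) => s.jurisdiction))).sort();
  if (matchedJurisdictions.length < 2) warnings.push("evidence spans fewer than 2 jurisdictions");

  // Assumption: any warning rejects the candidate in this sketch.
  return { accepted: warnings.length === 0, warnings, matchedJurisdictions };
}

// Check 5: an empty input or an empty output is valid; nothing pads to a quota.
function groundAll(candidates: Candidate[], sources: SourceEntry[]): Candidate[] {
  return candidates.filter((c) => groundCandidate(c, sources).accepted);
}
```

Returning the warnings and matched jurisdictions alongside the verdict mirrors the persisted grounding result that editors use to verify provenance.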

6 · Version history

Catalog changes are captured as ArticleRevision rows — one per content-hash change per article. The full log is public at /wiki/changelog (RSS feed at /wiki/changelog/feed).
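
The "one row per content-hash change" rule can be sketched as an append-only log that skips unchanged content. Only `ArticleRevision` and `serializedRowJSON` are names taken from this page; the other fields, `captureRevision`, and the FNV-1a-style hash are assumptions. The hash is a stand-in chosen to keep the example dependency-free, not a claim about which hash the pipeline uses.

```typescript
interface ArticleRevision {
  slug: string;
  contentHash: string;
  capturedAt: string; // ISO 8601 timestamp (assumption)
  serializedRowJSON: string;
}

// Stand-in 32-bit FNV-1a-style hash over UTF-16 code units.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return ("00000000" + h.toString(16)).slice(-8);
}

// Append a revision only when the serialised row differs from the latest
// revision for that slug. Assumes `log` is in capture order.
// Note: JSON.stringify follows property insertion order, so a real pipeline
// would need a canonical key order before hashing.
function captureRevision(
  log: ArticleRevision[],
  slug: string,
  row: unknown,
  capturedAt: string
): ArticleRevision[] {
  const serializedRowJSON = JSON.stringify(row);
  const contentHash = fnv1a(serializedRowJSON);
  const forSlug = log.filter((r) => r.slug === slug);
  const latest = forSlug.length > 0 ? forSlug[forSlug.length - 1] : undefined;
  if (latest !== undefined && latest.contentHash === contentHash) return log;
  return [...log, { slug, contentHash, capturedAt, serializedRowJSON }];
}
```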

Status of ?asOf=<ISO> URL pinning: live as of iter-310 followup. When the parameter is present and matches YYYY-MM-DD, the read path queries ArticleRevision for the most-recent snapshot captured at-or-before the requested date. If a snapshot exists, the article body is deserialised from the persisted serializedRowJSON column — readers see the historical catalog state, not the live one, and a green "Pinned snapshot" banner confirms this above the title.

When no snapshot exists for the requested date (e.g. the article post-dates that point in time, or no revision was captured before then), the body falls back to the current catalog state and an amber "No snapshot available" banner discloses the fallback. This honest disclosure pattern means a researcher who cites ?asOf=2026-01-15 always sees either the verified historical content or the warning that one was unavailable — never silent drift.
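
The read path described above can be sketched as a single resolution function. The assumptions here: timestamps are stored in the uniform UTC format produced by `Date.prototype.toISOString`, so string comparison orders them correctly; "at-or-before the requested date" includes the whole UTC day; and an absent or malformed parameter serves the live state with no banner. `Snapshot`, `Resolution`, and `resolveAsOf` are hypothetical names.

```typescript
interface Snapshot { capturedAt: string; serializedRowJSON: string; }

type Resolution =
  | { banner: "none"; body: string } // live state, no pinning requested
  | { banner: "pinned"; body: string; capturedAt: string } // green banner
  | { banner: "no-snapshot"; body: string }; // amber banner, live fallback

const AS_OF_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

function resolveAsOf(asOf: string | null, revisions: Snapshot[], liveRowJSON: string): Resolution {
  if (asOf === null || !AS_OF_PATTERN.test(asOf)) {
    return { banner: "none", body: liveRowJSON };
  }
  // Assumption: the cutoff is the end of the requested UTC day.
  const cutoff = `${asOf}T23:59:59.999Z`;
  let best: Snapshot | undefined;
  for (const r of revisions) {
    if (r.capturedAt <= cutoff && (best === undefined || r.capturedAt > best.capturedAt)) {
      best = r;
    }
  }
  if (best === undefined) {
    return { banner: "no-snapshot", body: liveRowJSON };
  }
  return { banner: "pinned", body: best.serializedRowJSON, capturedAt: best.capturedAt };
}
```

Making the fallback a distinct variant of the result type means the caller cannot render fallback content without also deciding what banner to show.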

7 · Decision-support header (Question / Answer / Confidence)

The strategic-assessment §7 requires every article to surface three editorial fields above the standard catalog body: the policy question the article answers, the current editorial answer with explicit uncertainty, and the confidence tier behind that answer. Iter-311 added the three optional fields (policyQuestion, currentAnswer, answerConfidence) to the GovernanceTopic and GovernanceInstrument interfaces. When any of the three is populated the article renders a Decision-support header block above Section A; when all three are absent the block is suppressed (preserving the pre-iter-311 article surface).

Why the suppression-on-absence default? Backfilling 100+ articles in one pass would inflate the editorial burden; staggering the rollout lets each upgrade go through proper editorial review. An article without a Q/A/C header is not "incomplete" in the strong sense — the catalog body still cites every claim to its primary source. The header upgrades a catalog card to a decision-support evidence card, which is a higher editorial bar that not every article needs immediately.

Confidence vocabulary. The answerConfidence field uses the same high | medium | low tier as the per-cell CoverageCell.confidence chip on topic-page coverage tables. When unset (the default) the header renders as "Editorial review pending" rather than implying a confidence level the editor has not assigned. See /wiki/reproducibility-policy for the policy governing how confidence is calibrated and when downgrades / upgrades happen.
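
The suppression and fallback rules for the header can be sketched as follows. The three field names come from this section; `DecisionSupportFields`, `HeaderBlock`, `decisionSupportHeader`, and the exact "Confidence: …" label wording are hypothetical.

```typescript
type Confidence = "high" | "medium" | "low";

interface DecisionSupportFields {
  policyQuestion?: string;
  currentAnswer?: string;
  answerConfidence?: Confidence;
}

interface HeaderBlock {
  question: string | undefined;
  answer: string | undefined;
  confidenceLabel: string;
}

// Returns null when all three fields are absent (block suppressed);
// otherwise returns the block, with a pending label if no tier is assigned.
function decisionSupportHeader(f: DecisionSupportFields): HeaderBlock | null {
  const allAbsent =
    f.policyQuestion === undefined &&
    f.currentAnswer === undefined &&
    f.answerConfidence === undefined;
  if (allAbsent) return null;
  return {
    question: f.policyQuestion,
    answer: f.currentAnswer,
    confidenceLabel:
      f.answerConfidence === undefined
        ? "Editorial review pending"
        : `Confidence: ${f.answerConfidence}`,
  };
}
```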

8 · Persistent identifier + Editorial Board

Persistent identifier policy: each article's canonical URL on this domain is committed-stable. Slugs do not change after publication; if a topic is renamed, the old slug remains as a redirect (recorded in the changelog). This is the current citation identifier — researchers cite the wiki URL.
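
One way the redirect rule could work is a lookup that follows recorded renames until it reaches a live slug. This is a sketch only: `resolveSlug`, the redirect map, and the example slugs in the usage note are hypothetical, and the loop guard is an added safety measure rather than something this page describes.

```typescript
// Resolve a requested slug to a live one, following recorded renames.
// Returns null when the slug was never published or a redirect chain loops.
function resolveSlug(
  requested: string,
  liveSlugs: Set<string>,
  redirects: Map<string, string>
): { slug: string; redirected: boolean } | null {
  let current = requested;
  const seen = new Set<string>();
  while (!liveSlugs.has(current)) {
    if (seen.has(current)) return null; // guard against redirect loops
    seen.add(current);
    const next = redirects.get(current);
    if (next === undefined) return null;
    current = next;
  }
  return { slug: current, redirected: current !== requested };
}
```

With a live slug `agentic-systems` and a recorded rename from `autonomous-agents`, a citation to the old URL would still resolve, and the `redirected` flag lets the page disclose that it did.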

DOI policy — roadmap: per-article + per-version DOIs via Zenodo are on the roadmap. The DOI infrastructure isn't live yet — the current commitment is URL stability + public changelog + content-hashed ArticleRevision records (which now also power §6 ?asOf= pinning). When the Zenodo integration lands, every existing article + its history get retroactive DOIs; today's URL-based citations remain valid forever.

Editorial review + named authority: catalog editorial decisions are currently made by the founding editor (Ryan Wong); the editorial board is in formation (1 of 6 subject-editor slots filled as of 2026-05-31). The board's target structure, conflicts of interest, and recruitment status live at /wiki/editorial-board. Articles render from typed catalog constants (no single author wrote them); the catalog itself is editorially curated by the founding editor today, with the named subject-editor bench (≥3 by Q4 2026; ≥6 by 2027) under active recruitment. The editorial standard — including its current scope and known limitations — is what gets cited; see /wiki/editorial-board for the recruitment status board.

9 · Corrections workflow

Every article carries a "Report a problem with this page" link in its footer. Clicking it opens a pre-filled GitHub issue (reading existing reports needs no account; submitting one needs a GitHub account) with the page reference + a structured prompt asking for the specific issue type (stale date, missing source, broken link, wrong jurisdiction, analytic claim that doesn't match the catalog).
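
GitHub's new-issue page accepts `title` and `body` query parameters, so a pre-filled link of this kind can be built as a plain URL. The sketch below is illustrative only: the repository name passed in, the title format, the body prompt, and the names `IssueType` and `reportProblemUrl` are all assumptions, not the wiki's actual template.

```typescript
// Issue types as listed in this section (wording shortened for the last one).
type IssueType =
  | "stale date"
  | "missing source"
  | "broken link"
  | "wrong jurisdiction"
  | "analytic claim mismatch";

// Build a pre-filled "new issue" URL. `repo` is "owner/name" (hypothetical value).
function reportProblemUrl(repo: string, pagePath: string, issueType: IssueType): string {
  const title = `[correction] ${pagePath}: ${issueType}`;
  const body = [
    `Page: ${pagePath}`,
    `Issue type: ${issueType}`,
    ``,
    `What is wrong, and which primary source shows it?`,
  ].join("\n");
  return (
    `https://github.com/${repo}/issues/new` +
    `?title=${encodeURIComponent(title)}` +
    `&body=${encodeURIComponent(body)}`
  );
}
```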

Each reported correction is reviewed by an editor; resolved corrections become commits to the catalog (and thus appear in the changelog). The catalog is the single source of truth — once a correction lands, every downstream surface (article page, OG image, CTSS score) updates on the next render.

10 · Update cadence

The signal-based components of the framework refresh on a schedule:

External discourse signals — daily refresh via the admin endpoint (cron in production); academic velocity weekly.
Topic proposer — monthly or on demand by editors.
Article content — updated when primary sources change (a new regulation passes, an instrument is amended). The changelog shows when.
Editorial review — articles re-verified on a rolling 90-day target. The freshness chip surfaces the current state on each article.

11 · Limits & blind spots

The catalog structure carries the politics of its own framing. We name the structural limits here so readers can pair Policy Window with the kinds of evidence its architecture excludes by design — not because the excluded evidence matters less, but because the catalog format cannot represent it well.

Instrument-centric ontology

The primary unit is the "policy instrument" (Hood's NATO typology). Refusal politics (a community decision to reject an AI system entirely), labour strikes against algorithmic management, and abolitionist demands sit outside the instrument frame and are not catalogued. For those traditions readers should consult civil-society reporting directly (Algorithmic Justice League, Coding Rights, IFF India, Paradigm Initiative).

Coverage-depth asymmetry by jurisdiction

Western instruments (EU AIA, US EO-14110, UK, OECD) carry substantially more coverage cells than Global South instruments (India DPDP, Brazil PL 2338, AU AI Strategy, ASEAN AI Guide) — the depth asymmetry is on the order of 40%. The shorter rows are not a judgement of relevance; they reflect editorial-board capacity. Comparative briefings rooted in Global South governance should use PW as a starting reference, not the authoritative source.

Frame favours regulator anxiety over harm narratives

Topic selection currently centres what regulators have authority over (foundation models, biometric ID, compute reporting). Welfare-system automation, child-protective-services ML, immigration-enforcement AI, gig-worker algorithmic management, and predictive-policing-as-anti-Black-violence are explicitly under-catalogued and named as known gaps in the /wiki/for-advocates page's "Known gaps" section.

Three Rs framework excludes situated knowledge

The Reproducible / Replicable / Robust framing imports biomedical-reproducibility assumptions: only quantifiable, comparable, generalisable evidence is catalogued in the coverage matrix. Lived-experience testimony from affected communities, oral histories of algorithmic harm, and refusal narratives do not fit the cell-grid format and are therefore absent. Readers building a holistic harm picture should pair PW with ethnographic + journalistic accounts.

Editorial board geographic concentration

Only 1 of 6 editorial slots is filled, and it is held by the founding editor, who has no disclosed Global South AI-policy expertise. Until at least one editor with regional expertise from South Asia, Latin America, Africa, or Southeast Asia joins, regional calibration of Global South coverage rests on a single Western-trained reviewer. Recruitment is open; see /wiki/editorial-board.

English-only

Articles render only in English. Translations into Portuguese, Spanish, Hindi, French, or Arabic would materially improve accessibility for Global South policymakers + practitioners but require funded translation partnerships (not machine translation, which is unsafe for policy-precision documents). No roadmap commitment yet.

"Neutral" framing has politics

The charter §7 commits to not authoring advocacy content or signing open letters. This commitment was made to prevent capture, but it also forecloses the option of being a partisan resource for communities in struggle against a particular governance regime. Catalog rows favour questions about how to govern AI; they do not equally serve questions about whether a system should be deployed at all. Readers approaching PW from an abolitionist or refusal-politics frame should recognise this structural orientation. As a partial step, the whether-to-govern register now documents prohibition, moratorium, ban-campaign and abolition stances (with proponents and sources), reported neutrally and not endorsed — consistent with the §7 no-advocacy commitment.

These limits are not a one-time disclosure. The editorial team revisits them whenever board composition changes or new coverage areas open. If a limit listed here is no longer accurate, the changelog will record the correction.
