Methodology
How the AI Governance Wiki is produced, reviewed, and kept honest. Every assertion below is a structural commitment, not an aspiration.
0 · The Three Rs — how this wiki maps onto research-reliability vocabulary
Nature's April 2026 collection on reliable research in the social and behavioural sciences canonises three durability criteria for any research claim:
Reproducible
Same analysis on the same data should produce the same result.
Policy Window: 100% by construction. Articles render deterministically from typed catalog rows — no random sampling, no fitted model, no LLM prose. Download the catalog at /wiki/catalog/json or /wiki/catalog/csv; re-run the render; identical output.
Replicable
An independent classifier reading the same primary sources should reach the same coverage cell.
Policy Window: tested via quarterly “Coverage Games” events (modelled on the Institute for Replication's Replication Games). Three to five independent editors classify a sample of cells from primary sources, and the resulting disagreement matrix is published. See /wiki/meta for the latest replicability audit.
Robust
Alternative analytical assumptions should not flip the conclusion.
Policy Window: per-cell confidence tier (high/medium/low) marks cells where a stricter classification rubric would plausibly produce a different label. Surfaced as a halo on the coverage matrix and in the meta dashboard.
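One way to read the "disagreement matrix" mentioned under Replicable is as a pairwise table: for each pair of editors, the share of sampled cells on which their labels differ. The sketch below assumes that reading. The label strings and the function name pairwiseDisagreement are illustrative and are not taken from the wiki's codebase.

```typescript
// labelsByEditor[e][c] is editor e's label for sampled cell c.
// Returns an editor-by-editor matrix of disagreement rates in [0, 1].
function pairwiseDisagreement(labelsByEditor: string[][]): number[][] {
  const cellCount = labelsByEditor[0]?.length ?? 0;
  return labelsByEditor.map((a) =>
    labelsByEditor.map((b) => {
      if (a.length !== cellCount || b.length !== cellCount) {
        throw new Error("every editor must label the same cells");
      }
      if (cellCount === 0) return 0;
      let differing = 0;
      for (let c = 0; c < cellCount; c++) {
        if (a[c] !== b[c]) differing++;
      }
      return differing / cellCount;
    }),
  );
}

// Hypothetical labels from three editors over four sampled cells.
const sampleLabels: string[][] = [
  ["covered", "partial", "silent", "covered"],
  ["covered", "partial", "silent", "silent"],
  ["covered", "silent", "silent", "silent"],
];

const disagreement = pairwiseDisagreement(sampleLabels);
console.log(disagreement);
```

The diagonal is always zero, and the matrix is symmetric, so only the upper triangle needs to be published.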
The remaining sections of this page describe the operational mechanisms behind each commitment.
1 · Catalog-derived, not AI-generated
All 85 articles render from typed catalog constants in src/lib/international-governance/instruments.ts, src/lib/capability-evals/benchmarks.ts, and src/lib/wiki/concepts.ts. The templates are deterministic — given the catalog row, the article renders exactly the same content every time. There is no LLM-generated prose anywhere in an article page. (The only LLM call in the whole pipeline is the topic proposer, which only suggests new topics for human review — never article content.)
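As a minimal sketch of what "deterministic" means here, the renderer can be thought of as a pure function of the catalog row: no clock, no randomness, no network. The CatalogRow shape and the renderArticle function below are hypothetical; only the sourceUrl and sourceCitation field names come from this page, and the real types live in the files named above.

```typescript
// Hypothetical row shape for illustration only.
interface CatalogRow {
  slug: string;
  title: string;
  summary: string;
  sourceUrl: string;
  sourceCitation: string;
}

// Pure function: the output depends only on the row passed in.
function renderArticle(row: CatalogRow): string {
  return [
    `# ${row.title}`,
    "",
    row.summary,
    "",
    `Source: ${row.sourceCitation} (${row.sourceUrl})`,
  ].join("\n");
}

// Placeholder values, not real catalog content.
const exampleRow: CatalogRow = {
  slug: "example-instrument",
  title: "Example Instrument",
  summary: "Placeholder summary text.",
  sourceUrl: "https://example.org/placeholder",
  sourceCitation: "Regulation (EU) 2024/1689",
};

const firstRender = renderArticle(exampleRow);
const secondRender = renderArticle({ ...exampleRow });
console.log(firstRender === secondRender);
```

Because the function has no hidden inputs, anyone who downloads the catalog and re-runs the render can compare outputs byte for byte.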
2 · Every claim cites a primary source
Each catalog row carries a sourceUrl (machine-readable canonical URL) and a sourceCitation (human-readable identifier — e.g. Regulation (EU) 2024/1689). Both are surfaced on every article. Coverage-matrix cells (26 instruments × 19 topics) each carry their own per-cell citation. Where a primary source is ambiguous, the article links to the authoritative secondary source instead — labelled as such.
3 · Editorial review (the "Last verified" chip)
Each article header shows a freshness chip: green if reviewed within 90 days, amber within 180, red beyond 180, neutral if review is pending. Editors mark articles as reviewed by updating the catalog row's lastReviewedAt field. The chip is honest about review state: when an article hasn't been reviewed yet, the chip says so, rather than mislabelling the catalog-generation timestamp as human verification.
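The freshness thresholds can be sketched as a small function over lastReviewedAt. This is an illustration under stated assumptions: the function name freshnessChip is hypothetical, the 90 and 180 day boundaries are treated as inclusive, and an unparseable timestamp is treated the same as a pending review.

```typescript
type ChipColor = "green" | "amber" | "red" | "neutral";

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// lastReviewedAt is an ISO timestamp, or null when no human review has happened yet.
function freshnessChip(lastReviewedAt: string | null, now: Date): ChipColor {
  if (lastReviewedAt === null) return "neutral"; // review pending
  const reviewedMs = Date.parse(lastReviewedAt);
  if (Number.isNaN(reviewedMs)) return "neutral"; // assumption: bad data counts as pending
  const ageDays = (now.getTime() - reviewedMs) / MS_PER_DAY;
  if (ageDays <= 90) return "green";
  if (ageDays <= 180) return "amber";
  return "red";
}

const asOfNow = new Date("2026-06-30T00:00:00Z");
console.log(freshnessChip("2026-06-01T00:00:00Z", asOfNow)); // 29 days old
console.log(freshnessChip("2026-02-01T00:00:00Z", asOfNow)); // 149 days old
console.log(freshnessChip("2025-06-30T00:00:00Z", asOfNow)); // 365 days old
console.log(freshnessChip(null, asOfNow));
```

Passing the current time in as a parameter, rather than reading the clock inside the function, keeps the chip logic testable.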
4 · Topic-determination framework
The catalog topics are determined by three reinforcing routes; every topic in the catalog can name which route surfaced it.
- Editorial seed — original topics chosen against the EU AI Act + comparable instruments' primary text. The baseline coverage.
- Audit-driven gap closure — periodic persona audits (research analyst, AI-safety, Global-South, sectoral regulator) surface topics the seed misses. Added with documented rationale.
- Lacuna-driven proposer — an AI scan over recent regulator publications nominates 0–5 candidate new topics per run, gated by the five anti-hallucination checks in §5 below. Every candidate requires editor approval before joining the catalog.
Topic kind taxonomy
Each topic declares a kind so the coverage matrix compares like with like:
- Capability classes — the system / model being regulated (foundation models, biometric ID, deepfakes, agentic systems, catastrophic risk).
- Sectoral applications — the deployment domain (employment, healthcare, criminal justice, education).
- Procedural obligations — cross-cutting duties (transparency, redress, compute reporting, training data, synthetic content provenance, open-weight release).
- Political frames — contested doctrines (sovereign AI, tech sovereignty, development-rights framing).
- Meta-domains — coordination + governance of the governance space itself (international coordination).
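The five kinds above map naturally onto a closed union type, so that a topic cannot declare a kind outside the taxonomy. The sketch below is illustrative: the string values, the Topic shape, and the slugs are assumptions, not the catalog's actual identifiers.

```typescript
// Assumed string values; the catalog may spell these differently.
type TopicKind =
  | "capability-class"
  | "sectoral-application"
  | "procedural-obligation"
  | "political-frame"
  | "meta-domain";

interface Topic {
  slug: string; // hypothetical slugs below
  kind: TopicKind;
}

// Group topics by kind so comparisons stay like-with-like.
function groupByKind(topics: Topic[]): Map<TopicKind, Topic[]> {
  const groups = new Map<TopicKind, Topic[]>();
  for (const topic of topics) {
    const bucket = groups.get(topic.kind);
    if (bucket === undefined) {
      groups.set(topic.kind, [topic]);
    } else {
      bucket.push(topic);
    }
  }
  return groups;
}

const sampleTopics: Topic[] = [
  { slug: "foundation-models", kind: "capability-class" },
  { slug: "deepfakes", kind: "capability-class" },
  { slug: "employment", kind: "sectoral-application" },
  { slug: "transparency", kind: "procedural-obligation" },
  { slug: "sovereign-ai", kind: "political-frame" },
  { slug: "international-coordination", kind: "meta-domain" },
];

const topicsByKind = groupByKind(sampleTopics);
console.log(topicsByKind.get("capability-class")?.map((t) => t.slug));
```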
Composite Topic Salience Score (CTSS)
Within each kind, topics are ranked by the CTSS combining three signal classes:
- Editorial (30%) — governance density + conflict density across instruments
- External discourse (50%) — regulator activity (EU + US + UK + OECD), academic citation velocity (OpenAlex), search demand, inbound citations
- Influence opportunity (20%) — policy lacunae (topics most instruments are silent on)
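Read as a weighted sum, the three percentages above give a score like the one sketched below. This is an assumption about the combination rule: the page states the weights but not the formula, so the linear form, the requirement that each signal is already normalised to the range 0 to 1, and all names here are illustrative.

```typescript
interface SalienceSignals {
  editorial: number; // assumed normalised to [0, 1]
  externalDiscourse: number; // assumed normalised to [0, 1]
  influenceOpportunity: number; // assumed normalised to [0, 1]
}

// Weights from the list above: 30% / 50% / 20%.
const CTSS_WEIGHTS = {
  editorial: 0.3,
  externalDiscourse: 0.5,
  influenceOpportunity: 0.2,
} as const;

function ctss(signals: SalienceSignals): number {
  const values = [signals.editorial, signals.externalDiscourse, signals.influenceOpportunity];
  for (const v of values) {
    if (!(v >= 0 && v <= 1)) throw new RangeError("signals must be in [0, 1]");
  }
  return (
    CTSS_WEIGHTS.editorial * signals.editorial +
    CTSS_WEIGHTS.externalDiscourse * signals.externalDiscourse +
    CTSS_WEIGHTS.influenceOpportunity * signals.influenceOpportunity
  );
}

// Hypothetical signal values for one topic.
const exampleScore = ctss({ editorial: 0.8, externalDiscourse: 0.6, influenceOpportunity: 0.5 });
console.log(exampleScore.toFixed(2)); // 0.3*0.8 + 0.5*0.6 + 0.2*0.5 = 0.64
```

Since the weights sum to 1, a score computed this way stays in the same 0 to 1 range as its inputs, which keeps rankings within a kind comparable across refreshes.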
Re-derivation cadence
The CTSS measures the salience of existing topics, so a topic that isn't in the catalog yet scores zero by construction. To catch genuinely new topics that emerge from the field, the editorial team runs an annual re-derivation pass: re-evaluate the seed against current regulator activity + academic citation patterns + reported topic gaps, and add / merge / split / deprecate / rename topics as warranted. The pass deliverable is a public diff in the catalog plus a short markdown record of the rationale per change. The framework, the process, and the deferred backlog live at docs/topic-redetermination-process.md in the repository.
5 · Anti-hallucination grounding (proposer)
The topic proposer (the only LLM call in the wiki pipeline) runs five reinforcing grounding checks before persisting any candidate:
- Verbatim quote — the model must return a literal substring of a real source entry, verified by exact match.
- Evidence titles — every claimed source title must match a real entry (exact OR Jaro-Winkler ≥ 0.90).
- Description-specifics audit — flags ungrounded years, statute references, percentages, FLOPs.
- Cross-jurisdiction corroboration — evidence must span ≥2 jurisdictions; single-source candidates are rejected.
- Explicit abstention — the prompt explicitly allows returning zero candidates rather than padding to a quota.
Candidates that pass these checks still require an editor to approve them before joining the catalog. The grounding result (warnings, matched jurisdictions, verbatim quote, matched source entries) is persisted with each proposal so the editor can verify provenance independently.
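Three of the five checks (verbatim quote, evidence titles, cross-jurisdiction corroboration) are mechanical enough to sketch. The code below is an illustration, not the proposer's implementation: the type shapes, function names, and sample data are hypothetical, the Jaro-Winkler routine is a textbook version with the usual 0.1 prefix scale, and the description-specifics audit is omitted.

```typescript
function jaro(a: string, b: string): number {
  if (a === b) return 1;
  if (a.length === 0 || b.length === 0) return 0;
  const win = Math.max(0, Math.floor(Math.max(a.length, b.length) / 2) - 1);
  const aHit = new Array<boolean>(a.length).fill(false);
  const bHit = new Array<boolean>(b.length).fill(false);
  let matches = 0;
  for (let i = 0; i < a.length; i++) {
    const hi = Math.min(b.length - 1, i + win);
    for (let j = Math.max(0, i - win); j <= hi; j++) {
      if (!bHit[j] && a[i] === b[j]) {
        aHit[i] = true;
        bHit[j] = true;
        matches++;
        break;
      }
    }
  }
  if (matches === 0) return 0;
  let outOfOrder = 0; // matched characters that appear in a different order
  let k = 0;
  for (let i = 0; i < a.length; i++) {
    if (!aHit[i]) continue;
    while (!bHit[k]) k++;
    if (a[i] !== b[k]) outOfOrder++;
    k++;
  }
  const t = outOfOrder / 2;
  return (matches / a.length + matches / b.length + (matches - t) / matches) / 3;
}

function jaroWinkler(a: string, b: string): number {
  const j = jaro(a, b);
  const maxPrefix = Math.min(4, a.length, b.length);
  let prefix = 0;
  while (prefix < maxPrefix && a[prefix] === b[prefix]) prefix++;
  return j + prefix * 0.1 * (1 - j);
}

interface SourceEntry { title: string; jurisdiction: string; text: string; }
interface Candidate { verbatimQuote: string; evidenceTitles: string[]; }
interface GroundingResult { passed: boolean; warnings: string[]; matchedJurisdictions: string[]; }

function groundCandidate(c: Candidate, sources: SourceEntry[]): GroundingResult {
  const warnings: string[] = [];
  // Check 1: the quote must be a literal substring of some source entry.
  const quoteOk = c.verbatimQuote.length > 0 && sources.some((s) => s.text.includes(c.verbatimQuote));
  if (!quoteOk) warnings.push("verbatim quote not found in any source entry");
  // Check 2: every claimed title must match a real entry, exactly or with JW >= 0.90.
  const jurisdictions = new Set<string>();
  for (const title of c.evidenceTitles) {
    const hit = sources.find((s) => s.title === title || jaroWinkler(s.title, title) >= 0.9);
    if (hit === undefined) warnings.push(`evidence title not matched: ${title}`);
    else jurisdictions.add(hit.jurisdiction);
  }
  // Check 4: matched evidence must span at least two jurisdictions.
  if (jurisdictions.size < 2) warnings.push("evidence spans fewer than 2 jurisdictions");
  return { passed: warnings.length === 0, warnings, matchedJurisdictions: Array.from(jurisdictions) };
}

// Placeholder source entries, not real regulator publications.
const sampleSources: SourceEntry[] = [
  { title: "EU notice", jurisdiction: "EU", text: "Placeholder text: providers must label synthetic content." },
  { title: "UK consultation paper", jurisdiction: "UK", text: "Placeholder text: deployers should disclose synthetic content." },
];

const grounded = groundCandidate(
  { verbatimQuote: "synthetic content", evidenceTitles: ["EU notice", "UK consultation paper."] },
  sampleSources,
);
console.log(grounded.passed, grounded.matchedJurisdictions);
```

Returning the warnings and matched jurisdictions alongside the pass/fail flag mirrors the persisted grounding result described above, so an editor can see why a candidate passed or failed.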
6 · Version history
Catalog changes are captured as ArticleRevision rows — one per content-hash change per article. The full log is public at /wiki/changelog (RSS feed at /wiki/changelog/feed).
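The "one row per content-hash change" rule can be sketched as an append that is skipped when the newest hash for an article is unchanged. Everything below is illustrative: the ArticleRevision fields are assumed, and the FNV-1a-style hash over UTF-16 code units is a dependency-free stand-in for whatever hash the pipeline actually uses.

```typescript
// Assumed fields; the real ArticleRevision schema may differ.
interface ArticleRevision {
  slug: string;
  contentHash: string;
  recordedAt: string;
}

// Stand-in hash (32-bit FNV-1a style), chosen only to keep the sketch self-contained.
function fnv1a(text: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return ("00000000" + (h >>> 0).toString(16)).slice(-8);
}

// Appends a revision only when the content hash differs from the article's latest one.
// Returns true when a row was added.
function recordRevision(log: ArticleRevision[], slug: string, content: string, recordedAt: string): boolean {
  const contentHash = fnv1a(content);
  let latest: ArticleRevision | undefined;
  for (const row of log) {
    if (row.slug === slug) latest = row;
  }
  if (latest !== undefined && latest.contentHash === contentHash) return false;
  log.push({ slug, contentHash, recordedAt });
  return true;
}

const revisionLog: ArticleRevision[] = [];
const addedFirst = recordRevision(revisionLog, "example-article", "version one", "2026-01-01T00:00:00Z");
const addedRepeat = recordRevision(revisionLog, "example-article", "version one", "2026-01-02T00:00:00Z");
const addedSecond = recordRevision(revisionLog, "example-article", "version two", "2026-01-03T00:00:00Z");
console.log(addedFirst, addedRepeat, addedSecond, revisionLog.length);
```

Comparing against the latest row rather than all earlier rows means a revert to older content still produces a new revision, which keeps the log a faithful timeline.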
Status of ?asOf=<ISO> URL pinning: not yet fully implemented. The parameter is accepted and surfaces a banner so researchers know they've requested a pinned view, but the article body below the banner still reflects the current catalog state; historical snapshots are not yet served. To cite this article in a paper today, use the live URL with the current date. The ArticleRevision table is populated on every catalog change so that a future iteration can read from it. We document this honestly here rather than have the banner imply pinning that doesn't exist.
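The current behaviour of the asOf parameter can be summarised as a small state object: the value is validated, a banner is shown, and the snapshot flag stays false. This is a sketch under assumptions: the names are hypothetical, and treating an invalid value as "no banner" is a guess, since the page does not say how malformed values are handled.

```typescript
interface AsOfState {
  requested: string | null; // normalised ISO timestamp, or null if absent or invalid
  showBanner: boolean;
  servesSnapshot: boolean; // always false until historical snapshots are served
}

// Accepts ISO dates such as 2026-01-15 or 2026-01-15T12:00:00Z.
const ISO_PATTERN = /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/;

function interpretAsOf(raw: string | null): AsOfState {
  if (raw === null || !ISO_PATTERN.test(raw) || Number.isNaN(Date.parse(raw))) {
    return { requested: null, showBanner: false, servesSnapshot: false };
  }
  return { requested: new Date(raw).toISOString(), showBanner: true, servesSnapshot: false };
}

console.log(interpretAsOf("2026-01-15"));
console.log(interpretAsOf("not-a-date"));
```

Keeping servesSnapshot as an explicit field makes the gap visible in code: the banner and the pinned body are separate capabilities, and only the first exists today.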
7 · Persistent identifier + Editorial Board
Persistent identifier policy: each article's canonical URL on this domain is committed-stable. Slugs do not change after publication; if a topic is renamed, the old slug remains as a redirect (recorded in the changelog). This is the current citation identifier — researchers cite the wiki URL.
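The redirect rule can be sketched as a lookup that follows old slugs to their current target. The function name, the redirect table, and the slugs below are all hypothetical; the cycle guard is an added safety assumption rather than something this page describes.

```typescript
// Follows renamed slugs to the current canonical slug.
function resolveSlug(slug: string, redirects: Map<string, string>): string {
  const seen = new Set<string>();
  let current = slug;
  let next = redirects.get(current);
  while (next !== undefined) {
    if (seen.has(current)) throw new Error(`redirect cycle at "${current}"`);
    seen.add(current);
    current = next;
    next = redirects.get(current);
  }
  return current;
}

// Hypothetical rename history: a topic renamed twice.
const slugRedirects = new Map<string, string>([
  ["older-topic-name", "old-topic-name"],
  ["old-topic-name", "renamed-topic"],
]);

console.log(resolveSlug("older-topic-name", slugRedirects));
console.log(resolveSlug("renamed-topic", slugRedirects));
```

Because old slugs are never deleted from the table, a citation to any historical URL keeps resolving to the current article.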
DOI policy — roadmap: per-article + per-version DOIs via Zenodo are on the roadmap. The DOI infrastructure isn't live yet — the current commitment is URL stability + public changelog + content-hashed ArticleRevision records. We disclose this honestly, the same way §6 discloses the current ?asOf= implementation status. When the Zenodo integration lands, every existing article + its history get retroactive DOIs; today's URL-based citations remain valid forever.
Editorial Board + named authority: the catalog is reviewed by named editors. The board's structure + conflicts of interest + recruitment process live at /wiki/editorial-board. Articles render from typed catalog constants (no single author wrote them), but the catalog itself is editorially curated and that editorial standard is what gets cited — mirroring the Stanford Encyclopedia of Philosophy precedent.
8 · Corrections workflow
Every article carries a "Report a problem with this page" link in its footer. Clicking opens a pre-filled GitHub issue (existing issues can be read without an account; submitting one requires a GitHub account) with the page reference and a structured prompt asking for the specific issue type (stale date, missing source, broken link, wrong jurisdiction, analytic claim that doesn't match the catalog).
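A pre-filled issue link of this kind can be built with the title and body query parameters that GitHub's new-issue page accepts. The sketch below is illustrative: the repository name is a placeholder, and the function name and body wording are assumptions rather than the wiki's actual template.

```typescript
// Issue types listed in the paragraph above.
const ISSUE_TYPES = [
  "stale date",
  "missing source",
  "broken link",
  "wrong jurisdiction",
  "analytic claim that doesn't match the catalog",
] as const;

// repo is "owner/name"; pagePath is the wiki path being reported.
function buildIssueUrl(repo: string, pagePath: string): string {
  const title = `Problem with ${pagePath}`;
  const body = [
    `Page: ${pagePath}`,
    "",
    "Issue type (keep one):",
    ...ISSUE_TYPES.map((t) => `- ${t}`),
    "",
    "Details:",
  ].join("\n");
  return (
    `https://github.com/${repo}/issues/new` +
    `?title=${encodeURIComponent(title)}&body=${encodeURIComponent(body)}`
  );
}

// Placeholder repository and page path.
const issueUrl = buildIssueUrl("example-org/example-repo", "/wiki/example-article");
console.log(issueUrl);
```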
Each reported correction is reviewed by an editor; resolved corrections become commits to the catalog (and thus appear in the changelog). The catalog is the single source of truth — once a correction lands, every downstream surface (article page, OG image, CTSS score) updates on the next render.
9 · Update cadence
The signal-based components of the framework refresh on a schedule:
- External discourse signals — daily refresh via the admin endpoint (cron in production); academic velocity weekly.
- Topic proposer — monthly or on demand by editors.
- Article content — updated when primary sources change (a new regulation passes, an instrument is amended). The changelog shows when.
- Editorial review — articles re-verified on a rolling 90-day target. The freshness chip surfaces the current state on each article.