Literature & evidence base
The catalog records what regulators say; this is the evidence base beside it — academic and grey-literature sources linked to the contested topics they bear on. Each entry is catalogued metadata with a link to the source and a one-line editor finding; there are no LLM-written summaries (charter §7).
Peer-reviewed (1)
- Policy Instrument
Lascoumes, P. & Le Galès, P. (2007). Introduction: Understanding Public Policy through Its Instruments — From the Nature of Instruments to the Sociology of Public Policy Instrumentation. Governance 20(1): 1-21. See also Hood (1983) The Tools of Government, ch. 1-2; Salamon (2002) The Tools of Government: A Guide to the New Governance, pp. 1-47; Howlett (2011) Designing Public Policies, ch. 3-5.
Preprint (20)
- Model Card
Mitchell, M., et al. (2019), 'Model Cards for Model Reporting,' FAT* '19 (the conference now known as FAccT).
- Deceptive Alignment
Hubinger, E., et al. (2019), 'Risks from Learned Optimization in Advanced Machine Learning Systems.'
- Mesa-Optimization
Hubinger, E., et al. (2019), 'Risks from Learned Optimization in Advanced Machine Learning Systems.'
- Scalable Oversight
Christiano, P., Shlegeris, B., Amodei, D. (2018), 'Supervising Strong Learners by Amplifying Weak Experts.'
- Capability Elicitation
Qi, X., Zeng, Y., Xie, T., Chen, P.-Y., Jia, R., Mittal, P., Henderson, P. (2023), 'Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!'
- Dual-Use Research Norms (DURC for AI)
Solaiman, I., et al. (2019), 'Release Strategies and the Social Impacts of Language Models'; the canonical articulation of staged-release norms for large language models, drawn from the GPT-2 release.
- Training-Data Attribution
Grosse, R., et al. (2023), 'Studying Large Language Model Generalization with Influence Functions' (Anthropic) — the canonical articulation of scalable influence-function-based attribution for foundation models.
- Prompt Injection
Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M. (2023), 'Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.'
- Agentic AI System
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., Cao, Y. (2022), 'ReAct: Synergizing Reasoning and Acting in Language Models.'
- Tool-Use Safety
Wallace, E., et al. (2024), 'The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions' (OpenAI) — the canonical industry articulation of instruction-channel hierarchy as a tool-use-safety defence.
- Multi-Turn Evaluation
Zheng, L., et al. (2023), 'Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena'; introduces MT-Bench, a set of multi-turn questions graded by an LLM judge, and measures how closely LLM judges agree with human preferences.
- Data Poisoning
Carlini, N., et al. (2024), 'Poisoning Web-Scale Training Datasets is Practical'; shows that poisoning web-scale training datasets (e.g., LAION-400M, Wikipedia snapshots) is cheap and practical.
- Model Distillation Risk
Hinton, G., Vinyals, O., Dean, J. (2015), 'Distilling the Knowledge in a Neural Network' — the foundational distillation paper; the governance-relevant adaptation runs through Alpaca/Vicuna (2023) and DeepSeek-R1 (2025).
- Jailbreak Resistance
Zou, A., Wang, Z., Kolter, J. Z., Fredrikson, M. (2023), 'Universal and Transferable Adversarial Attacks on Aligned Language Models' — the canonical demonstration that gradient-based suffix attacks transfer across aligned LLMs.
- Model-Merging Risk
Bhardwaj, R., et al. (2024), 'Language Models are Homer Simpson! Safety Re-Alignment of Fine-tuned Language Models through Task Arithmetic'; shows that fine-tuning erodes safety alignment and proposes RESTA, which re-aligns the model by adding a safety vector via task arithmetic, the same weight-space operation used in model merging.
- Inference-Time Compute
Snell, C., Lee, J., Xu, K., Kumar, A. (2024), 'Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters' — establishes inference-time-compute scaling as a first-class capability lever.
- Sandbagging
van der Weij, T., Hofstätter, F., Jaffe, O., Brown, S., Ward, F. (2024), 'AI Sandbagging: Language Models can Strategically Underperform on Evaluations.'
- Hallucination
Ji, Z., et al. (2023), 'Survey of Hallucination in Natural Language Generation,' ACM Computing Surveys 55(12): 1-38.
- In-Context Learning
Brown, T., et al. (2020), 'Language Models are Few-Shot Learners' (GPT-3 paper) — the canonical articulation of in-context learning as an emergent capability.
- Retrieval-Augmented Generation (RAG)
Lewis, P., et al. (2020), 'Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks,' NeurIPS — the canonical articulation of RAG.
Working paper (2)
- National Bureau of Economic Research | NBER
✦ AI · US National Bureau of Economic Research.
- Featured Working Papers Archive | NBER
✦ AI · NBER featured economics working papers (incl. AI & labor).
Research institute (6)
- AI Index | Stanford HAI
✦ AI · Stanford HAI's annual data report on the state of AI.
- Regulation, Policy, Governance | Stanford HAI
✦ AI · Stanford HAI's regulation & governance research hub.
- Papers & Reports | Epoch AI
✦ AI · Epoch AI research on compute, scaling trends & frontier models.
- Artificial Intelligence
✦ AI · US National Academies' AI consensus-study hub.
- Capturing the Potential of Generative AI’s Use in Health and Medicine Requires Collaboration and Oversight, Consideration of Risks, Says NAM Special Publication
✦ AI · NAM special publication on generative AI in health & medicine.
- One Hundred Year Study on Artificial Intelligence (AI100)
✦ AI · Stanford's standing century-long study of AI's societal impact.
Think tank (1)
- Anthropomorphic AI terms create gaps in accountability | Brookings
✦ AI · Commentary on how anthropomorphic AI language obscures accountability.
Civil society (2)
- Measuring up | Ada Lovelace Institute
✦ AI · Ada Lovelace Institute policy briefing.
- Publications - AlgorithmWatch
✦ AI · Reports on automated decision-making and its societal impact.
Standards body (3)
- AI Risk Management Framework | NIST
✦ AI · US voluntary AI risk-management framework (Govern/Map/Measure/Manage).
- ISO/IEC JTC 1/SC 42 - Artificial intelligence
✦ AI · Joint ISO/IEC subcommittee developing AI standards.
- ISO - Security, safety and risk
✦ AI · ISO security, safety & risk standards portal.
Incident database (1)
- OECD AI Incidents Monitor, an evidence base for trustworthy AI - OECD.AI
✦ AI · OECD tracker of real-world AI incidents and hazards.
Evidence types mirror the catalog's ingestion taxonomy (peer-reviewed, preprint, working paper, think-tank, civil-society, standards, official grey). Inclusion is not endorsement — listing a source records that it bears on a topic, not that the catalog agrees with it; contested evidence is expected and is what the topic articles weigh.