Hallucination

hallucination · Frontier safety

Concept

Confidently asserted but factually incorrect output produced by an AI model (including fabricated citations, invented people or events, and confabulated numerical values) that the model cannot reliably distinguish from correct output at generation time.

Definition and scope

Hallucination, in the foundation-model-output sense, was systematised by Ji et al. (2023, 'Survey of Hallucination in Natural Language Generation') and has become the canonical term for LLM factual error. The phenomenon decomposes into intrinsic hallucination (output contradicts the available context) and extrinsic hallucination (output asserts facts that are not grounded in the context). The NIST AI RMF GenAI Profile (NIST AI 600-1) names 'Confabulation' as a primary risk category, capturing the same phenomenon under a different label; NIST's choice signals a preference against anthropomorphic framing.

Governance relevance touches four surfaces.

(a) Liability: when an AI-mediated legal brief contains hallucinated citations (Mata v. Avianca, 2023, S.D.N.Y.), who bears responsibility: the lawyer, the AI provider, or the AI deployer? The EU AI Act's Art. 13 transparency requirements and Art. 86 right to explanation are the closest binding frame.

(b) Disclosure: should providers disclose hallucination rates as part of model-card disclosures (EU AIA Art. 53)? Industry practice is partial.

(c) Redress: when hallucinated output causes harm (defamation via fabricated facts, financial loss via wrong numbers), redress mechanisms are unclear. EU AIA Art. 85 and OECD Principle 1.5 (accountability) frame the obligation; operationalisation is inconsistent.

(d) Sectoral safety: hallucination in healthcare (medical misinformation), criminal justice (false-positive risk scores), and education (factual errors presented as authoritative output) drives most sectoral guidance. NIST AI 600-1 explicitly treats confabulation as a primary risk; UK AISI evaluations include factuality probes; Brazil PL 2338/2023 includes accuracy obligations.

Methodologically, Xu et al. (2024, 'Hallucination is Inevitable') argue that hallucination cannot be eliminated by current architectures. Mitigation therefore relies on retrieval-augmented generation, confidence calibration, and post-hoc verification rather than on architectural fixes.
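As a rough illustration of post-hoc verification, the sketch below checks case citations in a drafted text against a trusted index and reports the share that could not be verified. It is a minimal sketch under stated assumptions, not a method drawn from any instrument cited here: the names KNOWN_CITATIONS, CASE_PATTERN, VerificationReport and verify_citations are hypothetical, the regular expression only recognises simple "Party v. Party" names, and "Smith v. Jonesco" is an invented citation used purely as a test input.

```python
import re
from dataclasses import dataclass

# Hypothetical trusted index. A real deployment would query an authoritative
# citation database rather than a hard-coded set.
KNOWN_CITATIONS = {"Mata v. Avianca"}

# Deliberately simple pattern for "Party v. Party" case names (an assumption;
# real citation formats are far more varied).
CASE_PATTERN = re.compile(r"\b([A-Z][a-z]+ v\. [A-Z][a-z]+)\b")


@dataclass
class VerificationReport:
    verified: list[str]     # citations found in the trusted index
    unverified: list[str]   # citations absent from the index (possible hallucinations)

    @property
    def unverified_rate(self) -> float:
        """Share of distinct citations that could not be verified."""
        total = len(self.verified) + len(self.unverified)
        return len(self.unverified) / total if total else 0.0


def verify_citations(text: str, known: set[str]) -> VerificationReport:
    """Extract distinct case names from text and split them by index membership."""
    cited = sorted(set(CASE_PATTERN.findall(text)))
    verified = [c for c in cited if c in known]
    unverified = [c for c in cited if c not in known]
    return VerificationReport(verified, unverified)


if __name__ == "__main__":
    # "Smith v. Jonesco" is an invented citation used only for illustration.
    draft = "As held in Mata v. Avianca and in Smith v. Jonesco, the claim fails."
    report = verify_citations(draft, KNOWN_CITATIONS)
    print("Unverified citations:", report.unverified)
    print("Unverified rate:", report.unverified_rate)
```

An unverified citation is not proof of hallucination, since the index may simply be incomplete; a check of this kind flags items for human review. The unverified rate is also only a proxy for the hallucination rates discussed under (b), because it covers one narrow class of extrinsic error.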

Used by these instruments

EU AI Act · EU
NIST AI RMF Generative AI Profile · US
Brazil AI Bill (PL 2338/2023) · BR
OECD AI Principles (Recommendation) · OECD

Related concepts

Retrieval-Augmented Generation (RAG)— An AI system pattern in which a model's outputs are conditioned on external content retrieved at inf
Model Card— A standardized disclosure document accompanying an AI model that describes its intended use, trainin
Training-Data Attribution— Technical methods that identify which training examples most influenced a specific AI model output,
Scalable Oversight— The set of techniques for supervising AI systems whose outputs are too complex, too numerous, or too

Editorial note

NIST AI 600-1 prefers 'confabulation' over 'hallucination' to avoid anthropomorphic framing. The two terms are interchangeable in current technical literature, but the policy-vocabulary choice signals editorial discipline. Wiki articles should default to 'hallucination' as the more widely used term, but cite the NIST framing when paralleling AI 600-1.

References

Ji, Z., et al. (2023), 'Survey of Hallucination in Natural Language Generation,' ACM Computing Surveys 55(12): 1-38.

