Print-friendly view · use your browser's Save as PDF option (Cmd/Ctrl-P) to attach this article to a brief.
AI Alignment
alignment · safety · concept
Source: https://policywindow.org/wiki/alignment
Generated 2026-05-30T22:07:34 UTC
Summary
The technical problem of designing AI systems whose objectives, behaviour, and emergent goals reliably track the values or instructions of their principals across deployment contexts.
At a glance
- Used by
- 7 instrument(s)
- Related concepts
- deceptive-alignment, mesa-optimization, scalable-oversight, capability-elicitation, red-team-evaluation
- Primary source
- Yudkowsky, E. (2008), 'Artificial Intelligence as a Positive and Negative Factor in Global Risk' — the field-foundational articulation of the alignment problem.
- Source URL
- https://intelligence.org/files/AIPosNegFactor.pdf
Details
Alignment, in the technical sense, is distinct from regulatory 'compliance' or 'safety.' It asks: even if a model is capable and even if it is supervised, does it pursue what its principal actually wants — or does it pursue a proxy objective that diverges in edge cases? The problem decomposes into outer alignment (specifying what we want the model to do — see Krakovna et al.'s 'specification gaming' literature) and inner alignment (whether the model trained on that specification actually internalised it — see Hubinger et al. 2019 on mesa-optimisation). Governance instruments rarely use the word 'alignment' directly. EU AIA Art. 51-55 obligations approximate alignment concerns by mandating systemic-risk assessment + adversarial testing + cybersecurity protection, but do not require demonstrated alignment of model objectives. US EO 14110 §4.2(a) mandated reporting on alignment-relevant capabilities (red-team results) without defining 'alignment.' Anthropic, OpenAI, and DeepMind publish their own alignment research agendas; these are de facto cited in policy debates but absent from binding text. The field treats alignment as a research problem first and a governance object only secondarily.
How to cite this article
APA
Policy Window. (n.d.). AI Alignment [Wiki article — Concept]. https://policywindow.org/wiki/alignment
Chicago
Policy Window. n.d.. "AI Alignment." Wiki article (Concept). https://policywindow.org/wiki/alignment.
Harvard
Policy Window (n.d.) 'AI Alignment', Wiki article — Concept, available at: https://policywindow.org/wiki/alignment.
OSCOLA
Policy Window, 'AI Alignment' (Wiki article — Concept, n.d.) <https://policywindow.org/wiki/alignment> accessed [date].
BibTeX
@misc{policywindow-alignment,
title = {AI Alignment},
author = {Policy Window},
year = {n.d.},
howpublished = {alignment — safety},
url = {https://policywindow.org/wiki/alignment},
note = {Primary source: https://intelligence.org/files/AIPosNegFactor.pdf}
}