Print-friendly view · use your browser's Save as PDF option (Cmd/Ctrl-P) to attach this article to a brief.
Prompt Injection
prompt-injection · safety · concept
Source: https://policywindow.org/wiki/prompt-injection
Generated 2026-05-30T22:11:16 UTC
Summary
An adversarial input technique in which untrusted content fed to an AI model (e.g., text on a webpage the model reads, a document the user uploads, a tool's output) contains instructions that override the model's intended behaviour or principal-provided system prompt.
At a glance
- Used by
- 2 instrument(s)
- Related concepts
- agentic-system, tool-use-safety, jailbreak-resistance, data-poisoning, retrieval-augmented-generation
- Primary source
- Greshake, K., Abdelnabi, S., Mishra, S., Endres, C., Holz, T., Fritz, M. (2023), 'Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection.'
- Source URL
- https://arxiv.org/abs/2302.12173
Details
Prompt injection was named by Willison (2022, 'Prompt injection attacks against GPT-3') and formalised by Greshake et al. (2023, 'Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection'). The attack class splits into two sub-cases: (a) direct prompt injection — the user (or attacker posing as user) submits adversarial text in the prompt; mitigated partly by training-time alignment + system-prompt design; (b) indirect prompt injection — the model ingests untrusted content (a webpage during browsing, a PDF the user uploads, the output of a tool call) which contains adversarial instructions; the model cannot reliably distinguish 'data' from 'instructions' because both share the same token-stream interface. Indirect injection is the more serious failure mode at deployment because the attacker doesn't need access to the user's session. NIST AI RMF GenAI Profile (NIST AI 600-1) names prompt injection in the 'Information Security' risk category. EU AI Act Art. 15 ('cybersecurity' requirement for high-risk and Art. 55 for GPAI with systemic risk) is the closest binding obligation — providers must protect against 'attempts by unauthorised third parties to alter the use, behaviour or performance of the system.' Industry mitigations (constitutional classifiers, dual-LLM gateway patterns, content-isolation tags) are evolving rapidly but no architectural defence is yet known to be robust. The OWASP LLM Top 10 (2023, 2025 update) lists prompt injection as LLM01 — the most-cited application-security risk for LLM-integrated software.
How to cite this article
APA
Policy Window. (n.d.). Prompt Injection [Wiki article — Concept]. https://policywindow.org/wiki/prompt-injection
Chicago
Policy Window. n.d.. "Prompt Injection." Wiki article (Concept). https://policywindow.org/wiki/prompt-injection.
Harvard
Policy Window (n.d.) 'Prompt Injection', Wiki article — Concept, available at: https://policywindow.org/wiki/prompt-injection.
OSCOLA
Policy Window, 'Prompt Injection' (Wiki article — Concept, n.d.) <https://policywindow.org/wiki/prompt-injection> accessed [date].
BibTeX
@misc{policywindow-prompt-injection,
title = {Prompt Injection},
author = {Policy Window},
year = {n.d.},
howpublished = {prompt-injection — safety},
url = {https://policywindow.org/wiki/prompt-injection},
note = {Primary source: https://arxiv.org/abs/2302.12173}
}