The capacity of a foundation model to adapt its behaviour to a new task purely from examples provided in the prompt, without any updates to the model's weights — discovered as an emergent property of large language models and now a primary evaluation surface.
Definition and scope
Brown et al. (2020, 'Language Models are Few-Shot Learners,' the GPT-3 paper) gave the name in-context learning (ICL) to the surprising observation that sufficiently large language models can perform new tasks from a few demonstrations in the prompt. The phenomenon is empirically robust at scales above ~1B parameters. Theoretical accounts (Xie et al. 2022, 'An Explanation of In-context Learning as Implicit Bayesian Inference'; Garg et al. 2022; von Oswald et al. 2023, 'Transformers Learn In-Context by Gradient Descent') propose different mechanisms, but none has become the consensus explanation.
The governance relevance of ICL is methodological:
(a) Capability evaluations that test only baseline prompting (no demonstrations in the prompt) understate real-world capability, because deployment prompts routinely include task examples (Wei et al. 2022 chain-of-thought; Anil et al. 2024 many-shot). To be capability-accurate, adversarial testing under EU AI Act Art. 55(1)(a) must include ICL-mode probing.
(b) Safety evaluations that test only baseline refusals understate the real-world failure surface, because many-shot jailbreaking exploits ICL to recover prohibited capabilities (Anil et al. 2024).
(c) Model-card disclosures should specify which capabilities are baseline and which are ICL-elicited (EU AI Act Art. 53 transparency obligation).
(d) ICL also bears on the open-vs-closed debate: a closed model accessed via API still exposes an ICL-elicitation surface, which weakens the capability-containment assumption.
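Points (a) and (b) amount to reporting results at more than one shot count rather than at the baseline alone. The following is a minimal sketch of that idea, not a method taken from any of the cited papers or from the EU AI Act. The names `build_prompt`, `accuracy_at_k`, and `toy_model`, and the `Input:`/`Output:` prompt template, are all illustrative assumptions. `toy_model` is a hypothetical stand-in for a real model call: it solves the string-reversal task only when a demonstration in the prompt shows the pattern.

```python
from typing import Callable, Sequence

Example = tuple[str, str]  # (input, expected output)


def build_prompt(demos: Sequence[Example], query: str) -> str:
    """Format the demonstrations, then the query. With no demos this is the baseline prompt."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


def accuracy_at_k(
    model: Callable[[str], str],
    demos: Sequence[Example],
    test_set: Sequence[Example],
    k: int,
) -> float:
    """Fraction of test items answered correctly when the first k demos are in the prompt."""
    correct = 0
    for query, expected in test_set:
        prompt = build_prompt(demos[:k], query)
        if model(prompt).strip() == expected:
            correct += 1
    return correct / len(test_set)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a model: reverses the query only if some
    demonstration in the prompt shows a reversed output, else echoes the query."""
    blocks = prompt.split("\n\n")
    query = blocks[-1].split("\n")[0].removeprefix("Input: ")
    for block in blocks[:-1]:
        x_line, y_line = block.split("\n")
        x = x_line.removeprefix("Input: ")
        y = y_line.removeprefix("Output: ")
        if y == x[::-1]:
            return query[::-1]
    return query


# Illustrative data only; a real evaluation would use a held-out benchmark.
demos = [("abc", "cba"), ("stone", "enots")]
test_set = [("hello", "olleh"), ("model", "ledom")]

# Report the same test set at several shot counts, baseline (k = 0) included.
report = {k: accuracy_at_k(toy_model, demos, test_set, k) for k in (0, 1, 2)}
print(report)  # {0: 0.0, 1: 1.0, 2: 1.0}
```

With this stub, a report limited to k = 0 would record the task as failed, while the k = 1 and k = 2 columns show it is fully elicitable from the prompt. A model card that separates baseline from ICL-elicited capability, as in (c), would need to surface that kind of gap.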
Related concepts
- Capability Elicitation: Techniques designed to reveal the upper bounds of an AI model's capabilities, rather than measuring
- Multi-Turn Evaluation: An evaluation methodology that probes AI models across multi-step conversations rather than single p
- Jailbreak Resistance: The robustness of an AI model's safety training against adversarial prompts crafted to elicit policy
- Agentic AI System: An AI system that takes actions in the world, such as calling tools, executing code, browsing the web, send
- Inference-Time Compute: The scaling regime in which model capability is increased by spending more compute at inference time
Editorial note
Distinguish ICL (adaptation from examples in the prompt) from fine-tuning (adaptation by weight updates) and from retrieval-augmented generation (adaptation from retrieved context). All three change deployed capability, but only fine-tuning modifies the model's weights; ICL and retrieval-augmented generation leave the weights untouched and act at inference time. The three operate at different latencies and present different governance surfaces.