Technical methods that identify which training examples most influenced a specific AI model output, enabling provenance claims about generated content and supporting downstream copyright, consent, and accountability disputes.
Definition and scope
Training-data attribution (TDA) is the inverse of training: given an output, recover the training examples that caused it. The technical lineage runs from influence functions (Koh & Liang 2017, 'Understanding Black-box Predictions via Influence Functions,' ICML) through gradient-tracing methods (Pruthi et al. 2020, TracIn) to recent scalable approximations for foundation models (Grosse et al. 2023, Anthropic, 'Studying Large Language Model Generalization with Influence Functions'; Park et al. 2023, TRAK). An adjacent method, training-data extraction (Carlini et al. 2021, 'Extracting Training Data from Large Language Models'), surfaces verbatim memorisation rather than influence.

Governance relevance is now legally acute. The NYT v. OpenAI complaint (Dec 2023) used extraction-style prompting to show GPT-4 outputs reproducing NYT articles largely verbatim; ongoing US copyright suits (Authors Guild v. OpenAI, Getty v. Stability AI, Tremblay v. OpenAI) turn partly on whether attribution methods can demonstrate substantial similarity at training-corpus scale. EU AI Act Art. 53(1)(d) requires GPAI providers to publish a 'sufficiently detailed summary' of the content used for training, a disclosure obligation that is the regulatory analogue of attribution. Art. 7 of China's GenAI Measures requires training data to come from lawful sources. Brazil's PL 2338/2023 includes an explicit author-compensation provision. India's DPDPA does not yet address training-data rights directly, but MeitY's 2024 AI advisories signal forthcoming guidance.

Methodologically, TDA at frontier-model scale remains contested. Influence-function approximations rest on restrictive assumptions (a convex, twice-differentiable loss whose Hessian is positive definite at a converged optimum) that do not hold for non-convex, over-parameterised LLMs. Verbatim-extraction methods, in turn, undercount the (likely larger) population of paraphrased or compositionally derived outputs.
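To make the two gradient-based families concrete, here is a minimal pure-Python sketch on a toy two-parameter linear regression. It assumes a convex squared loss, which is the setting where Koh & Liang's influence function is well defined; the dataset, learning rate, epoch count, damping term and all helper names are hypothetical, not any library's API. The TracIn score is computed in its checkpoint form (learning-rate-weighted gradient dot products summed over saved weights, up to a constant factor since training here is full-batch), and the influence score is computed at the final weights with an explicitly inverted Hessian, which is only feasible because the model has two parameters.

```python
# Toy training-data attribution on a two-parameter linear model.
# Hypothetical illustration: dataset, learning rate, epochs and damping are invented.


def predict(w, x):
    return w[0] * x[0] + w[1] * x[1]


def grad(w, example):
    """Gradient of the per-example squared loss 0.5 * (w.x - y)^2 w.r.t. w."""
    x, y = example
    r = predict(w, x) - y
    return [r * x[0], r * x[1]]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def mean_loss(w, data):
    return sum(0.5 * (predict(w, x) - y) ** 2 for x, y in data) / len(data)


def train(data, lr, epochs):
    """Full-batch gradient descent from w = 0, saving the weights before every step."""
    w, checkpoints = [0.0, 0.0], []
    for _ in range(epochs):
        checkpoints.append(list(w))
        g = [0.0, 0.0]
        for ex in data:
            gi = grad(w, ex)
            g = [g[0] + gi[0] / len(data), g[1] + gi[1] / len(data)]
        w = [w[0] - lr * g[0], w[1] - lr * g[1]]
    return w, checkpoints


def tracin_scores(data, test, checkpoints, lr):
    """TracIn-style score: sum over checkpoints of lr * grad(z) . grad(test).
    Positive = proponent (its gradient steps lowered test loss); negative = opponent."""
    return [sum(lr * dot(grad(w, z), grad(w, test)) for w in checkpoints) for z in data]


def influence_scores(data, test, w, damping=1e-3):
    """Koh & Liang-style up-weighting influence on test loss: -g_test^T H^-1 g_z.
    Negative = up-weighting z lowers test loss (helpful); positive = harmful."""
    n = len(data)
    # Hessian of the mean squared loss is (1/n) * sum x x^T; damping keeps it invertible.
    h = [[damping, 0.0], [0.0, damping]]
    for x, _ in data:
        for i in range(2):
            for j in range(2):
                h[i][j] += x[i] * x[j] / n
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    h_inv = [[h[1][1] / det, -h[0][1] / det], [-h[1][0] / det, h[0][0] / det]]
    g_test = grad(w, test)
    scores = []
    for z in data:
        g = grad(w, z)
        scores.append(-dot(g_test, [dot(h_inv[0], g), dot(h_inv[1], g)]))
    return scores


# Hypothetical data: x = (bias, feature). The last point is deliberately mislabelled.
DATA = [((1.0, 0.0), 0.1), ((1.0, 1.0), 1.1), ((1.0, 2.0), 1.9),
        ((1.0, 3.0), 3.2), ((1.0, 4.0), 0.0)]
TEST = ((1.0, 4.0), 4.0)
LR = 0.1

w_final, ckpts = train(DATA, lr=LR, epochs=500)
tracin = tracin_scores(DATA, TEST, ckpts, LR)
influence = influence_scores(DATA, TEST, w_final)

if __name__ == "__main__":
    for i, (t, inf) in enumerate(zip(tracin, influence)):
        print(f"example {i}: TracIn {t:9.2f}   influence {inf:7.2f}")
```

On this toy data both scores single out the mislabelled last example as the one pulling the test prediction away from its label, with opposite sign conventions (most negative TracIn score, most positive influence score). At foundation-model scale neither an explicit Hessian inverse nor per-checkpoint gradients for every training example is tractable, which is why TRAK relies on random projections of gradients and Grosse et al. on EK-FAC curvature approximations.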
Related concepts
- AI Supply Chain: The end-to-end pipeline of inputs, intermediate artefacts, and downstream applications by which an…
- Model Card: A standardized disclosure document accompanying an AI model that describes its intended use, training…
- Data Poisoning: A training-time attack in which an adversary inserts crafted examples into the training corpus or…
Editorial note
Distinguish TDA (which training examples *caused* this output, by influence) from training-data extraction (which examples are verbatim recoverable from the model). Both are policy-relevant but for different claims: influence supports causal-contribution arguments, extraction supports memorisation arguments.
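The memorisation side of this distinction reduces, at its simplest, to a verbatim-overlap test: does an output share a long contiguous token run with some training document? The sketch below is a hypothetical stand-in, not the Carlini et al. pipeline (which also involves generating and ranking candidate samples); whitespace tokenisation, the 8-token threshold, the toy corpus and the function names are all assumptions made for illustration.

```python
# Hypothetical memorisation check: does a model output share a long verbatim
# token run with any training document? Whitespace tokens and k = 8 are assumptions.


def ngrams(tokens, k):
    """Set of all contiguous k-token spans in a token list."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def verbatim_matches(output, corpus, k=8):
    """Ids of training documents sharing at least one k-token span with the output."""
    out_grams = ngrams(output.split(), k)
    return [doc_id for doc_id, text in corpus.items()
            if out_grams & ngrams(text.split(), k)]


def longest_shared_run(a_tokens, b_tokens):
    """Length of the longest common contiguous token run (dynamic programming)."""
    best = 0
    prev = [0] * (len(b_tokens) + 1)
    for i in range(1, len(a_tokens) + 1):
        cur = [0] * (len(b_tokens) + 1)
        for j in range(1, len(b_tokens) + 1):
            if a_tokens[i - 1] == b_tokens[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


# Toy corpus and outputs, invented for illustration.
CORPUS = {
    "doc_a": "the quick brown fox jumps over the lazy dog near the river bank",
    "doc_b": "a completely different sentence about training data attribution methods",
}
OUTPUT = "witnesses said the quick brown fox jumps over the lazy dog near the old mill"
PARAPHRASE = "a fast brown fox leaps over a sleepy dog"

if __name__ == "__main__":
    print(verbatim_matches(OUTPUT, CORPUS))
    print(longest_shared_run(OUTPUT.split(), CORPUS["doc_a"].split()))
    print(verbatim_matches(PARAPHRASE, CORPUS))
```

The verbatim output matches `doc_a` through an 11-token run, while the paraphrase matches nothing even though it plainly derives from the same sentence: this is the undercounting problem that limits extraction-based evidence. A match also says nothing about how strongly that document influenced the output, which is the question influence-based TDA addresses.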