Training-Data Attribution

Policy Window Editorial Board

Last verified 2026-06-21

Technical methods that identify which training examples most influenced a specific AI model output, enabling provenance claims about generated content and supporting copyright / consent / accountability disputes downstream.

Definition & scope

Field consensus on this concept: emerging

Training-data attribution (TDA) is the inverse of training: given an output, recover the training examples that caused it. The technical lineage runs from influence functions (Koh & Liang 2017, 'Understanding Black-box Predictions via Influence Functions,' ICML) through gradient-based methods (Pruthi et al. 2020, TracIn) to recent scalable approximations for foundation models (Grosse et al. 2023, Anthropic, 'Studying Large Language Model Generalization with Influence Functions'; Park et al. 2023, TRAK). Adjacent methods include training-data extraction (Carlini et al. 2021, 'Extracting Training Data from Large Language Models'), which surfaces verbatim memorisation rather than influence.

Governance relevance is now legally acute. The NYT v. OpenAI complaint (Dec 2023) used training-data extraction to show verbatim NYT articles in GPT-4 outputs; ongoing US copyright suits (Authors Guild v. OpenAI, Getty v. Stability AI, Tremblay v. OpenAI) turn partly on whether attribution methods can demonstrate substantial similarity at training-corpus scale. EU AI Act Art. 53(1)(d) requires GPAI providers to publish a 'sufficiently detailed summary' of training-data content, a disclosure obligation that is the regulatory analogue of attribution. China's GenAI Measures Art. 7 requires legal sourcing of training data. Brazil's PL 2338/2023 includes an explicit author-compensation provision. India's DPDPA does not yet address training-data rights directly, but the 2024 MeitY advisories signal forthcoming guidance.

Methodologically, TDA at frontier-model scale remains contested: influence-function approximations require restrictive assumptions (locally-linear loss surface) that don't hold for over-parameterised LLMs, and verbatim-extraction methods undercount the (likely larger) population of paraphrased or compositionally-derived outputs.

Precise definition and the influence-versus-extraction distinction

Training-data attribution (TDA) denotes the methods that, given a model output, recover the training examples that most causally shaped it — formally the inverse of training. The central conceptual line is between influence and extraction. Influence-based attribution ¹ estimates how much each datum changed the model's parameters and hence a given prediction, supporting causal-contribution claims. Training-data extraction (Carlini et al. 2021) instead surfaces examples that are verbatim recoverable, supporting memorisation claims. The distinction is governance-load-bearing because foundation models are demonstrably trained on copyrighted material and fair use is not guaranteed (Henderson et al. 2023, https://jmlr.org/papers/v24/23-0569.html): influence underwrites 'this work contributed', extraction underwrites 'this work is reproduced'. Conflating them mis-states what a technical finding can license, since the two answer different legal questions about contribution versus copying — questions copyright regimes already pose divergently across jurisdictions ².
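
The extraction side of this distinction can be illustrated with a toy overlap check. The sketch below is not the procedure of Carlini et al. 2021; it only measures the longest run of consecutive words shared by a model output and one candidate training document. The function name `longest_shared_run` and the sample strings are hypothetical, chosen for illustration.

```python
def longest_shared_run(output: str, document: str) -> int:
    """Length (in words) of the longest run of consecutive words shared by both texts."""
    a = output.lower().split()
    b = document.lower().split()
    best = 0
    # prev[j] holds the length of the shared run ending at a[i-2], b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical strings, for illustration only.
training_doc = "the quick brown fox jumps over the lazy dog near the river bank"
verbatim_output = "as the story goes the quick brown fox jumps over the lazy dog"
paraphrased_output = "a fast auburn fox leaps across a sleepy hound"

verbatim_run = longest_shared_run(verbatim_output, training_doc)
paraphrase_run = longest_shared_run(paraphrased_output, training_doc)
print(verbatim_run, paraphrase_run)  # prints: 9 1
```

The verbatim output shares a nine-word run with the document, while the paraphrase shares only a single word. A check of this kind can support a 'this work is reproduced' claim but says nothing about whether the document contributed to the paraphrased output, which is the question influence methods try to answer.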

Technical mechanisms and their scaling limits

The technical lineage begins with influence functions (Koh & Liang 2017, ICML), which use the loss Hessian to approximate how removing or up-weighting a training point would move a prediction. Gradient-tracing methods (Pruthi et al. 2020, TracIn) sidestep the Hessian by accumulating the dot product of training and test gradients across checkpoints ³. Recent work targets foundation-model scale: Grosse et al. 2023 ¹ use Kronecker-factored approximations, while Park et al. 2023 (TRAK) randomly project gradients for tractability. The binding limitation is that these approximations assume a locally-linear, near-convex loss surface that holds poorly for over-parameterised LLMs, so estimates can be noisy and order-sensitive. That matters because technical mitigations are urged precisely to keep training within fair use (Henderson et al. 2023, https://jmlr.org/papers/v24/23-0569.html). Extraction methods, conversely, undercount paraphrased or compositionally-derived outputs they cannot detect. Provenance is not recoverable from metadata either: a large-scale audit of dataset licensing found licence omission rates over 70% ⁴.
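
The gradient-tracing idea can be made concrete on a toy model. The following is a minimal sketch, assuming a logistic-regression classifier trained by full-batch gradient descent on synthetic data; it sums learning-rate-weighted dot products of training and test gradients over saved checkpoints, in the spirit of the checkpoint variant of TracIn. The helper names, data and hyperparameters are hypothetical and not taken from any cited implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_with_checkpoints(X, y, lr=0.5, steps=60, every=10):
    """Full-batch gradient descent; returns the weights saved every `every` steps."""
    w = np.zeros(X.shape[1])
    checkpoints = []
    for step in range(1, steps + 1):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w = w - lr * grad
        if step % every == 0:
            checkpoints.append(w.copy())
    return checkpoints


def tracin_scores(checkpoints, X_train, y_train, x_test, y_test, lr=0.5):
    """Sum over checkpoints of lr * (train gradient . test gradient), per training example."""
    scores = np.zeros(len(y_train))
    for w in checkpoints:
        g_test = (sigmoid(x_test @ w) - y_test) * x_test
        # Per-example training gradients of the logistic loss, shape (n, d).
        g_train = (sigmoid(X_train @ w) - y_train)[:, None] * X_train
        scores += lr * (g_train @ g_test)
    return scores


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(float)

checkpoints = train_with_checkpoints(X, y)
# Use a copy of training example 0 as the test point.
scores = tracin_scores(checkpoints, X, y, X[0].copy(), y[0])
print(scores[:5])
```

Positive scores mark training examples whose gradient steps would tend to lower the test loss, and negative scores mark examples that would tend to raise it. Because the test point here duplicates training example 0, that example's score is a sum of squared gradient norms and is therefore positive. No Hessian is needed, which is the attraction of the approach; the price is that the scores depend on which checkpoints are kept.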

Regulatory and litigation engagement

TDA's regulatory analogue is disclosure. EU AI Act Art. 53(1)(d) obliges GPAI providers to publish a 'sufficiently detailed summary' of training-data content, alongside the Art. 53(1)(c) duty to adopt a copyright-compliance policy ⁵; China's GenAI Measures Art. 7 requires legally-sourced training data; Brazil's PL 2338/2023 adds an author-compensation provision. The European TDM regime conditions this: Art. 3 CDSM's research exception is read as granting rightsholders no control and as a possible safe harbour for openly-released models ⁶, while the opt-out faces practical obstacles post-LAION — robots.txt, machine-readability, memorisation ⁷. In US litigation (NYT v. OpenAI, Authors Guild v. OpenAI), extraction has shown verbatim copying, yet a large-scale audit of dataset licensing found omission rates over 70% ⁴, undercutting provenance from metadata alone. Comparative mapping of US fair-use, EU rights-oriented, and UN remuneration models frames the divergence ².

Open questions: efficacy, remedy, and the consensus gap

TDA's empirical consensus is emerging rather than settled, and its governance efficacy is thin. The core dispute is whether attribution can demonstrate substantial similarity at corpus scale, given that the locally-linear approximations underpinning scalable influence estimation ¹ sit uneasily with frontier-model loss surfaces; 2025-26 refinements (DAUNCE, LoGra/LogIX) are incremental and leave the contested status intact. Notably, the leading copyright matter — the ~$1.5B Bartz v. Anthropic settlement (Sept 2025) — turned on documentary acquisition evidence, not technical TDA, suggesting attribution remains evidentially marginal in practice. A second strand asks whether disclosure and audit deliver redress: Sag (2024) proposes case-by-case fair-use assessment over blanket verdicts, while audit-market scholarship warns mandates can entrench rather than constrain power ⁸. Whether meaningful contestation for affected authors is achievable ⁹ remains open, especially as the AI data commons rapidly closes to restriction ¹⁰.

Use in governance

How instruments operationalise this concept

Instrument	Jurisdiction	Status
EU AI Act	EU	in force
Interim Measures for Generative AI Service Management	CN	in force
Brazil AI Bill (PL 2338/2023)	BR	proposed

Social-science evidence — the “so-what”

What the peer-reviewed social science shows: whether the harm this concept addresses is empirically real, and whether governance of it works. The badge is the epistemic status of the evidence (not the policy debate); “thin” or “absent” efficacy evidence is itself a finding (the “second silence”). Each epistemic-status label is Policy Window's editorial assessment of the cited evidence base (a structured classification), not a verdict any single source issues.

Is the harm real? Evidence: established
Training-data attribution names a real, coherently-defined technical capability with a substantial methods literature: influence functions trace a prediction back to influential training points (Koh & Liang 2017), TRAK makes attribution computationally tractable for large differentiable models (Park et al. 2023), and the approach has been scaled via EK-FAC to ~50B-parameter LLMs (Grosse et al. 2023, up to 52B) and to 8B-parameter pretraining over 160B tokens (Chang et al. 2024). Caveat: 'attribution' here means estimated counterfactual/causal influence on an output, which is distinct from — and the methods literature shows often misaligned with — verbatim provenance or factual source-of-record (Chang et al. 2024 explicitly report a 'misalignment between factual attribution and causal influence').
Sources: Koh & Liang 2017 (Understanding Black-box Predictions via Influence Functions, ICML); Park et al. 2023 (TRAK: Attributing Model Behavior at Scale, ICML, arXiv:2303.14186); Grosse et al. 2023 (Studying Large Language Model Generalization with Influence Functions, arXiv:2308.03296); Chang et al. 2024 (Scalable Influence and Fact Tracing for LLM Pretraining, arXiv:2410.17413)
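
The tractability gain from random projection rests on a simple property: a suitably scaled random matrix approximately preserves inner products between high-dimensional vectors. The sketch below illustrates only that projection step, not the full TRAK estimator. The vectors are random stand-ins for per-example gradients, and the dimensions `d` and `k` are assumed toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 2000, 1024  # original and projected dimensions (toy sizes)

# Stand-ins for two per-example gradient vectors; real ones would come from a model.
g_a = rng.normal(size=d)
g_b = g_a + 0.5 * rng.normal(size=d)  # correlated with g_a

# Gaussian random projection, scaled so inner products are preserved in expectation.
P = rng.normal(size=(k, d)) / np.sqrt(k)

exact = g_a @ g_b
approx = (P @ g_a) @ (P @ g_b)
rel_error = abs(approx - exact) / abs(exact)
print(f"exact={exact:.1f} approx={approx:.1f} rel_error={rel_error:.3f}")
```

Gradient similarities can then be computed in `k` dimensions rather than `d`. The approximation error shrinks as `k` grows, so the projected dimension trades accuracy against memory and compute.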
Does governance work? Evidence: thin
There is no rigorous evidence that TDA reliably supports the provenance or copyright claims it is invoked for, and its accuracy is contested. The TDA literature documents a basic fragility of gradient-based influence estimates in deep, non-convex settings — practical estimates frequently fail to align with leave-one-out/retraining counterfactuals (Basu et al. 2020; Bae et al. 2022), and Grosse et al. 2023 themselves note influence-function limitations (e.g. influences decay to near-zero when key-phrase order is flipped; heuristic gradient methods like TracIn lack a clear counterfactual connection). Critically, Chang et al. 2024 show classical model-agnostic retrieval (BM25) still outperforms causal-influence methods at finding passages that explicitly contain a relevant fact, demonstrating 'a misalignment between factual attribution and causal influence.' No validated governance regime uses TDA as evidence — the leading copyright decision to date (Bartz v. Anthropic 2025, N.D. Cal., Judge Alsup) turned on documentary evidence about pirated-book acquisition and a central library, not on technical attribution of outputs to training data — so evidence that this lever achieves its aim is thin.
Sources: Chang et al. 2024 (Scalable Influence and Fact Tracing for LLM Pretraining, arXiv:2410.17413); Grosse et al. 2023 (Studying Large Language Model Generalization with Influence Functions, arXiv:2308.03296); Basu et al. 2020 (Influence Functions in Deep Learning Are Fragile, arXiv:2006.14651); Bartz v. Anthropic PBC 2025 (N.D. Cal.)

Editorial note

Distinguish TDA (which training examples *caused* this output, by influence) from training-data extraction (which examples are verbatim recoverable from the model). Both are policy-relevant but for different claims: influence supports causal-contribution arguments, extraction supports memorisation arguments. Currency (2026-06-21): Bartz v. Anthropic resolved in a record ~$1.5B settlement (Sept 2025) over pirated-book acquisition, reinforcing (not contradicting) the article's existing point that the leading copyright matter turned on documentary acquisition evidence rather than technical TDA; 2025-26 methods (DAUNCE black-box attribution on proprietary GPT models, LoGra/LogIX) are incremental refinements that leave the contested / governance-efficacy-thin status intact.

References

Sources cited inline in the analysis, numbered in order of appearance.

1. Grosse, R., et al. (2023) Studying Large Language Model Generalization with Influence Functions (Anthropic). arXiv:2308.03296. The canonical articulation of scalable influence-function-based attribution for foundation models.
2. Kaigeng Li, Hong Wu, Yupeng Dong (2024) Copyright protection during the training stage of generative AI: Industry-oriented U.S. law, rights-oriented EU law, and fair remuneration rights for generative AI training under the UN's international governance regime for AI, Computer Law & Security Review, 55. 10.1016/j.clsr.2024.106056. Comparatively maps US (industry-oriented fair use), EU (rights-oriented TDM opt-out) and a proposed UN fair-remuneration approach to copyright at the generative-AI training stage.
3. Pruthi et al. (2020) Estimating Training Data Influence by Tracing Gradient Descent (TracIn). arXiv:2002.08484.
4. Longpre, Mahari, et al. (Data Provenance Initiative) (2024) A large-scale audit of dataset licensing and attribution in AI, Nature Machine Intelligence. 10.1038/s42256-024-00878-8. Audit of 1,800+ AI training datasets finds "licence omission rates of more than 70% and error rates of more than 50%" on popular hosting sites.
5. Novelli, Casolari, Hacker, Spedicato & Floridi (2024) Generative AI in EU law: Liability, privacy, intellectual property, and cybersecurity, Computer Law & Security Review. 10.1016/j.clsr.2024.106066. Examines how the EU AI Act, liability regimes, GDPR, copyright and cybersecurity rules apply to generative AI, identifying gaps and proposing targeted regulatory refinements.
6. Arne Radeisen (2026) Open Foundation Models and TDM Exceptions to Copyright – Building Blocks for an AI Ecosystem, GRUR International. 10.1093/grurint/ikag002. Argues Art. 3 CDSM Directive's scientific-research TDM exception 'does not grant rightsholders any control' and can be a 'safe harbor' for training openly released foundation models without licensing data.
7. Stepanka Havlikova (2025) Technical Challenges of Rightsholders' Opt-out From Gen AI Training after Robert Kneschke v. LAION, JIPITEC – Journal of Intellectual Property, Information Tech. Examines post-LAION practical obstacles to the EU TDM opt-out (robots.txt, machine-readability, memorisation): 'While the TDM exceptions may seem workable in theory, implementing them in practice presents a variety of practical…
8. Petros Terzis, Michael Veale, Noëlle Gaumann (2024) Law and the Emerging Political Economy of Algorithmic Audits, Proceedings of the 2024 ACM Conference on Fairness, Accounta. 10.1145/3630106.3658970. Analyses how AI-audit mandates create a new political economy of auditing, warning that audit markets can entrench rather than constrain power without underlying governance.
9. Mireia Yurrita, Himanshu Verma, Agathe Balayn, Kars Alfrink, Ujwal Gadiraju, and Alessandro Bozzon (2025) Identifying Algorithmic Decision Subjects' Needs for Meaningful Contestability, Proceedings of the ACM on Human-Computer Interaction (CSCW). 10.1145/3757415. Empirically elicits what decision subjects need for contestation to be 'meaningful', informing the design of effective remedies and appeal mechanisms for ADM.
10. Shayne Longpre, Robert Mahari, Ariel Lee, et al. (2024) Consent in Crisis: The Rapid Decline of the AI Data Commons, arXiv (Data Provenance Initiative; presented NeurIPS Dataset. arXiv:2407.14933. Longitudinal audit of 14,000 web domains finds a 2023-24 surge in AI training restrictions, with '~5%+ of all tokens in C4...fully restricted from use' within a single year.

Persistent identifier: https://policywindow.org/wiki/training-data-attribution — committed-stable URL with content-versioning via ?asOf= (rollout pending per methodology §7). DOIs via Zenodo are on the roadmap.
