New York RAISE Act: Responsible AI Safety and Education Act

Policy Window Editorial Board

Background & scope

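The New York RAISE Act (Responsible AI Safety and Education Act) addresses three contested AI-governance topics explicitly (foundation models/GPAI, transparency obligations, and catastrophic and existential risk). It addresses two more through general principles (compute-threshold reporting and agentic AI governance).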

Provisions & coverage

Foundation Models / GPAI (governs): N.Y. Gen. Bus. Law § 1420(6)^[1]
Compute-Threshold Reporting (implicit): N.Y. Gen. Bus. Law § 1420(6),(9). The frontier-model and large-developer compute figures SCOPE the regulated class; there is no standalone duty to report compute figures to a regulator. (The Mar. 27, 2026 chapter amendment revised the large-developer threshold to align more closely with California's criteria. The verdict is unchanged by the specific figure: the threshold scopes coverage and does not impose a reporting duty.)^[1]
Transparency Obligations (governs): N.Y. Gen. Bus. Law § 1421(1)(C)^[1]
Catastrophic & Existential Risk (governs): N.Y. Gen. Bus. Law § 1421(1)^[1]
Agentic AI Governance (implicit): N.Y. Gen. Bus. Law § 1420(7) defines critical harm to include model conduct 'with no meaningful human intervention', and § 1420(13) defines 'safety incident' to include autonomous model behaviour and control failures. Autonomy is thus reached through the catastrophic-risk and incident lens, not through a dedicated agentic regime.^[1]
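The scoping role of the compute figures can be illustrated with a short Python sketch. It assumes the floor-text frontier-model figures summarised in [ref-2]: more than 10^26 training operations and more than $100M in compute cost. The sketch treats these as joint conditions, which is an assumption. The names `is_frontier_model`, `FRONTIER_COMPUTE_OPS` and `FRONTIER_COMPUTE_COST_USD` are hypothetical, and so are the example numbers. The sketch ignores the large-developer test in § 1420(9), any other prong of the definition, and the Mar. 27, 2026 amendment. The function only decides coverage; nothing in it is reported to a regulator.

```python
# Floor-text figures for § 1420(6) as summarised in [ref-2]. Treating them as joint
# conditions, and ignoring any other prong of the definition and the Mar. 27, 2026
# amendment, are assumptions of this sketch, not a reading of the statute.
FRONTIER_COMPUTE_OPS = 1e26        # training compute, in operations
FRONTIER_COMPUTE_COST_USD = 100e6  # cost of that compute, in US dollars


def is_frontier_model(training_ops: float, compute_cost_usd: float) -> bool:
    """Hypothetical scoping test: does a model fall inside the regulated class?

    It answers only the coverage question; it produces nothing to report.
    """
    return (training_ops > FRONTIER_COMPUTE_OPS
            and compute_cost_usd > FRONTIER_COMPUTE_COST_USD)


if __name__ == "__main__":
    print(is_frontier_model(3e26, 250e6))  # True
    print(is_frontier_model(3e26, 60e6))   # False: cost below the figure
    print(is_frontier_model(5e25, 250e6))  # False: compute below the figure
```

The test only gates which developers the safety-protocol and incident duties attach to. A changed figure therefore alters who is covered, but not what kind of obligation applies, which is the point the compute-threshold entry makes.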

Enforcement & impact

Cross-jurisdiction comparison

How peer instruments treat the topics that the New York RAISE Act governs.

Cross-jurisdiction coverage of the 3 topics governed by the New York RAISE Act, compared with 44 peer instruments
Instrument	Foundation Models / GPAI	Transparency Obligations	Catastrophic & Existential Risk
EU-AIA-2024	governs	governs	implicit
US-EO-14110	governs	implicit	governs
US-EO-14179	silent	silent	silent
UK-WHITEPAPER-2023	implicit	implicit	implicit
CN-GENAI-2023	governs	conflicts	silent
G7-HIROSHIMA	governs	governs	governs
OECD-AI-PRIN	implicit	governs	silent
COE-AI-CONV	implicit	governs	silent
UN-RES-2024	silent	implicit	implicit
NIST-AI-RMF	governs	governs	implicit
BLETCHLEY-2023	governs	implicit	governs
SEOUL-2024	governs	governs	governs
NIST-AI-RMF-GENAI	governs	governs	governs
CA-SB-1047	governs	implicit	governs
IN-DPDP-2023	implicit	implicit	silent
BR-AIBILL-2024	governs	governs	governs
ASEAN-AI-GUIDE-2024	implicit	governs	silent
AU-AI-STRATEGY-2024	silent	silent	silent
ANTHROPIC-RSP-2024°	governs	governs	governs
OPENAI-PREPAREDNESS-2023°	governs	implicit	governs
DEEPMIND-FSF-2024°	governs	implicit	governs
META-FRONTIER-2024°	governs	governs	governs
UK-US-AISI-MOU-2024	governs	implicit	implicit
WH-VOLUNTARY-2023	governs	governs	implicit
SG-MODEL-AI-2024	governs	governs	silent
JP-METI-AI-2024	governs	governs	silent
EU-GDPR-2016	silent	governs	silent
EU-GPAI-COP-2025	governs	governs	governs
OMB-M-24-10	implicit	governs	silent
GSA-AI-GUIDE-2024	governs	governs	silent
DOD-RAI-2022	implicit	governs	implicit
FEDRAMP-AI-2024	implicit	governs	silent
DFARS-252-204	implicit	silent	silent
CA-SB-53	governs	governs	governs
CA-SB-243	silent	governs	silent
CA-SB-942	implicit	governs	silent
EU-PLD-2024	silent	implicit	silent
UNESCO-AI-ETHICS-2021	silent	governs	silent
EU-PWD-2024	silent	governs	silent
CN-DEEPSYN-2022	silent	governs	silent
US-TAKEITDOWN-2025	silent	silent	silent
IT-AILAW-2025	silent	governs	silent
JP-AIPROMO-2025	implicit	governs	silent
UN-GDC-2024	implicit	governs	implicit

° = voluntary framework self-imposed by industry. Reading a voluntary code's "governs" as equivalent to a binding regulation's "governs" flattens the difference in legal force. Check each instrument's own page for its operative status.

Does this instrument’s approach work? — the social-science evidence

Aggregated over the 5 topics this instrument addresses (3 governed explicitly, 2 implicitly): whether each harm is empirically real, and whether the peer-reviewed evidence shows that governance reduces it. The badge gives the epistemic status of the evidence. “Thin” or “absent” efficacy evidence is itself a finding (the “second silence”). Each epistemic-status label is Policy Window's editorial assessment of the cited evidence base (a structured classification), not a verdict issued by any single source.

Of the 5 addressed topics with a social-science evidence review, evidence that governance reduces the harm is established for 0, contested for 0, thin for 0, and absent for 5. For none of them does any study yet show that this instrument's approach works (the "second silence").

Agentic AI Governance
- Is the harm real? Evidence: thin
  The capability that agentic governance targets, autonomous multi-step action, is real and advancing rapidly and measurably. METR finds that the length of tasks AI agents complete at 50% reliability has doubled roughly every seven months for the past six years (about 50 minutes for frontier 2025 models). The UK AI Security Institute's first Frontier AI Trends Report (Dec 2025, >30 systems) reports that models now finish hour-long software tasks >40% of the time, versus <5% in late 2023. However, the distinct realized HARM from agency (as opposed to the underlying model) is thinly documented, because agents still fail most consequential real-world tasks. Gemini 2.5 Pro completed only 30.3% of TheAgentCompany's 175 professional tasks with the OpenHands scaffold, per the project leaderboard. The magnitude of agency-specific harm is therefore early-stage and context-dependent rather than established at scale.
  Sources: Kwa, West, Becker et al. 2025 (METR; arXiv:2503.14499, 'Measuring AI Ability to Complete Long Tasks'); UK AI Security Institute 2025 (Frontier AI Trends Report, Dec 2025); Xu, Song, Zhou et al. 2024 (TheAgentCompany, arXiv:2412.14161); 30.3% figure per TheAgentCompany leaderboard (OpenHands)
- Does governance work? Evidence: absent
  There is no impact-evaluation evidence that agent-specific governance reduces agentic harm: the operative regimes — the EU GPAI Code of Practice (published July 2025, voluntary/non-binding), the Seoul Frontier AI Safety Commitments (2024, voluntary), and AISI agent evaluations — are 2024-25 vintage and have never been measured against an outcome. The scholarship itself has not settled the contested unit of regulation: Kolt (2025) argues for governing the agentic relationship via principal-agent and agency-law tools, while Chan, Ezell, Kaufmann et al. (2024) propose agent-specific visibility mechanisms (identifiers, real-time monitoring, activity logging) that remain proposal-stage and unevaluated — meaning the field has design proposals but, as with most frontier-AI rules, the evidence that any of them works is absent rather than merely thin.
  Sources: Kolt 2025 ('Governing AI Agents', 101 Notre Dame L. Rev., forthcoming; arXiv:2501.07913); Chan, Ezell, Kaufmann et al. 2024 ('Visibility into AI Agents', ACM FAccT 2024, pp. 958-973; DOI 10.1145/3630106.3658948); EU AI Office 2025 (GPAI Code of Practice, July 2025); Seoul Frontier AI Safety Commitments 2024
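The METR doubling figure is easiest to read as compounding arithmetic. This Python sketch assumes a constant seven-month doubling time and a 50-minute starting horizon, both taken from the figures quoted above. The function names `horizon_minutes` and `months_to_reach` are hypothetical. Projecting the trend forward illustrates what the reported rate implies; it is not a forecast, and nothing guarantees the trend continues.

```python
import math


def horizon_minutes(start_minutes: float, months_elapsed: float,
                    doubling_months: float = 7.0) -> float:
    """50%-reliability task horizon after `months_elapsed`, assuming the horizon
    keeps doubling every `doubling_months` (an illustrative assumption)."""
    return start_minutes * 2 ** (months_elapsed / doubling_months)


def months_to_reach(start_minutes: float, target_minutes: float,
                    doubling_months: float = 7.0) -> float:
    """Months for the horizon to grow from `start_minutes` to `target_minutes`
    at the same constant doubling rate."""
    return doubling_months * math.log2(target_minutes / start_minutes)


if __name__ == "__main__":
    print(horizon_minutes(50, 7))    # 100.0 (one doubling)
    print(horizon_minutes(50, 14))   # 200.0 (two doublings)
    # Months until an 8-hour working day, if (and only if) the trend held.
    print(round(months_to_reach(50, 8 * 60), 1))
```

For governance, the specific date matters less than the fact that a constant doubling rate compounds quickly. This is why the capability is described as advancing even though the evidence on agency-specific harm is rated thin.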
Catastrophic & Existential Risk
- Is the harm real? Evidence: contested
  The catastrophic-uplift premise is genuinely contested: the empirical uplift studies that exist find current frontier models add little. RAND's red-team study found no statistically significant difference in the viability of bioweapon-attack plans produced with vs. without LLMs (Mouton, Lucas & Guest 2024), and OpenAI's 100-participant trial found GPT-4 gave at most a mild, non-significant accuracy uplift (mean +0.88 out of 10 for PhD experts, +0.25 for students; Patwardhan et al. 2024). Honest caveat: the harm is forward-looking, not yet observed — expert opinion on the catastrophic tail is sharply split (median AI researcher puts ~5% on extremely-bad/extinction outcomes, mean ~9-16% across differently-framed questions, n=2,778; Grace et al. 2024), and forecasters underestimated how fast risk-relevant capabilities (e.g. virology troubleshooting) actually arrived (Forecasting Research Institute 2025), so the relevant capabilities are a moving target rather than a settled magnitude.
  Sources: Mouton, Lucas & Guest 2024 (RAND RR-A2977-2, Operational Risks of AI in Large-Scale Biological Attacks: Results of a Red-Team Study); Patwardhan et al. 2024 (OpenAI, Building an Early Warning System for LLM-aided Biological Threat Creation); Grace et al. 2024 (Thousands of AI Authors on the Future of AI, arXiv:2401.02843); Forecasting Research Institute 2025 (Forecasting LLM-enabled Biorisk and the Efficacy of Safeguards)
- Does governance work? Evidence: absent
  There is essentially no impact evidence that catastrophic-risk governance reduces catastrophic risk, and structurally there cannot yet be: the harm is a low-probability civilisational tail event, so no controlled trial or before/after evaluation of a realised catastrophe is possible. The dominant instruments are recent, voluntary developer frameworks (Anthropic's Responsible Scaling Policy 2023; OpenAI's Preparedness Framework 2023) built on if-then capability thresholds the developers themselves describe as speculative and qualitative rather than validated risk thresholds. The closest evidence is adjacent and indirect: trained-in deceptive behaviours can persist through standard safety training (Hubinger et al. 2024) — a demonstration that current mitigation may be insufficient, not that any governance regime works — and Anthropic's documented loosening of earlier commitments (RSP 2025 dropped the original pledge to define higher-tier ASL evaluations before developing the corresponding models) illustrates that even the strongest voluntary regimes lack external enforcement or measured efficacy.
  Sources: Anthropic 2023 (Responsible Scaling Policy); OpenAI 2023 (Preparedness Framework); Hubinger et al. 2024 (Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, arXiv:2401.05566); Hendrycks, Mazeika & Woodside 2023 (An Overview of Catastrophic AI Risks, arXiv:2306.12001)
Compute-Threshold Reporting
- Is the harm real? Evidence: contested
  Whether training-compute (FLOP) is a defensible proxy for governance-relevant capability is genuinely contested in the literature. The strongest empirical pressure against it is algorithmic efficiency: Ho, Besiroglu, Erdil et al. (2024) estimate the compute needed to reach a fixed language-model performance level has halved roughly every eight months (95% CI ~5-14 months, i.e. ~3x/year), so any static FLOP-to-capability mapping decays quickly; Hooker (2024) argues FLOP measures operations rather than end-performance, since techniques such as fine-tuning, retrieval, chain-of-thought and tool use can add large capability gains without proportional training compute, and Ord (2025) shows inference-time scaling further decouples deployed capability from training compute. Honest caveat: defenders (Heim & Koessler 2024; Pilz, Heim & Brown 2025) note compute remains the most quantifiable, externally verifiable, and ex-ante measurable correlate of frontier capability currently available, while themselves conceding it is an imperfect proxy that should not be used in isolation — the disagreement is about durability and precision, not whether any correlation exists.
  Sources: Ho, Besiroglu, Erdil, Owen, Rahman, Guo, Atkinson, Thompson & Sevilla 2024, Algorithmic progress in language models, NeurIPS 2024 (arXiv:2403.05812; Epoch AI); Hooker 2024, On the Limitations of Compute Thresholds as a Governance Strategy (arXiv:2407.05694); Ord 2025, Inference Scaling Reshapes AI Governance (arXiv:2503.05705); Heim & Koessler 2024, Training Compute Thresholds: Features and Functions in AI Regulation (arXiv:2405.10799); Pilz, Heim & Brown 2025, Increased Compute Efficiency and the Diffusion of AI Capabilities (AAAI 2025; arXiv:2311.15377)
- Does governance work? Evidence: absent
  There is no rigorous evidence that compute-threshold reporting reduces harm or achieves its stated aim, because the regimes have not produced an evaluable record. The US 10^26-FLOP reporting obligation (Executive Order 14110, invoking the Defense Production Act) was revoked on 20 January 2025 (by EO 14148) before its recurring binding reporting rule was finalized — the implementing BIS notice of proposed rulemaking (Sept 2024) never took effect, so no durable reporting record materialized; and the EU AI Act's 10^25-FLOP systemic-risk obligations for general-purpose models only became applicable on 2 August 2025 (with transitional periods into 2027), so no outcome evaluation yet exists. Moreover the 10^25 figure is a rebuttable presumption sitting alongside qualitative high-impact criteria (Art. 51(1)(a) and (2), rebuttable under Art. 52(2)), not a validated risk cutoff. The closest analogue is the broader regulatory-disclosure-mandate literature (Fung, Graham & Weil 2007), which documents that transparency policies' effects on outcomes are highly heterogeneous and frequently ineffective or counterproductive absent enforcement and downstream use — implying that the reporting trigger working as intended is an open empirical question, not a documented result.
  Sources: U.S. Executive Order 14110 (2023), Sec. 4.2 (10^26 FLOP, Defense Production Act); revoked by Executive Order 14148 (Jan 20, 2025); EU AI Act, Reg. (EU) 2024/1689, Art. 51 (10^25 FLOP systemic-risk rebuttable presumption; applicable Aug 2, 2025); Fung, Graham & Weil 2007, Full Disclosure: The Perils and Promise of Transparency (Cambridge University Press)
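The decay of a static FLOP threshold follows directly from the algorithmic-efficiency estimate. This hypothetical Python sketch assumes the central estimate of Ho et al. (2024): the compute needed to reach a fixed language-model performance level halves every eight months. The caller can substitute the ~5-month and ~14-month ends of the confidence interval. The names `efficiency_gain` and `matching_compute` are illustrative. The sketch ignores fine-tuning, tool use and inference-time scaling, which the critics cited above treat as further sources of decoupling.

```python
def efficiency_gain(months: float, halving_months: float = 8.0) -> float:
    """Factor by which the compute needed for a fixed performance level has fallen
    after `months`, assuming a constant halving time (illustrative assumption)."""
    return 2 ** (months / halving_months)


def matching_compute(original_flop: float, months: float,
                     halving_months: float = 8.0) -> float:
    """Training compute that would, `months` later, match the performance of a model
    originally trained with `original_flop`, under the same assumption."""
    return original_flop / efficiency_gain(months, halving_months)


if __name__ == "__main__":
    # One year at the central estimate: 2 ** 1.5, roughly the "~3x/year" quoted above.
    print(round(efficiency_gain(12), 2))         # 2.83
    # Two years later, matching a 10^26-FLOP model takes about one eighth of the compute.
    print(f"{matching_compute(1e26, 24):.2e}")   # 1.25e+25
    # Sensitivity to the confidence interval (fast vs. slow halving).
    for h in (5.0, 14.0):
        print(h, round(efficiency_gain(12, h), 2))
```

Under these assumptions, a model that matches a 10^26-FLOP predecessor can fall an order of magnitude below a fixed 10^26 line within about two years. This is the sense in which a static FLOP-to-capability mapping "decays quickly".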
Foundation Models / GPAI
- Is the harm real? Evidence: contested
  Whether the foundation-model category maps to a coherent capability/risk tier is genuinely contested. The original case rests on scale-driven 'emergent abilities' that appear unpredictably above a size threshold (Wei et al. 2022; see also Ganguli et al. 2022, who document capabilities that are smoothly predictable in aggregate loss yet locally surprising). However, Schaeffer, Miranda & Koyejo (2023, a NeurIPS Outstanding Paper) showed that many 'emergent' jumps are artefacts of discontinuous metrics and dissolve under linear or continuous scoring. This implies that capability scales more smoothly than a sharp tier would suggest. Honest caveat: this is a live empirical disagreement about measurement, not a settled finding either way. Compute (the regulatory proxy) is an imperfect stand-in for capability or risk regardless of which side is right.
  Sources: Wei et al. 2022 (Emergent Abilities of Large Language Models, TMLR; arXiv:2206.07682); Schaeffer, Miranda & Koyejo 2023 (Are Emergent Abilities of Large Language Models a Mirage?, NeurIPS 2023, Outstanding Paper; arXiv:2304.15004); Ganguli et al. 2022 (Predictability and Surprise in Large Generative Models, ACM FAccT; DOI 10.1145/3531146.3533229)
- Does governance work? Evidence: absent
  There is no impact evaluation showing that GPAI/foundation-model governance reduces harm. The rules are too new: EU AI Act GPAI obligations and the 10^25-FLOP systemic-risk presumption only began binding on 2 August 2025. The central regulatory lever is itself contested. Hooker (2024) argues that compute thresholds are a shortsighted proxy because compute does not reliably track capability or risk. The thresholds also already diverge across jurisdictions (10^25 FLOP in the EU vs. 10^26 operations under US EO 14110, which was rescinded on 20 January 2025). The mandated mitigation methods likewise lack validated efficacy. The survey and position literature documents coverage limits and an 'audit gap' for model evaluation and red-teaming, since behavioural testing cannot establish the absence of untested failure modes. Adversarial red-teaming also repeatedly defeats deployed safeguards. The UK AI Safety Institute reports finding universal jailbreaks for every frontier system it has tested. A large public agent-injection competition elicited policy violations from all 22 frontier models tested, across ~1.8M attacks (Zou et al. 2025). Even compliant evaluation therefore cannot yet certify the safety the rules demand. (Caveat: this is an absence-of-evidence claim, meaning no efficacy study has been done; it is not evidence that the rules are ineffective.)
  Sources: Hooker 2024 (On the Limitations of Compute Thresholds as a Governance Strategy, arXiv:2407.05694); EU AI Act Arts. 51 & 55 (GPAI systemic-risk presumption, 10^25 FLOP; binding 2 Aug 2025); US EO 14110 (10^26-operation reporting threshold, rescinded 20 Jan 2025 by EO 14148); Zou et al. 2025 (Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition / Gray Swan Arena, arXiv:2507.20526 — 22 frontier agents, ~1.8M attacks); UK AI Safety/Security Institute, Frontier AI Trends Report (universal jailbreaks for every system tested); METR, Common Elements of Frontier AI Safety Policies (2024)
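The jurisdictional divergence can be made concrete by checking one training run against both figures. This Python sketch is hypothetical. It estimates training compute with the common rule of thumb of about 6 FLOP per parameter per training token for dense transformers. That heuristic is an assumption from the scaling-law literature, not a method either instrument prescribes. The training run (400 billion parameters, 15 trillion tokens) is invented. The names `THRESHOLDS`, `approx_training_flop` and `thresholds_crossed` are illustrative. Only the bare numbers are compared, so the sketch does not capture the rebuttable nature of the EU presumption (Art. 52(2)) or the qualitative high-impact criteria.

```python
# Figures as described in this article. EO 14110 was rescinded on 20 January 2025,
# so its threshold is included for comparison only. Both are read as strict
# "greater than" tests, which is an assumption of this sketch.
THRESHOLDS = {
    "EU AI Act Art. 51(2) presumption": 1e25,
    "US EO 14110 reporting (rescinded)": 1e26,
}


def approx_training_flop(n_params: float, n_tokens: float) -> float:
    """Rough estimate for a dense transformer: ~6 FLOP per parameter per token.
    This heuristic is an assumption, not a method prescribed by either instrument."""
    return 6.0 * n_params * n_tokens


def thresholds_crossed(training_flop: float) -> list[str]:
    """Names of the thresholds that the training run strictly exceeds."""
    return [name for name, limit in THRESHOLDS.items() if training_flop > limit]


if __name__ == "__main__":
    # Hypothetical run: 400B parameters trained on 15T tokens.
    flop = approx_training_flop(n_params=400e9, n_tokens=15e12)
    print(f"{flop:.1e}")             # 3.6e+25
    print(thresholds_crossed(flop))  # ['EU AI Act Art. 51(2) presumption']
```

The same run falls inside the EU presumption but would have sat below the rescinded US reporting line, which is the kind of divergence described above.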
Transparency Obligations
- Is the harm real? Evidence: contested
  Documentation artifacts (model cards, datasheets) are well-specified as proposals and are genuinely adopted, but the empirical premise that mandated disclosure produces meaningful transparency is contested. Selbst & Barocas (2018) argue inscrutability and non-intuitiveness are distinct problems and that disclosing rules does not resolve the latter, and large-scale audits find documentation is sparsely and unevenly completed: a systematic analysis of 32,111 Hugging Face model cards (Liang et al. 2024) found environmental-impact, limitations and evaluation sections least often filled, and Bhat et al. (2023, 45 practitioners) found a substantial gap between the documentation proposal and actual practice. Honest caveat: the documentation frameworks themselves are real and adopted, so the dispute is about whether disclosure conveys decision-relevant information, not whether the artifacts exist.
  Sources: Selbst & Barocas 2018 (Fordham Law Review 87:1085-1139); Liang et al. 2024 (Nature Machine Intelligence, s42256-024-00857-z, 'Systematic analysis of 32,111 AI model cards'); Bhat et al. 2023 (CHI '23, 'Aspirations and Practice of ML Model Documentation', DOI 10.1145/3544548.3581518); Mitchell et al. 2019 (FAccT, Model Cards for Model Reporting); Gebru et al. 2021 (CACM 64(12):86-92, Datasheets for Datasets)
- Does governance work? Evidence: absent
  There is no rigorous impact evaluation showing that AI transparency mandates (model cards, training-data summaries) measurably reduce bias, misuse or accidents — the central regulatory assumption is empirically untested, partly because flagship mandates like EU AI Act Art. 53(1)(d) GPAI training-data summaries are only subject to AI Office enforcement/verification from 2 August 2026 (the obligation itself began 2 August 2025 for new models). The closest analogue, mandated consumer disclosure, shows small and context-dependent effects: Bollinger, Leslie & Sorensen (2011) found mandatory calorie posting cut average calories per transaction by about 6%, while Loewenstein, Sunstein & Golman (2014) review evidence that disclosure effects are frequently diminished or even reversed by limited attention and often change provider rather than recipient behavior. These are analogues, not AI studies; no study demonstrates that AI transparency disclosure achieves its stated downstream safety aims.
  Sources: Bollinger, Leslie & Sorensen 2011 (AEJ: Economic Policy 3(1):91-128); Loewenstein, Sunstein & Golman 2014 (Annual Review of Economics 6:391-419, 'Disclosure: Psychology Changes Everything'); EU AI Act Art. 53(1)(d) GPAI training-data summary (obligation from 2 Aug 2025; AI Office enforcement from 2 Aug 2026)

How to cite this article

@misc{policywindow-ny-raise-act,
  title  = {New York RAISE Act: Responsible AI Safety and Education Act},
  author = {Policy Window},
  year   = {2025},
  howpublished = {N.Y. Gen. Bus. Law art. 44-B, §§ 1420-1425 (Responsible AI Safety and Education Act, S6953-B / A6453-B, signed Dec. 19, 2025; eff. Jan. 1, 2027)},
  url    = {https://policywindow.org/wiki/ny-raise-act},
  note   = {Primary source: https://www.nysenate.gov/legislation/bills/2025/S6953}
}

Persistent identifier: https://policywindow.org/wiki/ny-raise-act — committed-stable URL with content-versioning via ?asOf= (rollout pending per methodology §7). DOIs via Zenodo are on the roadmap.

[ref-1] N.Y. Gen. Bus. Law art. 44-B, §§ 1420-1425 (Responsible AI Safety and Education Act, S6953-B / A6453-B, signed Dec. 19, 2025; eff. Jan. 1, 2027)

[ref-2] N.Y. Gen. Bus. Law § 1420(6) defines 'frontier model' (>10^26 FLOP, >$100M compute) + § 1421 imposes operative pre-deployment duties on large frontier-model developers

[ref-3] N.Y. Gen. Bus. Law § 1420(6),(9) — the frontier-model / large-developer compute figures SCOPE the regulated class; no standalone compute-figure reporting duty to a regulator. (The Mar. 27, 2026 chapter amendment revised the large-developer threshold to align more closely with California's criteria; the verdict — coverage-scoping, not a reporting duty — is unchanged by the specific figure.)

[ref-4] N.Y. Gen. Bus. Law § 1421(1)(C) — a large developer must conspicuously publish (with appropriate redactions) its written safety and security protocol and transmit a copy to the attorney general

[ref-5] N.Y. Gen. Bus. Law § 1421(1) requires a large developer to implement and conspicuously publish a written safety and security protocol governing the risk of 'critical harm' from its frontier models, and § 1421(4) requires disclosure of safety incidents within 72 hours; § 1420(7) defines critical harm (100+ deaths/serious injuries or $1B damage via CBRN weapons or autonomous model conduct). NOTE: the floor-text § 1421(2) deployment PROHIBITION was struck by the chapter amendment enacted Mar. 27, 2026 (S8828/A9449), which reoriented the Act to a transparency-and-reporting regime; this cell tracks the RETAINED safety-protocol + incident-reporting duties, not a deployment ban.

[ref-6] N.Y. Gen. Bus. Law § 1420(7) critical harm includes model conduct 'with no meaningful human intervention'; § 1420(13) 'safety incident' includes autonomous model behaviour + control failures — autonomy reached via the catastrophic-risk/incident lens, not a dedicated agentic regime
