Post-publication Comment · Critical AI
Comment on “Student perspectives on the use of generative artificial intelligence technologies in higher education”
Critical AI · published 2026-06-27 · v1.0 · CRIT-000019
Concerning: Heather Johnston, Rebecca Wells, Elizabeth M. Shanks, Timothy Boey, Bryony N. Parsons · International Journal for Educational Integrity · 2024-02-08
Why this paper was selected
G88b coverage-moving batch (education white-space); validated G84 engine; span-grounded to the OpenAlex abstract.
AI/AGI centrality 1/5 · societal relevance 3/5 · source-journal note: Off-monitored: International Journal for Educational Integrity is a peer-reviewed education journal not in the monitored determination; disclosed off-list.
Summary
Researchers surveyed 2,555 University of Liverpool students about AI tools like ChatGPT to help update the university's academic-integrity rules. Most students had heard of these tools, supported light-touch aids like Grammarly far more than using ChatGPT to write a whole essay, and wanted a clear university policy. The survey is large and useful for its local purpose. Judged only from the abstract, its limits are about scope rather than over-claiming: it combines 'used' and merely 'considered using' into one headline figure, so that number is not a clean use rate; the link it reports between writing confidence and AI use is descriptive, so we cannot tell what drives it; and its policy recommendations (don't ban, ensure equal access) are reasonable normative positions that go beyond what a descriptive survey can by itself establish. It is one institution's snapshot, not a generalizable study.
Central claims & evidence map
| Claim | Type | Evidence offered | Support | Overclaiming | Main weakness |
|---|---|---|---|---|---|
| Students who had higher levels of confidence in their academic writing were less likely to use or consider using them for academic purposes, and were also less likely to be supportive of other students using them. | | Students who had higher levels of confidence in their academic writing were less likely to use or consider using them for academic purposes, and were also less likely to be supportive of other students using them. | Moderate | None | Reported as a descriptive association with no mechanism identified and no adjustment for plausible correlates of writing confidence (discipline, study level, prior attainment); it is informative as one institution's correlation but underdetermined as an explanation -- a limit on interpretive scope, not over-claiming by the authors. |
| over half had used or considered using these for academic purposes | | over half had used or considered using these for academic purposes | Moderate | None | Combines actual use with mere consideration in one headline figure with no reported split, so the number cannot be read as a use rate; 'academic purposes' is also left undefined. The abstract is transparent that the figure is combined, so this is a precision limit, not a distortion. |
| these technologies should not be banned from university, but consideration must be made to ensure different groups of students have equal access to the technologies | Normative | these technologies should not be banned from university, but consideration must be made to ensure different groups of students have equal access to the technologies | Weak | None | A normative policy recommendation, reasonable and appropriate to a policy-informing study, but value-based and therefore beyond what the descriptive survey results by themselves establish; the fair note is the descriptive/normative boundary, not a demand for an empirical disparities analysis. |
Per-claim assessment
CLAIM-001. Students who had higher levels of confidence in their academic writing were less likely to use or consider using them for academic purposes, and were also less likely to be supportive of other students using them.
This is the abstract's most analytically ambitious finding, a directional association between writing confidence and both usage and attitudes. Read against the abstract, the paper reports it as an association, not a causal claim, so the fair reading is descriptive: with no mechanism identified and no adjustment reported, it cannot be distinguished from confounding by correlates of writing confidence or from a reversed relationship. As one institution's correlation it is informative; it simply does not license an explanation of why confident writers abstain, and the abstract does not assert one. The limit is interpretive scope, not over-claiming.
CLAIM-002. over half had used or considered using these for academic purposes
Collapsing 'used' and 'considered using' into a single 'over half' figure is a construct-precision limit: behavioural use and mere contemplation are different constructs with different policy weight, and no split is reported. The abstract is transparent that the figure is combined, so this is not a distortion of the headline rate; it is a limit on how precisely uptake can be stated from the reported number. A breakdown would let the figure be read as an actual use rate.
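The precision limit can be made concrete with a small sketch. The two splits below are hypothetical, chosen only to show that very different use rates are compatible with the same combined figure; the abstract reports no split and no exact combined percentage, so 55 is a stand-in for "over half".

```python
from typing import NamedTuple


class Split(NamedTuple):
    """One hypothetical breakdown of the combined figure, in percentage points."""
    used_pct: int             # respondents who actually used the tools
    considered_only_pct: int  # respondents who only considered using them

    @property
    def combined_pct(self) -> int:
        # The single headline number is the sum of the two groups.
        return self.used_pct + self.considered_only_pct


# Both splits are invented for illustration; neither comes from the paper.
scenarios = [
    Split(used_pct=45, considered_only_pct=10),
    Split(used_pct=10, considered_only_pct=45),
]

for s in scenarios:
    print(f"used={s.used_pct}%  considered only={s.considered_only_pct}%  "
          f"combined={s.combined_pct}%")
```

Both scenarios produce the same headline number, which is why the combined figure bounds the use rate from above but does not identify it.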
CLAIM-003. these technologies should not be banned from university, but consideration must be made to ensure different groups of students have equal access to the technologies
The recommendations are normative policy positions, and the abstract frames the study as policy-informing. As normative claims they are reasonable and appropriate to an applied integrity journal; the only fair observation is that they are value-based recommendations that go beyond what the descriptive survey results by themselves establish. This is a scope note about the descriptive/normative boundary, not a demand that a policy recommendation supply an empirical disparities analysis.
Scorecard
Sub-scores are 0–5 editorial judgements on fixed scales (higher is better, except methodological risk and overclaiming where higher is worse). They are contestable and open to a severity challenge from authors.
Construct and measurement validity
Two construct-precision points stand out, neither a distortion. First, the engagement headline combines behaviour and intention: "over half had used or considered using these for academic purposes" merges actual use with mere contemplation, two constructs with different policy weight, and no split is reported -- so the figure cannot be read as a use rate, though the abstract is transparent that it is a combined number. Second, the key explanatory variable, students' confidence in their academic writing, is presumably self-rated and single-item; the abstract gives no indication of how it was measured or validated. Both are limits on precision and interpretation rather than evidence of over-claiming.
Inference and confounding
The relationship the abstract reports -- that writing-confident students were less likely to use or consider AI and less likely to support peers using it -- is a descriptive association, and is best read as one. The abstract identifies no mechanism and reports no adjustment for plausible correlates of writing confidence (discipline, study level, prior attainment, EAL status); because confidence plausibly co-varies with these, the relationship could be confounded or operate in the reverse direction. This does not mean the finding is overstated -- the abstract presents it as an association, not a cause -- but it does bound what can be inferred: as one institution's correlation it is informative, as an explanation of why confident writers abstain it is underdetermined.
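A minimal sketch of how a correlate could produce such an association on its own. Every count below is invented for illustration and none comes from the paper; "discipline A" and "discipline B" are hypothetical strata standing in for any correlate of writing confidence.

```python
# Hypothetical counts: (discipline, confident?) -> (students, AI users).
counts = {
    ("A", True): (400, 80),
    ("A", False): (100, 20),
    ("B", True): (100, 60),
    ("B", False): (400, 240),
}


def usage_rate(rows):
    """Share of AI users among the students in the given (students, users) rows."""
    students = sum(n for n, _ in rows)
    users = sum(u for _, u in rows)
    return users / students


def pooled_rate(confident: bool) -> float:
    """Usage rate across both disciplines for one confidence group."""
    return usage_rate([v for (_, c), v in counts.items() if c == confident])


def stratum_rate(discipline: str, confident: bool) -> float:
    """Usage rate within a single discipline for one confidence group."""
    return usage_rate([counts[(discipline, confident)]])


print("pooled:", pooled_rate(True), "vs", pooled_rate(False))
for d in ("A", "B"):
    print(f"discipline {d}:", stratum_rate(d, True), "vs", stratum_rate(d, False))
```

In this constructed case confident students use AI less in the pooled data (28% versus 52%), yet within each discipline the two groups use it at identical rates, so the pooled gap is produced entirely by discipline. The sketch does not suggest the paper's data look like this, only that an unadjusted association cannot rule it out.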
Scope, generalizability, and policy claims
The study is explicitly single-institution (University of Liverpool), which is entirely appropriate for its stated aim but bounds external validity; attitudes at one large UK university need not transfer. Its recommendations -- that "these technologies should not be banned from university, but consideration must be made to ensure different groups of students have equal access to the technologies" -- are normative policy positions. They are reasonable and well-suited to a policy-informing study; the only fair observation is that, as value-based recommendations, they extend beyond what the descriptive survey results by themselves establish. That is a note about the descriptive/normative boundary, not a demand that the recommendation be backed by an empirical disparities analysis the genre does not call for.
What the study does well
The large sample (2,555 respondents) gives the descriptive proportions real weight, and the differentiated results (54.1% support for Grammarly-type tools versus 70.4% opposition to whole-essay ChatGPT use) show the instrument captured genuine nuance rather than a single pro/anti axis. The applied framing is honest about its purpose, and demand for 'a university wide policy' (41.1%) is exactly the actionable stakeholder signal an integrity office needs. For the International Journal for Educational Integrity, this is a fit-for-purpose contribution to the student-voice evidence base.
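As a rough gauge of that weight, the sketch below computes a normal-approximation (Wald) 95% margin of error for each reported proportion. It assumes simple random sampling and that all 2,555 respondents answered each item; the abstract confirms neither, and the calculation covers sampling error only, not self-selection into the survey.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of a Wald interval for a proportion p observed in n responses."""
    return z * math.sqrt(p * (1.0 - p) / n)


n = 2555  # respondents, as stated in the abstract
reported = {
    "support for Grammarly-type tools": 0.541,
    "opposition to whole-essay ChatGPT use": 0.704,
    "demand for a university-wide policy": 0.411,
}

for label, p in reported.items():
    moe = margin_of_error(p, n)
    print(f"{label}: {p:.1%} +/- {moe:.1%}")
```

Under these assumptions each half-width comes out just under two percentage points, so the gap between the support and opposition figures is far larger than sampling noise alone would explain.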
Strongest critique
The fair critique is about scope and construct precision, not over-claiming -- read against the abstract, the paper does not overstate its case. Its most analytically ambitious finding, that "Students who had higher levels of confidence in their academic writing were less likely to use or consider using them for academic purposes, and were also less likely to be supportive of other students using them", is reported as a descriptive association and should be read as one: the abstract asserts no causal mechanism and leaves the relationship unadjusted, so it cannot establish whether writing confidence itself, or a correlate such as discipline, study level, or prior attainment, tracks usage. As one institution's correlation it is informative; as an explanation of why confident writers abstain it is underdetermined, and the abstract does not claim otherwise. Two scope limits compound this. The headline engagement figure combines two distinct constructs -- "over half had used or considered using these for academic purposes" -- so it mixes behaviour with intention and cannot be read as a use rate; the abstract is transparent that the figure is combined, but the combination still limits how precisely uptake can be stated. And the work is single-institution (University of Liverpool), with a survey instrument that was built by a student library team and has no reported psychometric validation, so external validity and instrument reliability are both unestablished from the abstract. These are calibration-and-scope limits on a competent, fit-for-purpose applied survey, not competence or integrity faults.
Strongest fair defence
This is a descriptive survey study with a clearly stated applied purpose -- to "inform changes to the University of Liverpool Academic Integrity code of practice" -- and on those terms it is appropriately scoped and genuinely useful. A sample of "2555 students" is large for a single-institution attitudinal survey and lends real weight to the descriptive proportions reported. The authors do not claim causality or generalizability beyond their institution; the headline results (awareness levels, differential support for Grammarly versus whole-essay ChatGPT use, demand for a university-wide policy) are exactly the kind of local stakeholder evidence that integrity-policy revision should rest on. The differentiated support figures (54.1% vs 70.4%) demonstrate the survey captured nuanced rather than monolithic attitudes. For a practice-oriented integrity journal, descriptive student-voice data is a legitimate and valuable contribution, not a methodological shortfall.
Conclusion
A competent, large-sample descriptive survey that achieves its stated applied aim of informing one university's academic-integrity policy, and that is honest about being exactly that. Read against the abstract it does not over-claim: its credibility is strongest for the headline proportions, and its limits are matters of scope and construct precision rather than over-reach. The confidence-and-usage relationship is reported as a descriptive association and is best read narrowly, since the abstract identifies no mechanism and reports no adjustment; the combined 'used or considered using' figure mixes behaviour with intention and so is not a clean use rate; and the normative recommendations (no ban, equal access) are reasonable policy positions that extend beyond what a descriptive survey can by itself establish. None of these are integrity or competence concerns; they are calibration and inferential-scope limits typical of practitioner survey work. Severity is capped at moderate given abstract-only access.
Reply from the authors
Following the practice of Nature Matters Arising, Science Technical Comments and PNAS Letters, this Comment is published as one half of a Comment + Reply pair: the authors of the original article are invited to respond, and any reply is published here verbatim alongside the Comment as part of the record.
Reply: not yet invited. No reply has been received for publication.
The authors have a right of reply and no veto. A reply may request a factual correction, a methodological rebuttal, a clarification, a data/code update, or a severity challenge, and is published unedited. See the right-of-reply policy.
References
Every external source this Comment cites, each with a verified link. 0 fabricated.
Source-grounding attestation
- ✓Verbatim source spans present in the critique — 3/3 provenance spans re-derived in the critique prose
- ✓Passes the publication validator — no errors
- ✓Zero fabricated citations — 0 fabricated
- ✓Severity within the access-basis cap — severity "moderate" ≤ cap "moderate" for abstract_only
Every verbatim span the critique relies on is re-derived in the prose in-app; span-in-source is re-verifiable offline (the abstract is re-fetched, not stored, per the no-reproduce policy).
Re-verify span-in-source offline: python3 scripts/verify-queue-critiques.py
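For readers who want the idea without running the script, a span-in-source check can be sketched as below. This is not the contents of scripts/verify-queue-critiques.py, whose internals are not shown here; the function names and the normalization rules (whitespace collapsing, curly-quote unification) are assumptions, and the source text is a re-wrapped stand-in built from one span quoted in this Comment, not the full abstract.

```python
import re

# Map curly quotes to straight ones so typography does not break a match.
_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})


def normalize(text: str) -> str:
    """Unify quote characters and collapse all runs of whitespace to one space."""
    return re.sub(r"\s+", " ", text.translate(_QUOTES)).strip()


def span_in_source(span: str, source: str) -> bool:
    """True if the span appears verbatim in the source after normalization."""
    return normalize(span) in normalize(source)


# Stand-in for a re-fetched abstract: a quoted span with altered line wrapping.
source_text = "[...] over half had used or\n   considered using these for academic purposes [...]"

verbatim = "over half had used or considered using these for academic purposes"
altered = "over half had used these for academic purposes"

print(span_in_source(verbatim, source_text))
print(span_in_source(altered, source_text))
```

The verbatim span matches despite the different line wrapping, while the altered wording does not, which is the property a provenance check of this kind relies on.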
Independent faithfulness review
A refute-by-default adversarial panel (two independent reviewers — an overreach lens and a mischaracterization lens — that fetched the real source) tried to prove this critique misread the paper. This is an AI adversarial review recorded with its reasoning, not a deterministic check.
Retrieved the abstract via ERIC (EJ1411141) and Crossref content negotiation on the DOI; both confirm the paper's substance (2,555 students, single-institution University of Liverpool, instrument created/reviewed by a student library team, no-ban + equitable-access recommendation, confidence-negatively-associated-with-usage finding). DataCite 404'd and a verbatim-phrase web search did not surface the source, but the retrieved fetches corroborate all three verified-verbatim quotes (Q1-Q3), which the critique reproduces exactly: claim [1] = Q1 verbatim, claim [2] = Q2 verbatim, claim [3] = Q3 verbatim. No misquotes. The critique invents no findings: every interpretive move is explicitly hedged as an absence-based or plausibility inference ('the abstract identifies no mechanism', 'presumably self-rated and single-item', 'unestablished from the abstract', confounders 'plausibly' co-vary), and it repeatedly and correctly states the paper does NOT claim causation and is 'transparent' that the engagement figure is combined -- so it does not attribute over-claiming the paper never made. It credits strengths generously (competent large-sample descriptive survey, achieves its stated applied aim, honest about its scope, informative as one institution's correlation, recommendations reasonable and genre-appropriate). The normative-recommendation note is well-calibrated, conceding the descriptive/normative boundary point without demanding an empirical disparities analysis the genre does not require. The lone non-quote detail ('vetted through focus groups') is a minor unsupported embellishment, but it is load-light, used only to support an explicitly-hedged instrument-reliability caveat, and the student-team-creation core is corroborated by the ERIC text. Severity appropriately capped at moderate given abstract-only access. This is a conservative, accurate, fairly-hedged review. 
[Post-review: the 'focus groups' instrument-vetting detail flagged here was removed from the published critique, leaving only the source-corroborated student-library-team creation and the absence of reported psychometric validation.]
Version & correction history
| Version | Date | Change |
|---|---|---|
| v1.0 | 2026-06-27 | Initial publication; strongest critique narrowed to its genre-appropriate core during pre-publication over-reach review. |
No silent substantive corrections — every change is versioned and visible.
How to cite this Comment
Critical AI. Comment on “Student perspectives on the use of generative artificial intelligence technologies in higher education” (Heather Johnston et al., International Journal for Educational Integrity, 2024). Critical AI; 2026. https://policywindow.org/critique/c/student-perspectives-generative-ai-higher-ed
A registered DOI will replace the URL once minted; until then the canonical URL is the persistent identifier. Highwire/Dublin-Core citation tags and a schema.org Review record are embedded in this page for Google Scholar and reference managers.
Verify this Comment. Its checkable facts (target DOI, access-basis severity cap, zero fabricated citations) are served — as the app’s self-report — at /critique/api/critiques/student-perspectives-generative-ai-higher-ed/verify; to confirm them independently of this site, re-derive the same checks (and resolve the target DOI) with npx tsx scripts/verify-critical-ai.ts --critique student-perspectives-generative-ai-higher-ed --live.
Content fingerprint bd73185ce881481a (v1.0) — this Comment’s substantive content is content-addressed; a silent post-publication edit would change it.
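A minimal sketch of how a content-addressed fingerprint behaves, assuming a SHA-256 hash truncated to 16 hex characters over whitespace-normalized text. The hash function, the canonicalization, and the truncation are all assumptions made for illustration; the site's actual scheme is not documented on this page, so this code is not expected to reproduce the fingerprint above.

```python
import hashlib


def fingerprint(content: str, length: int = 16) -> str:
    """Hypothetical fingerprint: truncated SHA-256 of whitespace-normalized text."""
    canonical = " ".join(content.split())  # ignore pure layout changes
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:length]


original = "Severity is capped at moderate given abstract-only access."
reflowed = "Severity is capped at moderate\n  given abstract-only access."
edited = "Severity is capped at high given abstract-only access."

print(fingerprint(original))
print(fingerprint(reflowed) == fingerprint(original))
print(fingerprint(edited) == fingerprint(original))
```

Reflowing the text leaves the fingerprint unchanged under this canonicalization, while changing a single substantive word produces a different value, which is what makes a silent edit detectable.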