Post-publication Comment · Critical AI
Comment on “Is it harmful or helpful? Examining the causes and consequences of generative AI usage among university students”
Critical AI · published 2026-06-27 · v1.0 · CRIT-000018
Concerning: Muhammad Abbas, Farooq Ahmed Jam, Tariq Iqbal Khan · International Journal of Educational Technology in Higher Education · 2024-02-16
Why this paper was selected
G88b coverage-moving batch (education white-space); validated G84 engine; span-grounded to the OpenAlex abstract.
AI/AGI centrality 2/5 · societal relevance 4/5 · source-journal note: Off-monitored: International Journal of Educational Technology in Higher Education is a peer-reviewed education journal not in the monitored determination; disclosed off-list.
Summary
Researchers ran two surveys of university students to study why they use ChatGPT and what happens when they do. They first built and tested an eight-item questionnaire to measure ChatGPT use (165 students), then ran a larger study (494 students) at three time points. They report that heavier academic workload and time pressure predicted more ChatGPT use, that more ChatGPT use was linked to more procrastination, more reported memory problems, and lower grades, and (surprisingly) that students more sensitive to rewards used it less. The study design is better than a one-time survey because it spaces measurements over time, but it is still based largely on students' self-reports and cannot prove that ChatGPT causes these problems — students who already procrastinate or struggle may simply use it more. The most cautious reading is that the study shows associations and a useful new measurement tool, not proof of harm.
Central claims & evidence map
| Claim | Type | Evidence offered | Support | Overclaiming | Main weakness |
|---|---|---|---|---|---|
| use of ChatGPT was likely to develop tendencies for procrastination and memory loss and dampen the students’ academic performance | | | Weak | Moderate | Causal/quasi-causal language ('develop', 'dampen') rests on time-lagged self-report correlations that cannot exclude confounding or selection; 'memory loss' is an unusually strong, likely self-perceived outcome. |
| students who were sensitive to rewards were less likely to use ChatGPT | | | Weak | Minor | A counterintuitive directional effect from an undefined trait construct, reported without indication of robustness checks or correction across multiple simultaneous antecedents. |
| developed and validated an eight-item scale to measure ChatGPT usage | | | Moderate | Minor | Self-report usage scale predicting self-report outcomes invites common-method bias; 'usage' construct may bundle heterogeneous behaviors under one measure. |
Per-claim assessment
CLAIM-001. use of ChatGPT was likely to develop tendencies for procrastination and memory loss and dampen the students’ academic performance
This is the headline causal-sounding claim. The design that supports it is a 'three-wave time-lagged design' in Study 2 (N = 494), which is observational/self-report panel data, not experimental. Time-lagged measurement helps with temporal ordering but does not rule out confounding (e.g., conscientiousness, prior performance, baseline study habits) or reverse causation in the substantive sense — students already prone to procrastination or already struggling academically may turn to ChatGPT. 'Memory loss' is a particularly strong construct to attribute to a usage scale over a few survey waves; the abstract gives no indication of an objective memory measure, so this is almost certainly self-reported perceived forgetting, not a validated cognitive measure. The verb 'develop tendencies' implies within-person change that a three-wave correlational design cannot cleanly establish.
CLAIM-002. students who were sensitive to rewards were less likely to use ChatGPT
The abstract itself flags this result as counterintuitive (it is introduced with 'In contrast'), which raises the question of whether it is a robust finding or a measurement/specification artifact. 'Sensitivity to rewards' and 'sensitivity to quality' are treated as antecedents, but the abstract gives no construct definition; reward sensitivity often predicts more shortcut-seeking, not less, so a negative coefficient warrants caution about how the construct was operationalized and whether it survived multiple-comparison adjustment across the several antecedents tested.
CLAIM-003. developed and validated an eight-item scale to measure ChatGPT usage
Scale development on N = 165 (Study 1) with revalidation on N = 494 (Study 2) is a reasonable two-sample structure. However, 'ChatGPT usage' as a single eight-item self-report scale conflates frequency, intensity, type, and possibly attitudes; the abstract does not indicate whether usage is behaviorally anchored or purely perceptual. A self-report usage scale that then predicts self-reported procrastination and memory loss risks common-method variance inflating the downstream associations, especially if collected within the same survey instrument.
Scorecard
Sub-scores are 0–5 editorial judgements on fixed scales (higher is better, except methodological risk and overclaiming where higher is worse). They are contestable and open to a severity challenge from authors.
Design and identification
The study's backbone is a Study 1 scale-development sample (N = 165) followed by a Study 2 'three-wave time-lagged design' (N = 494). This is a sensible structure and the time-lagged element is a real improvement over single cross-sectional surveys because it can sequence the measurement of antecedents, usage, and outcomes. However, time-lagging measurement is not identification. Without randomization, an instrument, or strong controls (none mentioned in the abstract), the reported effects of usage on outcomes remain open to confounding by stable traits (e.g., conscientiousness, baseline academic ability) and to selection — students predisposed to procrastinate or to underperform may adopt ChatGPT at higher rates. The abstract's verbs ('develop', 'dampen') imply within-person causal change the design cannot establish.
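The confounding concern can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, not a reanalysis of the paper's data: the stable trait, the 0.6 loadings, and the sample size are all hypothetical. Usage (wave 2) and procrastination (wave 3) are both driven by the same trait, and usage has no effect on procrastination at all, yet the time-lagged correlation is clearly positive until the trait is controlled for.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_corr(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def simulate_confounded_panel(n=5000, seed=0):
    """Return (raw, trait-adjusted) correlation of usage and procrastination.

    Hypothetical data-generating process: a stable trait (for example low
    conscientiousness) raises both wave-2 usage and wave-3 procrastination.
    Usage itself has NO causal effect on procrastination.
    """
    rng = random.Random(seed)
    trait, usage_t2, procrastination_t3 = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        trait.append(z)
        usage_t2.append(0.6 * z + rng.gauss(0, 1))
        procrastination_t3.append(0.6 * z + rng.gauss(0, 1))  # no usage term
    raw = pearson(usage_t2, procrastination_t3)
    adjusted = partial_corr(
        raw, pearson(usage_t2, trait), pearson(procrastination_t3, trait)
    )
    return raw, adjusted


if __name__ == "__main__":
    raw, adjusted = simulate_confounded_panel()
    print(f"raw lagged correlation: {raw:.3f}")
    print(f"after controlling for the trait: {adjusted:.3f}")
```

Measuring usage before the outcome does not help here, because the trait precedes both. Only measuring and adjusting for the trait (or randomizing usage) removes the association, which is why the absence of stated controls matters.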
Measurement and common-method variance
The outcomes (procrastination, 'memory loss', and academic performance) appear to be measured alongside a self-reported usage scale. 'Memory loss' is the most concerning construct: nothing in the abstract indicates an objective cognitive task, so it is most plausibly perceived forgetting, which the same underlying dispositions could drive. If usage and the procrastination/memory outcomes are measured by self-report within the same instrument, common-method variance can inflate the associations. Academic performance could be self-reported (e.g., expected GPA) or registrar-based; the abstract does not say, and this distinction materially affects how much weight the performance finding can bear.
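A short simulation illustrates how a shared response style can manufacture an association between two self-report measures. It is a sketch under assumptions that are not taken from the paper: true usage and true memory problems are independent, and every respondent has a hypothetical "response style" (for example a tendency to agree with items) that shifts both questionnaire scores by the same amount.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_common_method(n=5000, style_loading=0.7, seed=1):
    """Return (same-method r, objective-outcome r) for unrelated constructs."""
    rng = random.Random(seed)
    usage_report, memory_report, memory_objective = [], [], []
    for _ in range(n):
        usage_true = rng.gauss(0, 1)
        memory_true = rng.gauss(0, 1)   # independent of usage by construction
        style = rng.gauss(0, 1)         # hypothetical respondent response style
        usage_report.append(usage_true + style_loading * style + rng.gauss(0, 0.5))
        memory_report.append(memory_true + style_loading * style + rng.gauss(0, 0.5))
        # An objective test is not contaminated by questionnaire response style.
        memory_objective.append(memory_true + rng.gauss(0, 0.5))
    same_method = pearson(usage_report, memory_report)
    objective = pearson(usage_report, memory_objective)
    return same_method, objective


if __name__ == "__main__":
    same_method, objective = simulate_common_method()
    print(f"self-report usage vs self-report memory: {same_method:.3f}")
    print(f"self-report usage vs objective memory:   {objective:.3f}")
```

The same logic applies to the performance outcome: a registrar-based grade would sit on the "objective" side of this comparison, a self-estimated grade on the "same-method" side.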
The counterintuitive antecedent
The abstract foregrounds that reward-sensitive students used ChatGPT less ('In contrast'). Because this runs against the intuitive expectation that reward sensitivity promotes shortcut-seeking, it deserves scrutiny: it could reflect the specific operationalization of 'sensitivity to rewards', suppression among correlated predictors, or a chance result among several antecedents (workload, time pressure, reward sensitivity, quality sensitivity) tested together. The abstract reports no multiple-comparison handling or robustness checks, so the direction and significance of this particular effect should be read cautiously pending the full text.
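The arithmetic behind the multiple-comparison concern is simple enough to sketch. The example below assumes four independent tests at a 0.05 threshold (one per antecedent named in the abstract) and applies a Holm step-down adjustment to a set of purely hypothetical p-values; none of these numbers come from the paper, and the paper's actual tests may not be independent.

```python
def family_wise_error_rate(alpha, k):
    """Chance of at least one false positive across k independent tests."""
    return 1 - (1 - alpha) ** k


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    # Four antecedents: workload, time pressure, reward sensitivity, quality sensitivity.
    print(f"FWER for 4 tests at 0.05: {family_wise_error_rate(0.05, 4):.3f}")  # 0.185

    # Hypothetical p-values for illustration only, NOT values from the paper.
    hypothetical = [0.001, 0.004, 0.03, 0.20]
    print(holm_adjust(hypothetical))
```

Under these assumptions, a nominal p of 0.03 adjusts to 0.06 and no longer clears the 0.05 threshold, which is the kind of fragility a single counterintuitive coefficient should be checked against.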
Framing and overclaiming
The title's 'harmful or helpful' framing and the abstract's 'Not surprisingly, use of ChatGPT was likely to develop tendencies for procrastination and memory loss' push correlational results toward a harm narrative. The phrase 'Not surprisingly' also signals confirmation-oriented interpretation. The indirect-effects claim ('academic workload, time pressure, and sensitivity to rewards had indirect effects on students' outcomes through ChatGPT usage') is standard mediation language, but mediation inferred from observational data inherits all the confounding concerns above and does not upgrade association to causation. Reframed as associations within an exploratory measurement study, the contribution is solid; as evidence of cognitive and academic harm, it overreaches.
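To show why an estimated indirect effect does not by itself establish a causal pathway, the sketch below computes a product-of-coefficients indirect effect (a × b) on simulated data in which the true indirect effect is exactly zero. The variable names, coefficients, and the unobserved trait are all assumptions for illustration; the paper's actual mediation model is not described in the abstract.

```python
import random


def cov(xs, ys):
    """Sample covariance (population form) of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def indirect_effect(x, m, y):
    """Product-of-coefficients estimate: a (x -> m) times b (m -> y given x)."""
    a = cov(x, m) / cov(x, x)
    denom = cov(x, x) * cov(m, m) - cov(x, m) ** 2
    b = (cov(x, x) * cov(m, y) - cov(x, m) * cov(x, y)) / denom
    return a, b, a * b


def simulate_confounded_mediation(n=5000, seed=2):
    """Workload raises usage; usage has NO effect on the outcome."""
    rng = random.Random(seed)
    workload, usage, outcome = [], [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        u = rng.gauss(0, 1)  # hypothetical unobserved trait
        workload.append(x)
        usage.append(0.5 * x + 0.6 * u + rng.gauss(0, 1))
        outcome.append(0.6 * u + rng.gauss(0, 1))  # depends on the trait only
    return indirect_effect(workload, usage, outcome)


if __name__ == "__main__":
    a, b, ab = simulate_confounded_mediation()
    print(f"a = {a:.3f}, b = {b:.3f}, estimated indirect effect = {ab:.3f}")
```

The estimate comes out clearly positive even though removing usage from this simulated world would change nothing about the outcome, because the unobserved trait links the mediator and the outcome.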
Scope and reproducibility
Reproducibility cannot be fully judged from the abstract. Positives: two samples, explicit Ns, an articulated scale and design. Unstated: sampling frame and recruitment, attrition across the three waves (panel attrition can bias estimates), how academic performance was measured, whether data/scale items are shared, and what controls entered the models. Sample composition (single institution? discipline mix?) bounds generalizability of effects whose magnitudes are not reported in the abstract. These omissions are normal for an abstract but limit how far the stated conclusions can be relied upon without the full text.
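The parenthetical point about panel attrition can also be illustrated. In this sketch, usage and performance are unrelated in the full wave-1 sample, and a hypothetical retention rule (assumed here, not reported anywhere in the abstract) makes students who score low on both more likely to drop out before the final wave. Among the students who remain, a negative association appears purely through selection.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_attrition(n=5000, seed=3):
    """Return (full-sample r, completers-only r, retention rate)."""
    rng = random.Random(seed)
    full_usage, full_perf = [], []
    kept_usage, kept_perf = [], []
    for _ in range(n):
        usage = rng.gauss(0, 1)
        perf = rng.gauss(0, 1)  # independent of usage by construction
        full_usage.append(usage)
        full_perf.append(perf)
        # Hypothetical retention rule: disengaged students (low usage AND
        # low performance) are the ones most likely to leave the panel.
        if usage + perf + rng.gauss(0, 0.5) > 0:
            kept_usage.append(usage)
            kept_perf.append(perf)
    return (
        pearson(full_usage, full_perf),
        pearson(kept_usage, kept_perf),
        len(kept_usage) / n,
    )


if __name__ == "__main__":
    full_r, completers_r, retained = simulate_attrition()
    print(f"retained: {retained:.0%}")
    print(f"full sample r: {full_r:.3f}, completers only r: {completers_r:.3f}")
```

A different retention rule could push the bias in the other direction, so the practical question for the full text is simply how many students were lost between waves and whether dropouts differed from completers.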
Strongest critique
The central public-facing message — that using ChatGPT causes procrastination, memory loss, and lower academic performance — is built on a correlational, predominantly self-report panel design, yet the abstract states it in change/causal language ("develop tendencies", "dampen"). A three-wave time-lagged design establishes temporal sequencing of measurement but cannot, on its own, rule out confounding (conscientiousness, baseline ability, prior study habits) or selection: students who already procrastinate or already struggle may adopt ChatGPT more, producing exactly these correlations without any causal effect of the tool. The "memory loss" outcome is the sharpest concern — there is no indication of an objective cognitive measure, so it is almost certainly self-perceived forgetting, which can itself be driven by the same dispositions that drive both usage and procrastination. Combined with likely common-method variance (a self-reported usage scale predicting self-reported outcomes) and the absence of any mention of an experimental or instrumented identification strategy, the abstract's causal framing outruns what the design licenses.
Strongest fair defence
The paper is appropriately scoped as early descriptive and scale-development work on a fast-moving phenomenon, and it is methodologically more disciplined than much of the surrounding commentary: it uses two independent samples, formally develops and revalidates a measurement instrument before testing hypotheses, and adopts a three-wave time-lagged design rather than a single cross-sectional snapshot — a deliberate step to improve temporal ordering of antecedents and outcomes. The abstract is also candid about a counterintuitive result rather than smoothing it over. Within an educational-technology survey tradition, establishing a validated usage scale and mapping its plausible antecedents (workload, time pressure) and correlates (procrastination, performance) is a legitimate and useful contribution, and the indirect-effects framing is a reasonable way to organize the relationships rather than an assertion of definitive causation.
Conclusion
This is a competently structured early-stage survey study whose two-sample, scale-then-test design and three-wave time-lagged data collection are genuine strengths relative to typical cross-sectional work in this space. Its core empirical contribution — a validated eight-item ChatGPT-usage scale plus a map of plausible antecedents and correlates — is credible. The principal weakness is interpretive: the abstract frames correlational, largely self-report findings in causal/change language ("develop tendencies for procrastination and memory loss and dampen the students' academic performance"), where confounding, selection, and common-method variance remain live alternative explanations the design cannot exclude. The "memory loss" outcome is the most overreaching, given no indication of an objective measure. Treated as associational and exploratory, the claims are reasonable; treated as evidence that ChatGPT use harms cognition and grades, they are not yet warranted. Confidence is medium because this judgment rests on the abstract alone, which may omit robustness checks, controls, or measure details present in the full text.
Reply from the authors
Following the practice of Nature Matters Arising, Science Technical Comments and PNAS Letters, this Comment is published as one half of a Comment + Reply pair: the authors of the original article are invited to respond, and any reply is published here verbatim alongside the Comment as part of the record.
Reply: not yet invited. No reply has been received for publication.
The authors have a right of reply and no veto. A reply may request a factual correction, a methodological rebuttal, a clarification, a data/code update, or a severity challenge, and is published unedited. See the right-of-reply policy.
References
Every external source this Comment cites, each with a verified link. 0 fabricated.
Source-grounding attestation
- ✓Verbatim source spans present in the critique — 4/4 provenance spans re-derived in the critique prose
- ✓Passes the publication validator — no errors
- ✓Zero fabricated citations — 0 fabricated
- ✓Severity within the access-basis cap — severity "moderate" ≤ cap "moderate" for abstract_only
Every verbatim span the critique relies on is re-derived in the prose in-app; span-in-source is re-verifiable offline (the abstract is re-fetched, not stored, per the no-reproduce policy).
Re-verify span-in-source offline: python3 scripts/verify-queue-critiques.py
Independent faithfulness review
A refute-by-default adversarial panel (two independent reviewers — an overreach lens and a mischaracterization lens — that fetched the real source) tried to prove this critique misread the paper. This is an AI adversarial review recorded with its reasoning, not a deterministic check.
All five quoted strings are verbatim matches against the supplied abstract ("use of ChatGPT was likely to develop tendencies for procrastination and memory loss and dampen the students' academic performance"; "students who were sensitive to rewards were less likely to use ChatGPT"; "developed and validated an eight-item scale to measure ChatGPT usage"; "three-wave time-lagged design"; "In contrast"). The N values (165, 494) and the four antecedents/three outcomes are reported accurately. The critique invents no findings: every empirical attribution traces to abstract text. Its central charge — that the abstract states correlational, largely self-report results in causal/change language while a time-lagged design cannot exclude confounding, selection, or reverse causation — is methodologically sound and a fair reading. Inferences are properly hedged throughout ("almost certainly", "likely", "risks", "warrants caution"), and the critique distinguishes what the abstract says from what it infers (e.g., that "memory loss" is probably self-perceived, with the abstract giving "no indication of an objective measure"). Hedges and strengths are credited: it notes the paper's own "In contrast" flagging of the counterintuitive reward result, praises the two-sample scale-then-test structure and three-wave design as genuine strengths, and explicitly downgrades its own confidence to medium because it judges from the abstract alone. Both the overreach and mischaracterization lenses fail to sustain a refutation. Verdict: faithful.
Version & correction history
| Version | Date | Change |
|---|---|---|
| v1.0 | 2026-06-27 | |
No silent substantive corrections — every change is versioned and visible.
How to cite this Comment
Critical AI. Comment on “Is it harmful or helpful? Examining the causes and consequences of generative AI usage among university students” (Muhammad Abbas et al., International Journal of Educational Technology in Higher Education, 2024). Critical AI; 2026. https://policywindow.org/critique/c/genai-usage-university-students-harmful-helpful
A registered DOI will replace the URL once minted; until then the canonical URL is the persistent identifier. Highwire/Dublin-Core citation tags and a schema.org Review record are embedded in this page for Google Scholar and reference managers.
Verify this Comment. Its checkable facts (target DOI, access-basis severity cap, zero fabricated citations) are served — as the app’s self-report — at /critique/api/critiques/genai-usage-university-students-harmful-helpful/verify; to confirm them independently of this site, re-derive the same checks (and resolve the target DOI) with npx tsx scripts/verify-critical-ai.ts --critique genai-usage-university-students-harmful-helpful --live.
Content fingerprint f052eb9ab4674aba (v1.0) — this Comment’s substantive content is content-addressed; a silent post-publication edit would change it.
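For readers unfamiliar with content addressing, the idea can be sketched in a few lines. This page does not state how its fingerprint is computed, so everything below is an assumption: the choice of SHA-256, the whitespace normalization, and the truncation to 16 hexadecimal characters are illustrative only, and `content_fingerprint` is a hypothetical helper, not the site's implementation.

```python
import hashlib


def content_fingerprint(text, length=16):
    """Short hex digest of whitespace-normalized text (illustrative scheme)."""
    normalized = " ".join(text.split())  # collapse runs of whitespace
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


if __name__ == "__main__":
    original = "Confidence is medium because this judgment rests on the abstract alone."
    edited = "Confidence is high because this judgment rests on the abstract alone."
    print(content_fingerprint(original))
    print(content_fingerprint(edited))  # differs: the substantive text changed
```

Under a scheme like this, reflowing whitespace leaves the fingerprint unchanged, while changing even one word produces a different value.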