{"$schema":"https://policywindow.org/critique/api/schema","critique_id":"CRIT-000018","slug":"genai-usage-university-students-harmful-helpful","url":"https://policywindow.org/critique/c/genai-usage-university-students-harmful-helpful","doi":null,"status":"published","critique_type":"editorially_approved_ai_native_critique","publication_date":"2026-06-27","current_version":"1.0","target_paper":{"title":"Is it harmful or helpful? Examining the causes and consequences of generative AI usage among university students","authors":["Muhammad Abbas","Farooq Ahmed Jam","Tariq Iqbal Khan"],"journal":"International Journal of Educational Technology in Higher Education","doi":"10.1186/s41239-024-00444-7","url":"https://doi.org/10.1186/s41239-024-00444-7","publicationDate":"2024-02-16","paperType":"empirical","accessBasis":"abstract_only","fullTextUsed":false,"fictional":false,"doi_url":"https://doi.org/10.1186/s41239-024-00444-7"},"source_journal":{"tier":"exception","rankingSources":["off-monitored: peer-reviewed education venue not in the monitored determination; disclosed off-list"],"rankingNote":"Off-monitored: International Journal of Educational Technology in Higher Education is a peer-reviewed education journal not in the monitored determination; disclosed off-list."},"selection_provenance":{"id":"genai-usage-university-students-harmful-helpful","venue":"International Journal of Educational Technology in Higher Education","inMonitoredSet":false,"determinedTier":null,"recordedTier":"exception","effectiveTier":"exception","kind":"off_list","disclosed":true,"offListPeerReviewed":true},"selection":{"aiAgiCentralityScore":2,"societalRelevanceScore":4,"aiAgiCategories":[],"selectionReason":"G88b coverage-moving batch (education white-space); validated G84 engine; span-grounded to the OpenAlex 
abstract."},"scores":{"aiAgiContribution":2,"evidentiarySupport":3,"methodologicalRisk":3,"overclaiming":3,"reproducibilityOrAuditability":3,"societalImpactRelevance":4,"severity":"moderate","confidence":"medium"},"severity_cap_for_access_basis":"moderate","plain_language_summary":"Researchers ran two surveys of university students to study why they use ChatGPT and what happens when they do. They first built and tested an eight-item questionnaire to measure ChatGPT use (165 students), then ran a larger study (494 students) at three time points. They report that heavier academic workload and time pressure predicted more ChatGPT use, that more ChatGPT use was linked to more procrastination, more reported memory problems, and lower grades, and (surprisingly) that students more sensitive to rewards used it less. The study design is better than a one-time survey because it spaces measurements over time, but it is still based largely on students' self-reports and cannot prove that ChatGPT causes these problems — students who already procrastinate or struggle may simply use it more. The most cautious reading is that the study shows associations and a useful new measurement tool, not proof of harm.","claims":[{"id":"CLAIM-001","text":"use of ChatGPT was likely to develop tendencies for procrastination and memory loss and dampen the students’ academic performance","type":"empirical","evidenceOffered":"use of ChatGPT was likely to develop tendencies for procrastination and memory loss and dampen the students’ academic performance","support":"weak","overclaiming":"moderate","assessment":"This is the headline causal-sounding claim. The design that supports it is a 'three-wave time-lagged design' in Study 2 (N = 494), which is observational/self-report panel data, not experimental. 
Time-lagged measurement helps with temporal ordering but does not rule out confounding (e.g., conscientiousness, prior performance, baseline study habits) or reverse causation in the substantive sense: students already prone to procrastination or already struggling academically may turn to ChatGPT. 'Memory loss' is a particularly strong construct to attribute to a usage scale over a few survey waves; the abstract gives no indication of an objective memory measure, so this is almost certainly self-reported perceived forgetting, not a validated cognitive measure. The phrase 'develop tendencies' implies within-person change that a three-wave correlational design cannot cleanly establish.","mainWeakness":"Causal/quasi-causal language ('develop', 'dampen') rests on time-lagged self-report correlations that cannot exclude confounding or selection; 'memory loss' is an unusually strong, likely self-perceived outcome.","confidence":"medium"},{"id":"CLAIM-002","text":"students who were sensitive to rewards were less likely to use ChatGPT","type":"empirical","evidenceOffered":"students who were sensitive to rewards were less likely to use ChatGPT","support":"weak","overclaiming":"minor","assessment":"This is flagged in the abstract itself as counterintuitive (it is introduced with 'In contrast'), which raises the question of whether it is a robust finding or a measurement/specification artifact. 
'Sensitivity to rewards' and 'sensitivity to quality' are treated as antecedents, but the abstract gives no construct definition; reward sensitivity often predicts more shortcut-seeking, not less, so a negative coefficient warrants caution about how the construct was operationalized and whether it survived multiple-comparison adjustment across the several antecedents tested.","mainWeakness":"A counterintuitive directional effect from an undefined trait construct, reported without indication of robustness checks or correction across multiple simultaneous antecedents.","confidence":"medium"},{"id":"CLAIM-003","text":"developed and validated an eight-item scale to measure ChatGPT usage","type":"empirical","evidenceOffered":"developed and validated an eight-item scale to measure ChatGPT usage","support":"moderate","overclaiming":"minor","assessment":"Scale development on N = 165 (Study 1) with revalidation on N = 494 (Study 2) is a reasonable two-sample structure. However, 'ChatGPT usage' as a single eight-item self-report scale conflates frequency, intensity, type, and possibly attitudes; the abstract does not indicate whether usage is behaviorally anchored or purely perceptual. A self-report usage scale that then predicts self-reported procrastination and memory loss risks common-method variance inflating the downstream associations, especially if collected within the same survey instrument.","mainWeakness":"Self-report usage scale predicting self-report outcomes invites common-method bias; 'usage' construct may bundle heterogeneous behaviors under one measure.","confidence":"medium"}],"sections":[{"id":"s1","title":"Design and identification","body":"The study's backbone is a Study 1 scale-development sample (N = 165) followed by a Study 2 'three-wave time-lagged design' (N = 494). 
This is a sensible structure and the time-lagged element is a real improvement over single cross-sectional surveys because it can sequence the measurement of antecedents, usage, and outcomes. However, time-lagged measurement is not, by itself, causal identification. Without randomization, an instrument, or strong controls (none mentioned in the abstract), the reported effects of usage on outcomes remain open to confounding by stable traits (e.g., conscientiousness, baseline academic ability) and to selection: students predisposed to procrastinate or to underperform may adopt ChatGPT at higher rates. The abstract's verbs ('develop', 'dampen') imply within-person causal change the design cannot establish."},{"id":"s2","title":"Measurement and common-method variance","body":"The outcomes (procrastination, 'memory loss', and academic performance) appear to be measured by self-report, alongside a self-reported usage scale. 'Memory loss' is the most concerning construct: nothing in the abstract indicates an objective cognitive task, so it is most plausibly perceived forgetting, which the same underlying dispositions could drive. If usage and the procrastination/memory outcomes are measured by self-report within the same instrument, common-method variance can inflate the associations. Academic performance could be self-reported (e.g., expected GPA) or registrar-based; the abstract does not say, and this distinction materially affects how much weight the performance finding can bear."},{"id":"s3","title":"The counterintuitive antecedent","body":"The abstract foregrounds that reward-sensitive students used ChatGPT less ('In contrast'). Because this runs against the intuitive expectation that reward sensitivity promotes shortcut-seeking, it deserves scrutiny: it could reflect the specific operationalization of 'sensitivity to rewards', suppression among correlated predictors, or a chance result among several antecedents (workload, time pressure, reward sensitivity, quality sensitivity) tested together. 
The abstract reports no multiple-comparison handling or robustness checks, so the direction and significance of this particular effect should be read cautiously pending the full text."},{"id":"s4","title":"Framing and overclaiming","body":"The title's 'harmful or helpful' framing and the abstract's 'Not surprisingly, use of ChatGPT was likely to develop tendencies for procrastination and memory loss' push correlational results toward a harm narrative. The phrase 'Not surprisingly' also signals confirmation-oriented interpretation. The indirect-effects claim ('academic workload, time pressure, and sensitivity to rewards had indirect effects on students' outcomes through ChatGPT usage') is standard mediation language, but mediation inferred from observational data inherits all the confounding concerns above and does not upgrade association to causation. Reframed as associations within an exploratory measurement study, the contribution is solid; as evidence of cognitive and academic harm, it overreaches."},{"id":"s5","title":"Scope and reproducibility","body":"Reproducibility cannot be fully judged from the abstract. Positives: two samples, explicit Ns, an articulated scale and design. Unstated: sampling frame and recruitment, attrition across the three waves (panel attrition can bias estimates), how academic performance was measured, whether data/scale items are shared, and what controls entered the models. Sample composition (single institution? discipline mix?) bounds generalizability of effects whose magnitudes are not reported in the abstract. 
These omissions are normal for an abstract but limit how far the stated conclusions can be relied upon without the full text."}],"strongest_critique":"The central public-facing message — that using ChatGPT causes procrastination, memory loss, and lower academic performance — is built on a correlational, predominantly self-report panel design, yet the abstract states it in change/causal language (\"develop tendencies\", \"dampen\"). A three-wave time-lagged design establishes temporal sequencing of measurement but cannot, on its own, rule out confounding (conscientiousness, baseline ability, prior study habits) or selection: students who already procrastinate or already struggle may adopt ChatGPT more, producing exactly these correlations without any causal effect of the tool. The \"memory loss\" outcome is the sharpest concern — there is no indication of an objective cognitive measure, so it is almost certainly self-perceived forgetting, which can itself be driven by the same dispositions that drive both usage and procrastination. Combined with likely common-method variance (a self-reported usage scale predicting self-reported outcomes) and the absence of any mention of an experimental or instrumented identification strategy, the abstract's causal framing outruns what the design licenses.","strongest_fair_defence":"The paper is appropriately scoped as early descriptive and scale-development work on a fast-moving phenomenon, and it is methodologically more disciplined than much of the surrounding commentary: it uses two independent samples, formally develops and revalidates a measurement instrument before testing hypotheses, and adopts a three-wave time-lagged design rather than a single cross-sectional snapshot — a deliberate step to improve temporal ordering of antecedents and outcomes. The abstract is also candid about a counterintuitive result rather than smoothing it over. 
Within an educational-technology survey tradition, establishing a validated usage scale and mapping its plausible antecedents (workload, time pressure) and correlates (procrastination, performance) is a legitimate and useful contribution, and the indirect-effects framing is a reasonable way to organize the relationships rather than an assertion of definitive causation.","final_judgment":"This is a competently structured early-stage survey study whose two-sample, scale-then-test design and three-wave time-lagged data collection are genuine strengths relative to typical cross-sectional work in this space. Its core empirical contribution — a validated eight-item ChatGPT-usage scale plus a map of plausible antecedents and correlates — is credible. The principal weakness is interpretive: the abstract frames correlational, largely self-report findings in causal/change language (\"develop tendencies for procrastination and memory loss and dampen the students' academic performance\"), where confounding, selection, and common-method variance remain live alternative explanations the design cannot exclude. The \"memory loss\" outcome is the most overreaching, given no indication of an objective measure. Treated as associational and exploratory, the claims are reasonable; treated as evidence that ChatGPT use harms cognition and grades, they are not yet warranted. 
Confidence is medium because this judgment rests on the abstract alone, which may omit robustness checks, controls, or measure details present in the full text.","review_process":{"aiAgentsUsed":["AGISS critique engine (validated G84 directive)"],"reviewRounds":1,"humanEditor":{"name":"","role":"","approvalDate":"","declaredConflict":"none"},"expertCertification":{"used":false}},"author_response":{"notified":false,"status":"not_yet_invited"},"versions":[{"version":"1.0","date":"2026-06-27","note":"","changeType":"initial"}],"transparency":{"modelCardUrl":"/critique/model-card","publicAuditSummary":"G88b coverage-moving batch (education white-space); validated G84 engine; span-grounded to the OpenAlex abstract.","privateAuditRecordExists":true,"citationVerification":{"status":"complete","checkedSources":[{"label":"International Journal of Educational Technology in Higher Education abstract (OpenAlex)","url":"https://doi.org/10.1186/s41239-024-00444-7","verified":true}],"fabricatedCitations":0},"riskReview":{"copyright":"completed","defamation":"completed","note":"Critiques claims and methods only; no author-motive language. Abstract-only; severity capped to moderate; fair-use of short abstract spans."}}}