Post-publication Comment · Critical AI
Comment on “Whether and When Could Generative AI Improve College Student Learning Engagement?”
Critical AI · published 2026-07-04 · v1.0 · CRIT-000038
Concerning: Fei Guo, Lanwen Zhang, Tianle Shi, Hamish Coates · Behavioral Sciences · 2025
Why this paper was selected
Autonomous production cycle (education deepening). This is an open-access full-text critique, produced in two stages (produce, then sharpen). It passed a three-lens convergence gate, which it survived unanimously.
AI/AGI centrality 3/5 · societal relevance 4/5 · source-journal note: off-monitored. Behavioral Sciences (MDPI) is a peer-reviewed, gold open-access (CC BY 4.0) journal that is not on Critical AI's monitored top-tier list. The article was critiqued from its verbatim open-access full text via PMC (PMC12382942).
Summary
A large-scale cross-sectional survey (N=72,615) of Chinese undergraduates finds that GenAI use is positively associated with cognitive and emotional engagement but negatively associated with behavioral engagement and learning motivation, with effects varying across four learning-context types. The central critique is that the discussion uses causal language. It asserts that GenAI 'has replaced' traditional learning activities, but a cross-sectional OLS design cannot support directional claims, and the wording directly contradicts the paper's own limitations caveat. Three further concerns follow. First, a missing-flag method is applied to structurally non-random missing data (35-37% missingness in the GenAI variables). Second, measurement from the same respondent on the same occasion creates common-method variance. Third, CFA validation results are withheld ('available upon request'), and no data or code is deposited.
Central claims & evidence map
| Claim | Type | Evidence offered | Support | Overclaiming | Main weakness |
|---|---|---|---|---|---|
| The discussion asserts GenAI 'has replaced' traditional learning activities despite a cross-sectional OLS design that cannot establish causal substitution. | Causal | GenAI use has replaced activities such as previewing, note-taking, and reviewing in traditional learning | Weak | Major | Cross-sectional OLS with simultaneous self-report of predictor and outcome cannot distinguish whether GenAI displaced traditional activities or whether students who already did fewer such activities adopted GenAI. |
| The missing-flag method is used for structurally non-random missing data without discussing its known bias properties under non-MCAR conditions. | Methodological | The missing flag method was employed to handle missing data in the independent variables | Moderate | Moderate | Panel 2 regression coefficients may be biased in an unknown direction due to systematic, non-random missingness handled by a method requiring the MCAR assumption. |
| The four-quadrant learning-context classification is constructed from individual students' perceptions rather than objective course-level measures, creating common-method variance. | Descriptive | Using students’ ratings of academic challenge and support from faculty members, we construct a categorical variable to describe the learning context | Moderate | Moderate | Same-respondent, same-occasion measurement of predictor, outcome, and moderator, with the moderator conflating individual perception with course properties. |
| CFA validation results are withheld ('available upon request') with no factor loadings, fit indices, or deposited data/code reported. | Methodological | We used Confirmatory Factor Analysis (CFA) to examine the structural validity of the indicators. All indicators performed well. The results were reported in the CCSS 2024 Handbook and are available upon request. | Moderate | Minor | Readers cannot independently verify the structural validity claims for the measurement scales without access to the CFA results, item wordings, or analysis code. |
Per-claim assessment
CLAIM-001. The discussion asserts GenAI 'has replaced' traditional learning activities despite a cross-sectional OLS design that cannot establish causal substitution.
The verb 'replaced' asserts a causal substitution process, yet the single-wave cross-sectional design yields only contemporaneous associations. The paper's own theory section states that 'the three RQs are all descriptive in nature', and the limitations acknowledge that 'the findings of the current study could not be interpreted as causal relationships'. Yet section 4.2 is titled 'The Impacts of GenAI Use on Student Engagement', and the discussion's causal framing goes well beyond these hedges.
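A minimal simulation makes the identification problem concrete. It assumes two hypothetical linear data-generating processes, neither of which is the paper's model. In the first, GenAI use displaces traditional activities. In the second, students who already do fewer such activities adopt GenAI more. The effect size of -0.3 is illustrative, and the sample size simply matches the study's N. Both processes produce the same cross-sectional correlation.

```python
import math
import random


def simulate(direction, n, effect, seed):
    """Draw (genai_use, traditional_activity) pairs from one of two hypothetical
    linear models. Both variables have unit variance, so the population
    correlation equals `effect` whichever way the arrow points."""
    rng = random.Random(seed)
    resid_sd = math.sqrt(1.0 - effect ** 2)
    pairs = []
    for _ in range(n):
        if direction == "displacement":    # GenAI use -> fewer traditional activities
            use = rng.gauss(0.0, 1.0)
            activity = effect * use + resid_sd * rng.gauss(0.0, 1.0)
        elif direction == "selection":     # fewer traditional activities -> more GenAI use
            activity = rng.gauss(0.0, 1.0)
            use = effect * activity + resid_sd * rng.gauss(0.0, 1.0)
        else:
            raise ValueError(f"unknown direction: {direction}")
        pairs.append((use, activity))
    return pairs


def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


for direction in ("displacement", "selection"):
    r = correlation(simulate(direction, n=72_615, effect=-0.3, seed=1))
    print(f"{direction:>12}: r = {r:+.3f}")
```

Both runs print a correlation close to -0.3, so nothing in a single survey wave separates 'GenAI replaced note-taking' from 'students who took fewer notes adopted GenAI'. Separating them requires longitudinal or experimental variation.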
CLAIM-002. The missing-flag method is used for structurally non-random missing data without discussing its known bias properties under non-MCAR conditions.
The GenAI-specific variables have missing rates of 35-37% because non-users were not asked these items. The missingness is therefore structurally determined by GenAI adoption. Adoption is itself correlated with covariates such as institution type and major, and likely with engagement outcomes. The missing-flag method can bias regression coefficients even when data are MCAR, whenever the incomplete variable is correlated with other covariates, and MCAR is implausible here in any case. The paper neither discusses this limitation nor tests robustness with alternative approaches.
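A small simulation illustrates how the missing-flag regression can misstate coefficients under structural missingness. Everything in it is hypothetical. That includes the variable names z (a covariate), x (GenAI satisfaction) and y (engagement), the true coefficients, and the logistic adoption rule. It also includes the zero-fill-plus-indicator form of the missing-flag method; the paper's exact implementation is not reproduced here.

```python
import math
import random


def simulate(n, seed):
    """Hypothetical data. True model: y = 1.0*z + 0.5*x + noise.
    Only GenAI users are asked about x, and adoption depends on z,
    so missingness in x is structural rather than MCAR."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x = 0.6 * z + 0.8 * rng.gauss(0.0, 1.0)
        y = 1.0 * z + 0.5 * x + rng.gauss(0.0, 1.0)
        is_user = rng.random() < 1.0 / (1.0 + math.exp(-1.5 * z))
        rows.append((z, x, y, is_user))
    return rows


def ols(design, outcome):
    """OLS coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(design[0])
    aug = [[0.0] * (k + 1) for _ in range(k)]   # [X'X | X'y]
    for row, y in zip(design, outcome):
        for i in range(k):
            aug[i][k] += row[i] * y
            for j in range(k):
                aug[i][j] += row[i] * row[j]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [v / lead for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def fit_full(rows):
    """Benchmark: x observed for everyone (only possible in a simulation)."""
    return ols([[1.0, z, x] for z, x, _, _ in rows], [y for _, _, y, _ in rows])


def fit_missing_flag(rows):
    """Missing-flag method: fill missing x with 0 and add a missingness indicator."""
    design = [[1.0, z, x if user else 0.0, 0.0 if user else 1.0] for z, x, _, user in rows]
    return ols(design, [y for _, _, y, _ in rows])


rows = simulate(20_000, seed=7)
_, bz_full, bx_full = fit_full(rows)
_, bz_flag, bx_flag, _ = fit_missing_flag(rows)
print(f"full data:    z = {bz_full:.2f}, x = {bx_full:.2f}   (truth: 1.00, 0.50)")
print(f"missing flag: z = {bz_flag:.2f}, x = {bx_flag:.2f}")
```

In this setup the full-data fit recovers the true values. The missing-flag fit overstates the z coefficient and understates the x coefficient, because non-users' outcomes still depend on their unobserved x through z. The fill value is immaterial once the indicator is in the model, since any constant fill lies in the indicator's column space.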
CLAIM-003. The four-quadrant learning-context classification is constructed from individual students' perceptions rather than objective course-level measures, creating common-method variance.
Both the independent variables (GenAI frequency and satisfaction) and all dependent variables (behavioral, cognitive, emotional engagement) are self-reported by the same respondent, in the same instrument, at the same time. The four-quadrant learning-context typology is built from individual students' perceptions rather than from objective course-level data. The moderator may therefore capture individual disposition as much as actual instructional characteristics. The paper controls for social desirability but does not address common-method bias more broadly.
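A minimal common-method-variance simulation shows both effects. It assumes one respondent-level response-style factor that loads on every self-reported item, with loading and noise values that are purely hypothetical. It also assumes a hypothetical split at 0 for the perceived challenge/support quadrants, since the paper's cut-offs are not reproduced here. The true GenAI use, engagement, challenge and support are drawn independently of one another.

```python
import math
import random

LOADING = 0.5    # hypothetical loading of every self-report on the shared response style
NOISE_SD = 0.5   # hypothetical item-specific measurement noise


def simulate(n, seed):
    """True use, engagement, challenge and support are mutually independent;
    every survey answer mixes its true value with one shared style factor."""
    rng = random.Random(seed)

    def report(true_value, style):
        return true_value + LOADING * style + rng.gauss(0.0, NOISE_SD)

    rows = []
    for _ in range(n):
        style = rng.gauss(0.0, 1.0)
        use, eng, chal, supp = (rng.gauss(0.0, 1.0) for _ in range(4))
        rows.append({
            "true_use": use, "true_eng": eng,
            "use": report(use, style), "eng": report(eng, style),
            "chal": report(chal, style), "supp": report(supp, style),
        })
    return rows


def correlation(rows, a, b):
    """Pearson correlation between two keys of the row dictionaries."""
    n = len(rows)
    ma = sum(r[a] for r in rows) / n
    mb = sum(r[b] for r in rows) / n
    sab = sum((r[a] - ma) * (r[b] - mb) for r in rows)
    saa = sum((r[a] - ma) ** 2 for r in rows)
    sbb = sum((r[b] - mb) ** 2 for r in rows)
    return sab / math.sqrt(saa * sbb)


def quadrant_mean(rows, key, high_challenge, high_support):
    """Mean of `key` within one perceived-context quadrant (hypothetical split at 0)."""
    values = [r[key] for r in rows
              if (r["chal"] > 0) == high_challenge and (r["supp"] > 0) == high_support]
    return sum(values) / len(values)


rows = simulate(50_000, seed=3)
print(f"corr(true use, true engagement):         {correlation(rows, 'true_use', 'true_eng'):+.3f}")
print(f"corr(reported use, reported engagement): {correlation(rows, 'use', 'eng'):+.3f}")
gap = quadrant_mean(rows, "eng", True, True) - quadrant_mean(rows, "eng", False, False)
print(f"reported engagement, high/high minus low/low quadrant: {gap:+.3f}")
```

Under these assumptions, response style alone produces a reported use-engagement correlation of about 0.17. The perceived high-challenge/high-support quadrant also shows clearly higher reported engagement, even though true context and true engagement are unrelated. A social-desirability control would absorb this only to the extent that it measures the same style factor.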
CLAIM-004. CFA validation results are withheld ('available upon request') with no factor loadings, fit indices, or deposited data/code reported.
Key psychometric evidence -- item wordings, CFA factor loadings, model fit indices -- is not disclosed in the paper or supplementary materials. 'Available upon request' is a weaker reproducibility standard than open data/code. The claim that 'all indicators performed well' cannot be independently verified.
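For readers weighing the 'all indicators performed well' claim, the conventional CFA fit indices are simple functions of chi-square statistics that the paper does not report. The sketch uses one common textbook form of RMSEA and CFI. Software packages differ in small details (for example N versus N - 1), and the input numbers are invented for illustration.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (point estimate, N - 1 form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index relative to the independence (baseline) model."""
    misfit_model = max(chi2_model - df_model, 0.0)
    misfit_base = max(chi2_baseline - df_baseline, misfit_model)
    return 1.0 if misfit_base == 0 else 1.0 - misfit_model / misfit_base


# Hypothetical chi-square statistics, NOT taken from the paper or the CCSS 2024 Handbook.
print(f"RMSEA = {rmsea(chi2=300.0, df=100, n=1001):.3f}")
print(f"CFI   = {cfi(300.0, 100, 5100.0, 120):.3f}")
```

With the model and baseline chi-square values, degrees of freedom and sample size, any reader could check fit. That is why reporting them costs little compared with 'available upon request'.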
Scorecard
Sub-scores are 0–5 editorial judgements on fixed scales (higher is better, except methodological risk and overclaiming where higher is worse). They are contestable and open to a severity challenge from authors.
Strongest critique
The most clearly demonstrable flaw in the paper is causal language in the discussion that directly contradicts the study's own acknowledged limitations. Despite stating that 'the findings of the current study could not be interpreted as causal relationships,' the discussion asserts that 'GenAI use has replaced activities such as previewing, note-taking, and reviewing in traditional learning'. That is a directional causal-substitution claim the cross-sectional OLS design cannot support. This is not stray wording but a substantive interpretive claim that shapes the paper's proposed 'conditional effectiveness model.'
Strongest fair defence
The study contributes genuine value through its scale (N=72,615 across 25 institutions), its multidimensional engagement framework, and its context-sensitive moderation analysis that yields nuanced, non-uniform findings across four learning environments. The authors appropriately cluster standard errors at the institution level, control for social desirability, and explicitly acknowledge in their limitations that results 'could not be interpreted as causal relationships.' The theory section also correctly frames all three research questions as 'descriptive in nature.' The overclaiming is bounded to specific passages in the discussion rather than pervading the entire analytical framework.
Conclusion
This is a well-scaled descriptive study that provides useful preliminary evidence on GenAI-engagement associations across learning contexts in Chinese higher education. The core analytical findings -- nuanced, context-dependent, and often mixed -- are informative. However, the causal language in the discussion ('has replaced,' 'Impacts') overstates what the cross-sectional design can establish, and the missing-flag method for structurally non-random missing data introduces unacknowledged bias risk. These are bounded overclaims on an otherwise competent survey study.
Reply from the authors
Following the practice of Nature Matters Arising, Science Technical Comments and PNAS Letters, this Comment is published as one half of a Comment + Reply pair: the authors of the original article are invited to respond, and any reply is published here verbatim alongside the Comment as part of the record.
Reply: not yet invited. No reply has been received for publication.
The authors have a right of reply and no veto. A reply may request a factual correction, a methodological rebuttal, a clarification, a data/code update, or a severity challenge, and is published unedited. See the right-of-reply policy.
Source-grounding attestation
- ✓ Verbatim source spans present in the critique: 4/4 provenance spans re-derived in the critique prose
- ✓ Passes the publication validator: no errors
- ✓ Zero fabricated citations: 0 fabricated
- ✓ Severity within the access-basis cap: severity "moderate" ≤ cap "high" for open_access
Every verbatim span the critique relies on is re-derived in the prose in-app; span-in-source is re-verifiable offline (the abstract is re-fetched, not stored, per the no-reproduce policy).
Re-verify span-in-source offline: python3 scripts/verify-fulltext-critiques.py
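One way to picture a span-in-source check is as a normalised substring test. The following is a hypothetical sketch, not the contents of scripts/verify-fulltext-critiques.py. It assumes that spans and source are compared after Unicode NFKC normalisation, that curly quotes are mapped to straight ones, and that whitespace is collapsed. The real script may normalise differently.

```python
import re
import unicodedata

# Map typographic quotes to ASCII so "students’" matches "students'".
_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def normalise(text):
    """NFKC-normalise, straighten quotes, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).translate(_QUOTES)
    return re.sub(r"\s+", " ", text).strip()


def span_in_source(span, source):
    """True if the normalised span occurs verbatim in the normalised source."""
    return normalise(span) in normalise(source)


def check_spans(spans, source):
    """Return {span: found?} so each provenance span can be reported individually."""
    return {span: span_in_source(span, source) for span in spans}


source = "Using students' ratings of academic\n  challenge and support from faculty members"
print(check_spans(["students’ ratings of academic challenge", "has replaced"], source))
```

The first span is found despite the line break and the curly apostrophe. The second is reported as absent from this short excerpt.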
Version & correction history
| Version | Date | Change |
|---|---|---|
| v1.0 | 2026-07-04 | Initial publication |
No silent substantive corrections — every change is versioned and visible.
How to cite this Comment
Critical AI. Comment on “Whether and When Could Generative AI Improve College Student Learning Engagement?” (Fei Guo et al., Behavioral Sciences, 2025). Critical AI; 2026. https://policywindow.org/critique/c/genai-college-student-learning-engagement
A registered DOI will replace the URL once minted; until then the canonical URL is the persistent identifier. Highwire/Dublin-Core citation tags and a schema.org Review record are embedded in this page for Google Scholar and reference managers.
Verify this Comment. Its checkable facts (target DOI, access-basis severity cap, zero fabricated citations) are served — as the app’s self-report — at /critique/api/critiques/genai-college-student-learning-engagement/verify; to confirm them independently of this site, re-derive the same checks (and resolve the target DOI) with npx tsx scripts/verify-critical-ai.ts --critique genai-college-student-learning-engagement --live.
Content fingerprint d65eebdd782e17df (v1.0) — this Comment’s substantive content is content-addressed; a silent post-publication edit would change it.
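Content addressing of this kind usually means hashing a canonical serialisation of the substantive fields. The sketch is an assumption, not Critical AI's implementation. The field names, the JSON canonicalisation and the truncation of SHA-256 to 16 hex characters (the length of d65eebdd782e17df) are illustrative, and the sketch will not reproduce the published fingerprint.

```python
import hashlib
import json


def content_fingerprint(fields):
    """Hash a canonical JSON form of the substantive fields; keep the first 16 hex chars."""
    canonical = json.dumps(fields, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical substantive fields of a Comment.
v1 = {"version": "1.0", "summary": "example text", "severity": "moderate"}
edited = dict(v1, severity="high")   # a silent edit to one field

print(content_fingerprint(v1))
print(content_fingerprint(edited))   # differs from the line above
```

Sorting keys makes the fingerprint independent of field order, so only a change in content, not in serialisation order, alters it.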