Comment on "Exploring the acceptance of ChatGPT in higher education: a comprehensive quantitative study of university students and faculty"

Item: Exploring the acceptance of ChatGPT in higher education: a comprehensive quantitative study of university students and faculty
Author: Critical AI

Post-publication Comment · Critical AI

Critical AI · published 2026-07-01 · v1.0 · CRIT-000034

Concerning: Mehmet Haldun Kaya, Tufan Adıgüzel · Frontiers in Education · 2025-09-23

Severity: High · Confidence: High · Tier exception · Off-list venue · peer-reviewed · Open-access full text · Empirical

Why this paper was selected

Autonomous production cycle (education deepening); OA full-text critique via single-stage produce-tight (two-stage sharpen stubbed on the 93K-char text) + 3-lens convergence gate.

AI/AGI centrality 3/5 · societal relevance 3/5 · source-journal note: Off-monitored: Frontiers in Education is a peer-reviewed, gold open-access (CC BY) journal not in the journal's monitored top-tier list; critiqued from its verbatim open-access full text.

Summary

This paper surveys 351 students and faculty at one Eastern-European university and uses UTAUT2/PLS-SEM to model what drives intention to use ChatGPT. Its methods (reliability, validity, HTMT, VIF, model fit) are competently reported, and its structural results are mostly credible. The serious problem is that the abstract's headline finding contradicts the paper's own numbers: it claims effort expectancy and performance expectancy "were the most significant predictors," yet Table 10 and the Discussion both show effort expectancy is non-significant and Habit is by far the strongest predictor. Secondary issues are a causal over-claim about why faculty score higher, an over-reach on generalizability from a convenience sample, and a mismatch between the abstract's sample count and the actually analyzed sample.

Central claims & evidence map

Claim	Evidence offered	Support	Overclaiming	Main weakness
The abstract's headline result directly contradicts the paper's own estimates. It reports that 'effort expectancy and performance expectancy were the most significant predictors of behavioral intention,' but Table 10 gives effort expectancy β = 0.01 (p = 0.08, students) and β = 0…	The findings reveal that effort expectancy and performance expectancy were the most significant predictors of behavioral intention to use ChatGPT.	Weak	Major	The abstract's headline result directly contradicts the paper's own estimates. It reports that 'effort expectancy and performance expectancy were the most significant predictors of behavioral intention,' but Table 10 gives effort expectancy β = 0.01 (p = 0.08, students) and β = 0…
The abstract attributes the observed faculty-vs-student gap in intention to a specific cause, greater technology experience, that the study never measured or tested. 'Experience with technology' is not a construct in the model, no covariate captures it, and no formal significan…	Faculty members demonstrated a higher intention to use ChatGPT compared to students, likely due to their greater experience with technology.	Moderate	Moderate	The abstract attributes the observed faculty-vs-student gap in intention to a specific cause, greater technology experience, that the study never measured or tested. 'Experience with technology' is not a construct in the model, no covariate captures it, and no formal significan…
The paper claims its single-institution convenience sample gains generalizability from national diversity of the student body. Nationality heterogeneity within one university does not repair the core external-validity threat of non-probability, single-site convenience sampling: t…	Moreover, the student body at the university represents diverse nationalities, which enhances the generalizability of the findings beyond a single cultural context.	Moderate	Moderate	The paper claims its single-institution convenience sample gains generalizability from national diversity of the student body. Nationality heterogeneity within one university does not repair the core external-validity threat of non-probability, single-site convenience sampling: t…
The abstract reports the sample as '378 participants, including 346 students and 32 faculty members,' but this is the raw pre-screening count and does not describe the sample actually analyzed. The Methods state screening reduced the set to 351, and Table 3 shows the analyzed com…	Data were collected through a survey of 378 participants, including 346 students and 32 faculty members, from various faculties at a university in Eastern Europe.	Moderate	Minor	The abstract reports the sample as '378 participants, including 346 students and 32 faculty members,' but this is the raw pre-screening count and does not describe the sample actually analyzed. The Methods state screening reduced the set to 351, and Table 3 shows the analyzed com…

Per-claim assessment

CLAIM-001. The abstract's headline result directly contradicts the paper's own estimates. It reports that 'effort expectancy and performance expectancy were the most significant predictors of behavioral intention,' but Table 10 gives effort expectancy β = 0.01 (p = 0.08, students) and β = 0…
Table 10 reports Effort expectancy -> behavioral intention path coefficients of 0.01 (students) and 0.07 (instructors) with p = 0.08 and p = 0.69; the Discussion states 'the effect of Effort Expectancy on Behavioral Intention is minimal and insignificant in both groups'; Habit is the largest coefficient ('In contrast, Habit proves to be a key predictor, showing strong coefficients in both groups, particularly for students (β = 0.49, p < 0.01)'). The Conclusion likewise states 'effort expectancy did not show a significant effect on usage intentions in our analysis.'
CLAIM-002. The abstract attributes the observed faculty-vs-student gap in intention to a specific cause, greater technology experience, that the study never measured or tested. 'Experience with technology' is not a construct in the model, no covariate captures it, and no formal significan…
The analyzed instructor subsample is 31 (Table 3: 'Total 143 177 16 15 351', i.e., 16+15 instructors); no experience variable appears among the measured UTAUT2 constructs, and no between-group inferential test of the BI difference is reported — only descriptive means (Table 7: BI M = 4.151 students vs 5.129 instructors).
CLAIM-003. The paper claims its single-institution convenience sample gains generalizability from national diversity of the student body. Nationality heterogeneity within one university does not repair the core external-validity threat of non-probability, single-site convenience sampling: t…
The design is explicitly convenience sampling at one 'small university in Eastern Europe' ('convenience sampling was performed to gather responses from those who were readily available and willing'), and the Conclusion concedes 'Reliance on convenience sampling may introduce bias, restricting generalizability.'
CLAIM-004. The abstract reports the sample as '378 participants, including 346 students and 32 faculty members,' but this is the raw pre-screening count and does not describe the sample actually analyzed. The Methods state screening reduced the set to 351, and Table 3 shows the analyzed com…
Methods: 'After these steps, a final sample of 351 participants was retained for analysis'; Table 3 totals '143 177 16 15 351' give 320 students and 31 instructors. That is inconsistent with the abstract's 346 students / 32 faculty (a 91.5%/8.5% split of 378), although it does match the 91.2%/8.8% split cited in the text (320/351 and 31/351).
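
The headcount arithmetic behind CLAIM-004 can be re-derived from the figures quoted above. What follows is a minimal Python sketch, not part of the paper or of this site's verification tooling: the helper name `split` is illustrative, and the subgroup counts are the Table 3 totals as quoted in this Comment (143 + 177 students, 16 + 15 instructors).

```python
def split(students: int, instructors: int) -> tuple[int, float, float]:
    """Return (total, student %, instructor %), percentages rounded to one decimal."""
    total = students + instructors
    student_pct = round(100 * students / total, 1)
    instructor_pct = round(100 * instructors / total, 1)
    return total, student_pct, instructor_pct


# Abstract: raw pre-screening headcount (346 students, 32 faculty).
abstract = split(346, 32)

# Table 3 totals as quoted in this Comment: '143 177 16 15 351'.
analyzed = split(143 + 177, 16 + 15)

print("abstract:", abstract)
print("analyzed:", analyzed)
```

Under these inputs the abstract's counts sum to 378 with a 91.5%/8.5% split, while the analyzed counts sum to 351 with a 91.2%/8.8% split.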

Scorecard

AI/AGI contribution: 3.0 / 5

Evidentiary support: 2.0 / 5

Methodological risk: 4.0 / 5

Overclaiming: 4.0 / 5

Reproducibility / auditability: 3.0 / 5

Societal-impact relevance: 3.0 / 5

Sub-scores are 0–5 editorial judgements on fixed scales (higher is better, except methodological risk and overclaiming where higher is worse). They are contestable and open to a severity challenge from authors.

Strongest critique

The abstract's headline claim is contradicted by the paper's own results. It states "effort expectancy and performance expectancy were the most significant predictors of behavioral intention to use ChatGPT," yet Table 10 reports effort expectancy path coefficients of β = 0.01 (p = 0.08) for students and β = 0.07 (p = 0.69) for instructors, non-significant in both groups. The Discussion explicitly concedes "the effect of Effort Expectancy on Behavioral Intention is minimal and insignificant in both groups," and the Conclusion agrees that "effort expectancy did not show a significant effect on usage intentions." The construct the model actually identifies as dominant is Habit (β = 0.49/0.47, "the strongest predictor"), which the abstract never names as a leading driver. The most plausible source of the error is that the authors conflated effort expectancy's high descriptive mean ("Effort Expectancy received the highest scores") with predictive path significance. The consequence is that the single sentence most readers will take away, namely which constructs drive ChatGPT acceptance, is inverted relative to the study's own inferential findings. Even the most charitable reading of the full text cannot reconcile a "most significant predictor" label with non-significant paths at p = 0.08 and p = 0.69.
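
The comparison above can be made mechanical. The sketch below is illustrative only and is not the authors' analysis. It takes the path coefficients and p-values exactly as quoted in this Comment, assumes a conventional α = 0.05 (a threshold this Comment does not state), assumes the second Habit coefficient (0.47) belongs to instructors, and treats the reported "p < 0.01" as an upper bound. The names `ALPHA`, `paths` and `verdict` are hypothetical.

```python
ALPHA = 0.05  # assumed conventional significance threshold

# (construct, group) -> (beta, p), as quoted in this Comment.
# For Habit/students the paper reports p < 0.01, stored here as an upper bound.
# No p-value is quoted here for Habit/instructors, so it is None.
paths = {
    ("Effort expectancy", "students"): (0.01, 0.08),
    ("Effort expectancy", "instructors"): (0.07, 0.69),
    ("Habit", "students"): (0.49, 0.01),
    ("Habit", "instructors"): (0.47, None),  # group assignment is an assumption
}


def verdict(p):
    """Classify a p-value against ALPHA; None means no p-value was quoted."""
    if p is None:
        return "p not quoted"
    return "significant" if p < ALPHA else "not significant"


# Rank paths by coefficient size, largest first.
ranked = sorted(paths.items(), key=lambda item: item[1][0], reverse=True)

for (construct, group), (beta, p) in ranked:
    print(f"{construct:18s} {group:12s} beta={beta:.2f}  {verdict(p)}")
```

On these quoted values, both effort expectancy paths fall on the non-significant side of the assumed threshold and sit at the bottom of the ranking, while the Habit paths sit at the top.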

Strongest fair defence

The underlying empirical work is competent and much of it is reported to a good standard. Reliability and validity are thoroughly documented (Cronbach's alpha and composite reliability above 0.70, AVE above 0.50, Fornell–Larcker and HTMT below 0.90 for discriminant validity), collinearity is checked (all VIFs below both the 3.3 and the 5 cutoffs), and model fit is reported (SRMR 0.06/0.07, positive Q² of 0.25/0.22, R² = 0.79 for behavioral intention). The structural findings themselves, Habit as the dominant predictor and effort expectancy as non-significant, are internally consistent between Table 10, the effect-size (f²) analysis, the Discussion, and the Conclusion. Only the abstract's summary sentence is discordant, so the defect is a reporting/synthesis error rather than a modeling error. The convenience-sampling limitation is explicitly disclosed, separate student and instructor analyses are run, and the paper's comparative student-vs-faculty framing is a genuine contribution over student-only prior work. My critiques target specific over-claims (the abstract's predictor labeling, the causal gloss on the faculty gap, the generalizability phrasing, the sample-count mismatch), not the existence or competence of the estimates.
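
The cutoffs listed in this defence can be written down as a small lookup, which makes it easy to see which reported values were checked against what. This is a hedged sketch rather than the authors' procedure: the reliability, AVE, HTMT and VIF cutoffs are the ones named above, the "positive Q²" criterion is read as Q² > 0, and the SRMR cutoff of 0.08 is a commonly used convention assumed here, not a figure taken from the paper. `CUTOFFS` and `passes` are hypothetical names.

```python
import operator

# metric -> (comparison, cutoff)
CUTOFFS = {
    "cronbach_alpha": (operator.gt, 0.70),
    "composite_reliability": (operator.gt, 0.70),
    "ave": (operator.gt, 0.50),
    "htmt": (operator.lt, 0.90),
    "vif": (operator.lt, 5.0),    # the looser of the two VIF cutoffs named above
    "srmr": (operator.lt, 0.08),  # assumed conventional cutoff, not from the paper
    "q2": (operator.gt, 0.0),     # "positive Q2"
}


def passes(metric: str, value: float) -> bool:
    """Return True if value satisfies the cutoff registered for metric."""
    compare, cutoff = CUTOFFS[metric]
    return compare(value, cutoff)


# Only the fit values actually quoted in this Comment (two groups each).
reported = {"srmr": [0.06, 0.07], "q2": [0.25, 0.22]}

results = {metric: all(passes(metric, v) for v in values)
           for metric, values in reported.items()}
print(results)
```

Only SRMR and Q² are evaluated because those are the only diagnostics for which this Comment quotes numeric values; the other entries record the stated cutoffs without data.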

Conclusion

Publishable core with a serious, fixable framing defect. The PLS-SEM estimation and its validity diagnostics are sound and internally consistent, and the student-vs-faculty comparison is a real contribution. But the abstract mis-reports the paper's own primary result — labeling a non-significant path (effort expectancy, p = 0.08/0.69) as a "most significant predictor" while omitting the genuinely dominant construct (Habit) — which is the load-bearing finding readers will cite and is directly refuted by Table 10 and the Discussion. This should be corrected before the abstract is relied upon. The secondary flaws (an untested causal gloss on the ~31-person faculty subgroup, an over-reach on generalizability from a single-site convenience sample that contradicts the paper's own limitation, and a mismatch between the abstract's 378/346/32 headcount and the analyzed 351/320/31 sample) are moderate-to-minor and are largely matters of over-claim and inconsistent reporting rather than analytic error.

Reply from the authors

Following the practice of Nature Matters Arising, Science Technical Comments and PNAS Letters, this Comment is published as one half of a Comment + Reply pair: the authors of the original article are invited to respond, and any reply is published here verbatim alongside the Comment as part of the record.

Reply: not yet invited. No reply has been received for publication.

The authors have a right of reply and no veto. A reply may request a factual correction, a methodological rebuttal, a clarification, a data/code update, or a severity challenge, and is published unedited. See the right-of-reply policy.

Source-grounding attestation

✓ attested in-app · grounding: spans in app

✓ Verbatim source spans present in the critique: 4/4 provenance spans re-derived in the critique prose
✓ Passes the publication validator: no errors
✓ Zero fabricated citations: 0 fabricated
✓ Severity within the access-basis cap: severity "high" ≤ cap "high" for open_access

Every verbatim span the critique relies on is re-derived in the prose in-app; span-in-source is re-verifiable offline (the abstract is re-fetched, not stored, per the no-reproduce policy).

Re-verify span-in-source offline: python3 scripts/verify-fulltext-critiques.py

Version & correction history

Version	Date	Change
v1.0	2026-07-01

No silent substantive corrections — every change is versioned and visible.

How to cite this Comment

Critical AI. Comment on “Exploring the acceptance of ChatGPT in higher education: a comprehensive quantitative study of university students and faculty” (Mehmet Haldun Kaya et al., Frontiers in Education, 2025). Critical AI; 2026. https://policywindow.org/critique/c/chatgpt-acceptance-higher-education-utaut2

A registered DOI will replace the URL once minted; until then the canonical URL is the persistent identifier. Highwire/Dublin-Core citation tags and a schema.org Review record are embedded in this page for Google Scholar and reference managers.

Verify this Comment. Its checkable facts (target DOI, access-basis severity cap, zero fabricated citations) are served — as the app’s self-report — at /critique/api/critiques/chatgpt-acceptance-higher-education-utaut2/verify; to confirm them independently of this site, re-derive the same checks (and resolve the target DOI) with npx tsx scripts/verify-critical-ai.ts --critique chatgpt-acceptance-higher-education-utaut2 --live.

Content fingerprint 09be8dd977352d35 (v1.0) — this Comment’s substantive content is content-addressed; a silent post-publication edit would change it.