Calibration

Critical AI against the human standard

A critique journal should be able to show how its critiques compare with the established human standard, not just assert quality. The benchmark corpus of real published Comments, replications and reanalyses defines that standard: which analytical lenses expert critiques emphasise, and how broad they tend to be. Each Critical AI critique is then scored in the same dimension vocabulary on four measures: dimensional alignment, breadth, the credibility gates the benchmarks embody, and claim grounding. Every number is re-derived in-app from the corpus and the critiques.

Calibration tracks access basis, by design. A critique read at full open-access text can reach the methods/statistics/identification lenses experts emphasise, and so can be calibrated; an abstract-only critique engages only framing and stated claims, tends to score lower, and often reads “needs review”. That is not a defect but an honest signal that full-text review is needed to reach the standard. The metric thus validates the journal’s access-basis severity caps.
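A minimal sketch of how the status label could follow from the alignment score and the access basis. The 0.80 threshold, the two labels and the explanation wording come from this page; the `Score` class, its field names, and the assumption that a failed discipline gate also blocks calibration are illustrative and not taken from the app's code.

```python
from dataclasses import dataclass

ALIGNMENT_THRESHOLD = 0.80  # threshold quoted on this page


@dataclass
class Score:
    """Hypothetical container for one critique's calibration inputs."""
    alignment: float            # unrounded cosine alignment
    access_basis: str           # e.g. "Open-access full text", "Abstract only"
    discipline_pass: bool = True


def status(score: Score) -> str:
    # Assumption: the alignment threshold and the discipline gates must both hold.
    if score.alignment >= ALIGNMENT_THRESHOLD and score.discipline_pass:
        return "calibrated"
    return "needs review"


def explanation(score: Score) -> str:
    if score.alignment >= ALIGNMENT_THRESHOLD:
        return "attends to the expert-emphasised lenses"
    if score.access_basis == "Abstract only":
        return "scope-limited by abstract-only access; full-text review needed"
    return "under-attends to the expert-emphasised lenses"


full_text = Score(alignment=0.94, access_basis="Open-access full text")
abstract_only = Score(alignment=0.79, access_basis="Abstract only")
print(status(full_text), "|", explanation(full_text))
print(status(abstract_only), "|", explanation(abstract_only))
```

Comparing the unrounded value matters in this sketch: a score that displays as 0.80 after rounding can still sit just under the threshold and read "needs review".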

reference: 68 human critiques · 63 Critical AI critiques scored · 17 calibrated · mean alignment 0.72 (threshold 0.80)

The human standard

Share of the 68 benchmark critiques that exercise each lens. The most-emphasised lenses (methods, statistics, claim–evidence fit, reproducibility) define what a strong critique attends to; alignment rewards a critique for covering them. A typical published critique spans 4–5 dimensions (median 5, range 3–6).

Lens	Share of benchmark critiques
Methods / design	91%
Claim–evidence fit	91%
Statistics / inference	72%
Reproducibility	57%
Overclaiming	46%
Generalisation	44%
Causal identification	37%
Data & code	25%
Theory / framing	16%
Novelty / contribution	4%
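The alignment score is described on this page as a cosine against the emphasis profile. A minimal sketch of that calculation follows, using the pooled shares listed above. How the app encodes a critique is not stated here, so the 0/1 lens vector (1 if the critique exercises the lens) is an assumption, and `cosine_alignment` is a hypothetical name.

```python
import math

# Pooled emphasis profile: share of the 68 benchmark critiques exercising each lens.
POOLED_PROFILE = {
    "Methods / design": 0.91,
    "Claim–evidence fit": 0.91,
    "Statistics / inference": 0.72,
    "Reproducibility": 0.57,
    "Overclaiming": 0.46,
    "Generalisation": 0.44,
    "Causal identification": 0.37,
    "Data & code": 0.25,
    "Theory / framing": 0.16,
    "Novelty / contribution": 0.04,
}


def cosine_alignment(lenses_used, profile=POOLED_PROFILE):
    """Cosine between a 0/1 lens vector (assumed encoding) and the profile."""
    used = [lens for lens in profile if lens in lenses_used]
    dot = sum(profile[lens] for lens in used)
    norm_profile = math.sqrt(sum(w * w for w in profile.values()))
    norm_critique = math.sqrt(len(used))
    if norm_profile == 0 or norm_critique == 0:
        return 0.0
    return dot / (norm_profile * norm_critique)


# A critique covering the four most-emphasised lenses, and one covering
# only the two least-emphasised lenses.
top4 = set(sorted(POOLED_PROFILE, key=POOLED_PROFILE.get, reverse=True)[:4])
framing_only = {"Theory / framing", "Novelty / contribution"}

print(round(cosine_alignment(top4), 2))
print(round(cosine_alignment(framing_only), 2))
```

Under this encoding, covering the heavily emphasised lenses raises the score, while adding rarely emphasised lenses lengthens the critique vector without adding much to the dot product.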

The standard differs by field

Different social sciences critique differently, so a critique is also scored against the standard of its own field — not just the pooled average. Each domain below has its own emphasis profile, derived from its benchmark critiques (domains with fewer than 3 benchmarks fall back to the pooled standard).

Domain	n	Most-emphasised lenses
Economics & finance	14	Methods / design 93% · Claim–evidence fit 93% · Statistics / inference 86% · Reproducibility 71%
Political science	9	Statistics / inference 89% · Methods / design 78% · Claim–evidence fit 78% · Reproducibility 78%
Psychology	7	Methods / design 100% · Statistics / inference 100% · Claim–evidence fit 100% · Reproducibility 100%
Sociology	6	Methods / design 100% · Claim–evidence fit 100% · Statistics / inference 67% · Reproducibility 67%
Public policy & criminology	6	Methods / design 100% · Claim–evidence fit 100% · Statistics / inference 67% · Overclaiming 50%
Communication & media	4	Methods / design 100% · Claim–evidence fit 100% · Generalisation 75% · Statistics / inference 50%
Education	4	Claim–evidence fit 100% · Overclaiming 75% · Generalisation 75% · Methods / design 50%
Management, IS & marketing	3	Methods / design 100% · Causal identification 100% · Statistics / inference 100% · Reproducibility 67%

The contrast is real: management critiques lean on causal identification, education on overclaiming and generalisation, and psychology and political science on statistics and reproducibility (the replication tradition).
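The per-field standard and its fallback rule can be sketched as below. The rule that domains with fewer than 3 benchmarks fall back to the pooled standard comes from the text; the record layout (a domain plus a set of lenses), the function names and the toy records are hypothetical and stand in for the real benchmark corpus.

```python
from collections import Counter, defaultdict

MIN_BENCHMARKS = 3  # from the text: fewer than 3 benchmarks falls back to pooled


def emphasis_profile(critiques):
    """Share of critiques exercising each lens; `critiques` is a list of lens sets."""
    if not critiques:
        return {}
    counts = Counter(lens for lenses in critiques for lens in lenses)
    return {lens: n / len(critiques) for lens, n in counts.items()}


def profiles_by_domain(benchmarks):
    """Build the pooled profile plus one profile per sufficiently large domain."""
    pooled = emphasis_profile([lenses for _, lenses in benchmarks])
    grouped = defaultdict(list)
    for domain, lenses in benchmarks:
        grouped[domain].append(lenses)
    per_domain = {
        domain: emphasis_profile(items)
        for domain, items in grouped.items()
        if len(items) >= MIN_BENCHMARKS
    }
    return pooled, per_domain


def profile_for(domain, pooled, per_domain):
    """Use the field's own profile when it exists, else the pooled one."""
    return per_domain.get(domain, pooled)


# Toy records for illustration only (not the real corpus).
benchmarks = [
    ("Psychology", {"Methods / design", "Statistics / inference"}),
    ("Psychology", {"Methods / design"}),
    ("Psychology", {"Methods / design", "Reproducibility"}),
    ("Law", {"Theory / framing"}),
]
pooled, per_domain = profiles_by_domain(benchmarks)
print(profile_for("Psychology", pooled, per_domain))
print(profile_for("Law", pooled, per_domain))  # single benchmark, so pooled
```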

Critical AI critiques, scored
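The breadth labels on the cards appear consistent with a simple rule around the typical 4–5 dimension span: fewer is "Focused", more is "Comprehensive". The sketch below encodes that reading. The cut-offs are inferred from the labels shown on this page rather than documented, and the function names are hypothetical.

```python
HUMAN_MEDIAN = 5        # median breadth of the benchmark critiques
TYPICAL_RANGE = (4, 5)  # "a typical published critique spans 4–5 dimensions"


def breadth_label(n_dimensions: int) -> str:
    """Label a critique's breadth relative to the typical span (inferred rule)."""
    low, high = TYPICAL_RANGE
    if n_dimensions < low:
        return "Focused"
    if n_dimensions > high:
        return "Comprehensive"
    return "Typical"


def breadth_note(n_dimensions: int) -> str:
    """Reproduce the wording of the card summaries for a given breadth."""
    label = breadth_label(n_dimensions)
    if label == "Comprehensive":
        return (f"broader than a typical Comment "
                f"({n_dimensions} dimensions vs human median {HUMAN_MEDIAN})")
    if label == "Focused":
        return (f"more focused than a typical critique "
                f"({n_dimensions} dimensions vs human median {HUMAN_MEDIAN})")
    return f"breadth typical of the genre ({n_dimensions} dimensions)"


for n in (3, 4, 5, 7):
    print(n, breadth_label(n), "|", breadth_note(n))
```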

✓ calibrated · Open-access full text · CRIT-000013
The Impact of AI on Developer Productivity: Evidence from GitHub Copilot
Alignment
0.94
cosine vs the emphasis profile (≥0.80)
Breadth
7 · Comprehensive
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Economics & finance — alignment against economics & finance’s own critique standard 0.87 (pooled 0.94)
attends to the expert-emphasised lenses (alignment 0.94); broader than a typical Comment (7 dimensions vs human median 5).
✓ calibrated · Open-access full text · CRIT-000002
Generative AI at Work
Alignment
0.92
cosine vs the emphasis profile (≥0.80)
Breadth
8 · Comprehensive
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Economics & finance — alignment against economics & finance’s own critique standard 0.90 (pooled 0.92)
attends to the expert-emphasised lenses (alignment 0.92); broader than a typical Comment (8 dimensions vs human median 5).
✓ calibrated · Open-access full text · CRIT-000014
Scaffolding Human–AI Collaboration: A Field Experiment on Behavioral Protocols and Cognitive Reframing
Alignment
0.91
cosine vs the emphasis profile (≥0.80)
Breadth
6 · Comprehensive
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.92 (pooled 0.91)
attends to the expert-emphasised lenses (alignment 0.91); broader than a typical Comment (6 dimensions vs human median 5).
✓ calibrated · Abstract only · CRIT-GEN-when-influencers-delegat
When Influencers Delegate Replies: How Social AI Agents Shape User Engagement
Alignment
0.91
cosine vs the emphasis profile (≥0.80)
Breadth
6 · Comprehensive
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.92 (pooled 0.91)
attends to the expert-emphasised lenses (alignment 0.91); broader than a typical Comment (6 dimensions vs human median 5).
✓ calibrated · User-supplied full text · CRIT-GEN-more-versus-better-artif
More Versus Better: Artificial Intelligence, Incentives, and the Emerging Crisis in Peer Review
Alignment
0.88
cosine vs the emphasis profile (≥0.80)
Breadth
6 · Comprehensive
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.78 (pooled 0.88)
attends to the expert-emphasised lenses (alignment 0.88); broader than a typical Comment (6 dimensions vs human median 5).
✓ calibrated · Open-access full text · CRIT-000031
Large Language Models, Small Labor Market Effects
Alignment
0.87
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Economics & finance — alignment against economics & finance’s own critique standard 0.73 (pooled 0.87)
attends to the expert-emphasised lenses (alignment 0.87); breadth typical of the genre (5 dimensions).
✓ calibrated · Open-access full text · CRIT-000024
Effect of AI empathy perception on employees' prosocial behavior: mediating role of warmth and moderating role of AI anthropomorphism
Alignment
0.85
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.78 (pooled 0.85)
attends to the expert-emphasised lenses (alignment 0.85); breadth typical of the genre (5 dimensions).
✓ calibrated · Open-access full text · CRIT-000037
Mental health in the “era” of artificial intelligence: technostress and the perceived impact on anxiety and depressive disorders—an SEM analysis
Alignment
0.85
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.71 (pooled 0.85)
attends to the expert-emphasised lenses (alignment 0.85); breadth typical of the genre (5 dimensions).
✓ calibrated · Open-access full text · CRIT-000033
Investigating the impact of social media images on users' sentiments towards sociopolitical events based on deep artificial intelligence
Alignment
0.85
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.84 (pooled 0.85)
attends to the expert-emphasised lenses (alignment 0.85); breadth typical of the genre (5 dimensions).
✓ calibrated · User-supplied full text · CRIT-GEN-being-literate-behaving-
Being literate, behaving literate? A mixed-methods approach to adolescents’ algorithm literacy and behavioral strategies on social media
Alignment
0.84
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.88 (pooled 0.84)
attends to the expert-emphasised lenses (alignment 0.84); breadth typical of the genre (4 dimensions).
✓ calibrated · Abstract only · CRIT-GEN-resilience-and-disempowe
Resilience and disempowerment in algorithmic systems
Alignment
0.84
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.88 (pooled 0.84)
attends to the expert-emphasised lenses (alignment 0.84); breadth typical of the genre (4 dimensions).
✓ calibrated · Abstract only · CRIT-000009
The politics of artificial intelligence alignment: Public reactions to AI moderation in the case of Google’s Gemini
Alignment
0.84
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.74 (pooled 0.84)
attends to the expert-emphasised lenses (alignment 0.84); breadth typical of the genre (4 dimensions).
✓ calibrated · Open-access full text · CRIT-000028
The (Short-Term) Effects of Large Language Models on Unemployment and Earnings
Alignment
0.82
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Economics & finance — alignment against economics & finance’s own critique standard 0.87 (pooled 0.82)
attends to the expert-emphasised lenses (alignment 0.82); breadth typical of the genre (4 dimensions).
✓ calibrated · Open-access full text · CRIT-000026
Fairness Is More Than Algorithms: Racial Disparities in Time-to-Recidivism
Alignment
0.82
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Public policy & criminology — alignment against public policy & criminology’s own critique standard 0.81 (pooled 0.82)
attends to the expert-emphasised lenses (alignment 0.82); breadth typical of the genre (4 dimensions).
✓ calibrated · Open-access full text · CRIT-000044
On the conversational persuasiveness of GPT-4
Alignment
0.81
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.76 (pooled 0.81)
attends to the expert-emphasised lenses (alignment 0.81); breadth typical of the genre (4 dimensions).
✓ calibrated · Abstract only · CRIT-000012
Generative AI, propaganda, and digital authoritarianism: Comparative insights from six democratically weakened countries
Alignment
0.80
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.88 (pooled 0.80)
attends to the expert-emphasised lenses (alignment 0.80); breadth typical of the genre (4 dimensions).
✓ calibrated · Abstract only · CRIT-000010
Refusal as silence: Gendered disparities in Vision-Language Model responses
Alignment
0.80
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Sociology — alignment against sociology’s own critique standard 0.79 (pooled 0.80)
attends to the expert-emphasised lenses (alignment 0.80); breadth typical of the genre (4 dimensions).
needs review · User-supplied full text · CRIT-GEN-the-rise-of-ai-sovereign
The rise of AI sovereignty: Authoritarian technological imaginaries as a form of reflexive control
Alignment
0.80
cosine vs the emphasis profile (≥0.80)
Breadth
6 · Comprehensive
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.72 (pooled 0.80)
under-attends to the expert-emphasised lenses (alignment displays as 0.80 after rounding but falls below the 0.80 threshold); broader than a typical Comment (6 dimensions vs human median 5).
needs review · Abstract only · CRIT-GEN-scp-artificial-intellige
Artificial intelligence adoption and the demand for managerial expertise
Alignment
0.79
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.62 (pooled 0.79)
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.79 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (5 dimensions).
needs review · Open-access full text · CRIT-000035
Artificial intelligence and social media as new arenas of political competition: challenges for democracy
Alignment
0.78
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.82 (pooled 0.78)
under-attends to the expert-emphasised lenses (alignment 0.78 < 0.8); breadth typical of the genre (5 dimensions).
needs review · Abstract only · CRIT-GEN-how-costs-influence-pref
How Costs Influence Preferences for Control in Generative Artificial Intelligence (GenAI): Human-Guided vs. GenAI-Based Delegated Search
Alignment
0.78
cosine vs the emphasis profile (≥0.80)
Breadth
6 · Comprehensive
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.64 (pooled 0.78)
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.78 < 0.8); full-text review needed to reach the calibrated standard; broader than a typical Comment (6 dimensions vs human median 5).
needs review · Abstract only · CRIT-000016
Cultural bias and cultural alignment of large language models
Alignment
0.77
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Sociology — alignment against sociology’s own critique standard 0.75 (pooled 0.77)
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.77 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · User-supplied full text · CRIT-000003
The Cybernetic Teammate: A Field Experiment on Generative AI and Teamwork
Alignment
0.77
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.43 (pooled 0.77)
under-attends to the expert-emphasised lenses (alignment 0.77 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000029
Factors influencing the adoption of generative artificial intelligence into classroom teaching by university teachers: An empirical study using SPSS PROCESS macros
Alignment
0.77
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Education — alignment against education’s own critique standard 0.69 (pooled 0.77)
under-attends to the expert-emphasised lenses (alignment 0.77 < 0.8); breadth typical of the genre (5 dimensions).
needs review · Open-access full text · CRIT-000034
Exploring the acceptance of ChatGPT in higher education: a comprehensive quantitative study of university students and faculty
Alignment
0.76
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Education — alignment against education’s own critique standard 0.69 (pooled 0.76)
under-attends to the expert-emphasised lenses (alignment 0.76 < 0.8); breadth typical of the genre (5 dimensions).
needs review · Licensed full text · CRIT-000048
Real-time artificial intelligence sentiment feedback promotes self-moderation in contentious online discussion
Alignment
0.75
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.67 (pooled 0.75)
under-attends to the expert-emphasised lenses (alignment 0.75 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Abstract only · CRIT-000018
Is it harmful or helpful? Examining the causes and consequences of generative AI usage among university students
Alignment
0.75
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Education — alignment against education’s own critique standard 0.77 (pooled 0.75)
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.75 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · User-supplied full text · CRIT-000008
AI meets politics: Examining the effects of different targeting strategies across 15 countries
Alignment
0.74
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.62 (pooled 0.74)
under-attends to the expert-emphasised lenses (alignment 0.74 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Licensed full text · CRIT-000049
Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgement
Alignment
0.74
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
under-attends to the expert-emphasised lenses (alignment 0.74 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).
needs review · Abstract only · CRIT-000019
Student perspectives on the use of generative artificial intelligence technologies in higher education
Alignment
0.74
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Education — alignment against education’s own critique standard 0.77 (pooled 0.74)
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.74 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000032
AI-determined similarity increases likability and trustworthiness of human voices
Alignment
0.73
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.59 (pooled 0.73)
under-attends to the expert-emphasised lenses (alignment 0.73 < 0.8); breadth typical of the genre (5 dimensions).
needs review · Abstract only · CRIT-GEN-algorithmic-responsibili
Algorithmic responsibility in PPC practice: Interpreting black boxes in digital advertising work
Alignment
0.73
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Sociology — alignment against sociology’s own critique standard 0.75 (pooled 0.73)
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.73 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (5 dimensions).
needs review · Abstract only · CRIT-GEN-from-prompt-engineering-
From prompt engineering to prompt design: Research strategies for visual generative AI
Alignment
0.73
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.84 (pooled 0.73)
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.73 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (5 dimensions).
needs review · User-supplied full text · CRIT-GEN-making-genai-valuable-be
Making GenAI valuable: Benchmarks, singularities, and the enrichment economy
Alignment
0.73
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Sociology — alignment against sociology’s own critique standard 0.75 (pooled 0.73)
under-attends to the expert-emphasised lenses (alignment 0.73 < 0.8); breadth typical of the genre (5 dimensions).
needs review · Open-access full text · CRIT-000038
Whether and When Could Generative AI Improve College Student Learning Engagement?
Alignment
0.73
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Education — alignment against education’s own critique standard 0.42 (pooled 0.73)
under-attends to the expert-emphasised lenses (alignment 0.73 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000027
Heterogeneous preferences and asymmetric insights for AI use among welfare claimants and non-claimants
Alignment
0.71
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Public policy & criminology — alignment against public policy & criminology’s own critique standard 0.72 (pooled 0.71)
under-attends to the expert-emphasised lenses (alignment 0.71 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000039
AI in education through the learners’ eyes: practical experience, perceptions, and challenges
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Education — alignment against education’s own critique standard 0.63 (pooled 0.69)
under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000030
AI tutoring outperforms in-class active learning: an RCT introducing a novel research-based design in an authentic educational setting
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Education — alignment against education’s own critique standard 0.77 (pooled 0.69)
under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000020
Inconsistent advice by ChatGPT influences decision making in various areas
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.56 (pooled 0.69)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000041
Political ideology shapes support for the use of AI in policy-making
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.71 (pooled 0.69)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Abstract only · CRIT-000015
Testing theory of mind in large language models and humans
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.56 (pooled 0.69)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.69 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · User-supplied full text · CRIT-GEN-beyond-disruption-and-in
Beyond disruption and invisibility: Interactional continuity in everyday AI use in India
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.81 (pooled 0.69)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Abstract only · CRIT-GEN-crafting-computer-vision
Crafting computer vision through human eyes: An AI laboratory ethnography
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Sociology — alignment against sociology’s own critique standard 0.70 (pooled 0.69)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.69 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · User-supplied full text · CRIT-GEN-into-the-black-box-laype
Into the black box: Laypeople's folk theories about generative artificial intelligence chatbots
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.81 (pooled 0.69)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Abstract only · CRIT-GEN-working-the-algorithm-co
Working the algorithm: Contextual skills of on-demand gig workers
Alignment
0.69
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Sociology — alignment against sociology’s own critique standard 0.70 (pooled 0.69)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.69 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000040
Understanding support for AI regulation: A Bayesian network perspective
Alignment
0.68
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.68 (pooled 0.68)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.68 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).
needs review · Open-access full text · CRIT-000023
AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights
Alignment
0.68
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Economics & finance — alignment against economics & finance’s own critique standard 0.59 (pooled 0.68)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.68 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).
needs review · Open-access full text · CRIT-000025
Local US officials' views on the impacts and governance of AI: Evidence from 2022 and 2023 survey waves
Alignment
0.65
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.65 (pooled 0.65)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.65 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).
needs review · Open-access full text · CRIT-000021
Postgraduate students' perceptions of artificial intelligence integration in research: A cross-sectional study
Alignment
0.65
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Education — alignment against education’s own critique standard 0.49 (pooled 0.65)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.65 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000042
Positioning Political Texts with Large Language Models by Asking and Averaging
Alignment
0.64
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.71 (pooled 0.64)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.64 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Open-access full text · CRIT-000043
How human–AI feedback loops alter human perceptual, emotional and social judgements
Alignment
0.62
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.52 (pooled 0.62)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.62 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Abstract only · CRIT-000017
Generative AI enhances individual creativity but reduces the collective diversity of novel content
Alignment
0.61
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.42 (pooled 0.61)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.61 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · Abstract only · CRIT-GEN-backfiring-ai-ai-deploym
Backfiring AI? AI Deployment in Workplace
Alignment
0.60
cosine vs the emphasis profile (≥0.80)
Breadth
6 · Comprehensive
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing — alignment against management, IS & marketing’s own critique standard 0.28 (pooled 0.60)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.60 < 0.8); full-text review needed to reach the calibrated standard; broader than a typical Comment (6 dimensions vs human median 5).
needs review · Open-access full text · CRIT-000022
When an AI Judges Your Work: The Hidden Costs of Algorithmic Assessment
Alignment
0.56
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.36 (pooled 0.56)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.56 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).
needs review · Abstract only · CRIT-000006
Can ChatGPT Kill User-Generated Q&A Platforms?
Alignment
0.56
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing — alignment against management, IS & marketing’s own critique standard 0.50 (pooled 0.56)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.56 < 0.8); full-text review needed to reach the calibrated standard; more focused than a typical critique (3 dimensions vs human median 5).
needs review · Abstract only · CRIT-000004
Artificial Collusion: Examining Supracompetitive Pricing by Q-Learning Algorithms
Alignment
0.56
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Economics & finance — alignment against economics & finance’s own critique standard 0.36 (pooled 0.56)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.56 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · User-supplied full text · CRIT-000011
From rule of law to rule of algorithm: Generative Artificial Intelligence's threat to democracy
Alignment
0.56
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.47 (pooled 0.56)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.56 < 0.8); breadth typical of the genre (4 dimensions).
needs review · Abstract only · CRIT-000005
Unraveling Generative AI from a Human Intelligence Perspective: A Battery of Experiments
Alignment
0.56
cosine vs the emphasis profile (≥0.80)
Breadth
4 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.42 (pooled 0.56)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.56 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).
needs review · Licensed full text · CRIT-000046
Comparing the value of perceived human versus AI-generated empathy
Alignment
0.53
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.32 (pooled 0.53)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.53 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).
needs review · Abstract only · CRIT-GEN-charismatic-machines-on-
Charismatic machines: On the epistemic power of generative AI within platform convergence
Alignment
0.51
cosine vs the emphasis profile (≥0.80)
Breadth
5 · Typical
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Communication & media — alignment against communication & media’s own critique standard 0.60 (pooled 0.51)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.51 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (5 dimensions).
needs review · Licensed full text · CRIT-000047
Reducing political polarization through conversations with artificial intelligence
Alignment
0.50
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Political science — alignment against political science’s own critique standard 0.44 (pooled 0.50)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.50 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).
needs review · Abstract only · CRIT-000007
Made With AI: Consumer Engagement with Social Media Containing AI Disclosures
Alignment
0.49
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Management, IS & marketing — alignment against management, IS & marketing’s own critique standard 0.20 (pooled 0.49)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.49 < 0.8); full-text review needed to reach the calibrated standard; more focused than a typical critique (3 dimensions vs human median 5).
needs review · Licensed full text · CRIT-000045
Cultural tendencies in generative AI
Alignment
0.41
cosine vs the emphasis profile (≥0.80)
Breadth
3 · Focused
vs human median 5
Discipline
pass
sourced · severity-capped · no motive
Grounding
100%
claims citing specific evidence
Field: Psychology — alignment against psychology’s own critique standard 0.20 (pooled 0.41)
Lenses: Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution
under-attends to the expert-emphasised lenses (alignment 0.41 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

What this measures — and what it doesn’t

Calibration measures whether a critique looks like a member of the expert-critique distribution: it attends to the lenses experts emphasise (alignment), its breadth falls within the human range, it passes the same credibility gates the benchmarks embody (sourced, severity-capped to its access basis, claims not motives), and every claim cites specific evidence. It does not certify that any individual judgement is correct; only an expert reading the target paper can do that, which is why critiques remain author-contestable and the severity caps stay conservative. A “comprehensive” breadth is not a defect: Critical AI critiques tend toward a full referee-style appraisal rather than a single-issue Comment.
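The alignment and breadth figures on each card can be illustrated with a minimal sketch. It uses the pooled emphasis shares reported on this page, and it makes several assumptions that the page does not confirm: that a critique is encoded as a 0/1 vector over the ten lenses, that the breadth labels break at 3 and 5 dimensions (inferred from the cards shown), and that "calibrated" means meeting the 0.80 threshold while passing the discipline gate. The function names are hypothetical.

```python
import math

# Pooled emphasis profile: share of the 68 benchmark critiques exercising each lens
# (values as reported on this page).
EMPHASIS = {
    "Methods / design": 0.91,
    "Claim–evidence fit": 0.91,
    "Statistics / inference": 0.72,
    "Reproducibility": 0.57,
    "Overclaiming": 0.46,
    "Generalisation": 0.44,
    "Causal identification": 0.37,
    "Data & code": 0.25,
    "Theory / framing": 0.16,
    "Novelty / contribution": 0.04,
}
THRESHOLD = 0.80  # alignment threshold shown on every card


def alignment(lenses_used, profile=EMPHASIS):
    """Cosine similarity between a 0/1 lens vector (assumed encoding) and the profile."""
    dot = sum(weight for lens, weight in profile.items() if lens in lenses_used)
    norm_profile = math.sqrt(sum(weight * weight for weight in profile.values()))
    norm_critique = math.sqrt(sum(1 for lens in profile if lens in lenses_used))
    if norm_profile == 0 or norm_critique == 0:
        return 0.0
    return dot / (norm_profile * norm_critique)


def breadth_label(n_dimensions):
    """Breadth label; the cut-offs are inferred from the cards, not documented."""
    if n_dimensions <= 3:
        return "Focused"
    if n_dimensions <= 5:
        return "Typical"
    return "Comprehensive"


def verdict(lenses_used, discipline_pass=True, threshold=THRESHOLD):
    """Return (alignment, breadth label, status) under the assumed calibration rule."""
    score = alignment(lenses_used)
    calibrated = score >= threshold and discipline_pass
    status = "calibrated" if calibrated else "needs review"
    return round(score, 2), breadth_label(len(lenses_used)), status


# A critique covering the four most-emphasised lenses versus one limited to
# framing and stated claims, as an abstract-only reading might be.
full_text = {"Methods / design", "Claim–evidence fit",
             "Statistics / inference", "Reproducibility"}
abstract_only = {"Theory / framing", "Novelty / contribution",
                 "Overclaiming", "Generalisation"}
print(verdict(full_text))
print(verdict(abstract_only))
```

Under these assumptions two critiques of identical breadth can land on opposite sides of the threshold, because cosine alignment rewards which lenses are covered rather than how many.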

The reference distribution is re-derived from the benchmark corpus, so adding a benchmark moves the standard. It is available in machine-readable form at /critique/api/calibration.
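How the standard moves when a benchmark is added can be sketched as follows. The sketch assumes each benchmark critique is stored as the set of lenses it exercises and that the profile is the share of critiques exercising each lens; the fallback to the pooled standard for fields with fewer than 3 benchmarks follows the rule stated on this page. The toy corpus and the function names are hypothetical, not the real benchmark data.

```python
from collections import Counter

# A shortened lens vocabulary keeps the toy example readable.
LENSES = ["Methods / design", "Statistics / inference",
          "Claim–evidence fit", "Overclaiming"]


def emphasis_profile(benchmarks, lenses=LENSES):
    """Share of benchmark critiques that exercise each lens."""
    counts = Counter(lens for used in benchmarks for lens in set(used))
    n = len(benchmarks)
    return {lens: (counts[lens] / n if n else 0.0) for lens in lenses}


def field_profile(corpus, field, min_n=3):
    """Field-specific profile, falling back to the pooled one below min_n benchmarks."""
    own = corpus.get(field, [])
    if len(own) >= min_n:
        return emphasis_profile(own)
    pooled = [critique for group in corpus.values() for critique in group]
    return emphasis_profile(pooled)


# Hypothetical corpus: three psychology benchmarks, one education benchmark.
corpus = {
    "Psychology": [
        {"Methods / design", "Statistics / inference"},
        {"Methods / design", "Claim–evidence fit"},
        {"Methods / design", "Statistics / inference", "Claim–evidence fit"},
    ],
    "Education": [
        {"Claim–evidence fit", "Overclaiming"},
    ],
}

before = field_profile(corpus, "Psychology")
corpus["Psychology"].append({"Overclaiming"})  # add one benchmark
after = field_profile(corpus, "Psychology")
print(before)
print(after)
```

Because every share is a simple proportion of the corpus, one added benchmark shifts every lens in its field's profile, and it shifts the pooled profile that small fields fall back to.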

Critical AI critiques, scored

✓ calibratedOpen-access full textCRIT-000013

The Impact of AI on Developer Productivity: Evidence from GitHub Copilot

Alignment

0.94

cosine vs the emphasis profile (≥0.80)

Breadth

7 · Comprehensive

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Economics & finance — alignment against economics & finance’s own critique standard 0.87 (pooled 0.94)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.94); broader than a typical Comment (7 dimensions vs human median 5).

✓ calibratedOpen-access full textCRIT-000002

Generative AI at Work

Alignment

0.92

cosine vs the emphasis profile (≥0.80)

Breadth

8 · Comprehensive

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Economics & finance — alignment against economics & finance’s own critique standard 0.90 (pooled 0.92)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.92); broader than a typical Comment (8 dimensions vs human median 5).

✓ calibratedOpen-access full textCRIT-000014

Scaffolding Human–AI Collaboration: A Field Experiment on Behavioral Protocols and Cognitive Reframing

Alignment

0.91

cosine vs the emphasis profile (≥0.80)

Breadth

6 · Comprehensive

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing — alignment against management, is & marketing’s own critique standard 0.92 (pooled 0.91)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.91); broader than a typical Comment (6 dimensions vs human median 5).

✓ calibratedAbstract onlyCRIT-GEN-when-influencers-delegat

When Influencers Delegate Replies: How Social AI Agents Shape User Engagement

Alignment

0.91

cosine vs the emphasis profile (≥0.80)

Breadth

6 · Comprehensive

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing — alignment against management, is & marketing’s own critique standard 0.92 (pooled 0.91)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.91); broader than a typical Comment (6 dimensions vs human median 5).

✓ calibratedUser-supplied full textCRIT-GEN-more-versus-better-artif

More Versus Better: Artificial Intelligence, Incentives, and the Emerging Crisis in Peer Review

Alignment

0.88

cosine vs the emphasis profile (≥0.80)

Breadth

6 · Comprehensive

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing — alignment against management, is & marketing’s own critique standard 0.78 (pooled 0.88)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.88); broader than a typical Comment (6 dimensions vs human median 5).

✓ calibratedOpen-access full textCRIT-000031

Large Language Models, Small Labor Market Effects

Alignment

0.87

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Economics & finance — alignment against economics & finance’s own critique standard 0.73 (pooled 0.87)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.87); breadth typical of the genre (5 dimensions).

✓ calibratedOpen-access full textCRIT-000024

Effect of AI empathy perception on employees' prosocial behavior: mediating role of warmth and moderating role of AI anthropomorphism

Alignment

0.85

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing — alignment against management, is & marketing’s own critique standard 0.78 (pooled 0.85)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.85); breadth typical of the genre (5 dimensions).

✓ calibratedOpen-access full textCRIT-000037

Mental health in the “era” of artificial intelligence: technostress and the perceived impact on anxiety and depressive disorders—an SEM analysis

Alignment

0.85

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology — alignment against psychology’s own critique standard 0.71 (pooled 0.85)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.85); breadth typical of the genre (5 dimensions).

✓ calibratedOpen-access full textCRIT-000033

Investigating the impact of social media images on users' sentiments towards sociopolitical events based on deep artificial intelligence

Alignment

0.85

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media — alignment against communication & media’s own critique standard 0.84 (pooled 0.85)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.85); breadth typical of the genre (5 dimensions).

✓ calibratedUser-supplied full textCRIT-GEN-being-literate-behaving-

Being literate, behaving literate? A mixed-methods approach to adolescents’ algorithm literacy and behavioral strategies on social media

Alignment

0.84

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media — alignment against communication & media’s own critique standard 0.88 (pooled 0.84)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.84); breadth typical of the genre (4 dimensions).

✓ calibratedAbstract onlyCRIT-GEN-resilience-and-disempowe

Resilience and disempowerment in algorithmic systems

Alignment

0.84

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media — alignment against communication & media’s own critique standard 0.88 (pooled 0.84)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.84); breadth typical of the genre (4 dimensions).

✓ calibratedAbstract onlyCRIT-000009

The politics of artificial intelligence alignment: Public reactions to AI moderation in the case of Google’s Gemini

Alignment

0.84

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.74 (pooled 0.84)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.84); breadth typical of the genre (4 dimensions).

✓ calibratedOpen-access full textCRIT-000028

The (Short-Term) Effects of Large Language Models on Unemployment and Earnings

Alignment

0.82

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Economics & finance — alignment against economics & finance’s own critique standard 0.87 (pooled 0.82)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.82); breadth typical of the genre (4 dimensions).

✓ calibratedOpen-access full textCRIT-000026

Fairness Is More Than Algorithms: Racial Disparities in Time-to-Recidivism

Alignment

0.82

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Public policy & criminology — alignment against public policy & criminology’s own critique standard 0.81 (pooled 0.82)

Methods / designCausal identificationStatistics / inferenceData & codeClaim–evidence fitReproducibilityOverclaimingGeneralisationTheory / framingNovelty / contribution

attends to the expert-emphasised lenses (alignment 0.82); breadth typical of the genre (4 dimensions).

✓ calibratedOpen-access full textCRIT-000044

On the conversational persuasiveness of GPT-4

Alignment

0.81

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology — alignment against psychology’s own critique standard 0.76 (pooled 0.81)

attends to the expert-emphasised lenses (alignment 0.81); breadth typical of the genre (4 dimensions).

✓ calibrated · Abstract only · CRIT-000012

Generative AI, propaganda, and digital authoritarianism: Comparative insights from six democratically weakened countries

Alignment

0.80

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media — alignment against communication & media’s own critique standard 0.88 (pooled 0.80)

attends to the expert-emphasised lenses (alignment 0.80); breadth typical of the genre (4 dimensions).

✓ calibrated · Abstract only · CRIT-000010

Refusal as silence: Gendered disparities in Vision-Language Model responses

Alignment

0.80

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Sociology — alignment against sociology’s own critique standard 0.79 (pooled 0.80)

attends to the expert-emphasised lenses (alignment 0.80); breadth typical of the genre (4 dimensions).

needs review · User-supplied full text · CRIT-GEN-the-rise-of-ai-sovereign

The rise of AI sovereignty: Authoritarian technological imaginaries as a form of reflexive control

Alignment

0.80

cosine vs the emphasis profile (≥0.80)

Breadth

6 · Comprehensive

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.72 (pooled 0.80)

under-attends to the expert-emphasised lenses (alignment shown as 0.80, evidently rounded up from a value below the 0.80 threshold); broader than a typical Comment (6 dimensions vs human median 5).
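
The card above shows alignment 0.80 yet is marked needs review, with a verdict that places it below the threshold. One plausible explanation, an assumption rather than something the page confirms, is that the threshold is applied to the unrounded score while the card displays two decimals. The value 0.796 below is hypothetical.

```python
THRESHOLD = 0.80


def display(score):
    """Two-decimal string, as shown on the cards."""
    return f"{score:.2f}"


def is_calibrated(score, threshold=THRESHOLD):
    """Compare the unrounded score against the threshold (assumed behaviour)."""
    return score >= threshold


raw = 0.796  # hypothetical unrounded alignment
print(display(raw), is_calibrated(raw))  # prints: 0.80 False
```

If this is what happens, showing a third decimal near the threshold, or rounding before comparing, would remove the apparent contradiction.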

needs review · Abstract only · CRIT-GEN-scp-artificial-intellige

Artificial intelligence adoption and the demand for managerial expertise

Alignment

0.79

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.62 (pooled 0.79)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.79 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (5 dimensions).
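
The verdict sentences on the cards follow a regular pattern: an alignment clause that depends on the threshold and the access basis, then a breadth clause. The sketch below echoes that wording in abbreviated form. The function names and the breadth cut-offs (3 focused, 4 to 5 typical, 6 comprehensive) are inferred from the cards on this page, not from a published specification.

```python
THRESHOLD = 0.80
HUMAN_MEDIAN_BREADTH = 5


def breadth_label(n_dimensions):
    """Label inferred from the cards: 3 Focused, 4-5 Typical, 6 Comprehensive."""
    if n_dimensions <= 3:
        return "Focused"
    if n_dimensions <= 5:
        return "Typical"
    return "Comprehensive"


def verdict(alignment, n_dimensions, access_basis):
    """Compose a verdict sentence in the style of the cards (assumed logic)."""
    if alignment >= THRESHOLD:
        first = f"attends to the expert-emphasised lenses (alignment {alignment:.2f})"
    elif access_basis == "Abstract only":
        first = (
            "scope-limited by abstract-only access "
            f"(alignment {alignment:.2f} < {THRESHOLD:.2f}); "
            "full-text review needed to reach the calibrated standard"
        )
    else:
        first = (
            "under-attends to the expert-emphasised lenses "
            f"(alignment {alignment:.2f} < {THRESHOLD:.2f})"
        )

    label = breadth_label(n_dimensions)
    if label == "Typical":
        second = f"breadth typical of the genre ({n_dimensions} dimensions)"
    elif label == "Focused":
        second = (
            "more focused than a typical critique "
            f"({n_dimensions} dimensions vs human median {HUMAN_MEDIAN_BREADTH})"
        )
    else:
        second = (
            "broader than a typical Comment "
            f"({n_dimensions} dimensions vs human median {HUMAN_MEDIAN_BREADTH})"
        )
    return f"{first}; {second}."


print(verdict(0.79, 5, "Abstract only"))
print(verdict(0.82, 4, "Open-access full text"))
```

Keeping the access basis as an input to the wording, rather than to the pass/fail decision, matches the cards shown here, where abstract-only critiques at 0.80 are still marked calibrated.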

needs review · Open-access full text · CRIT-000035

Artificial intelligence and social media as new arenas of political competition: challenges for democracy

Alignment

0.78

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.82 (pooled 0.78)

under-attends to the expert-emphasised lenses (alignment 0.78 < 0.8); breadth typical of the genre (5 dimensions).

needs review · Abstract only · CRIT-GEN-how-costs-influence-pref

How Costs Influence Preferences for Control in Generative Artificial Intelligence (GenAI): Human-Guided vs. GenAI-Based Delegated Search

Alignment

0.78

cosine vs the emphasis profile (≥0.80)

Breadth

6 · Comprehensive

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.64 (pooled 0.78)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.78 < 0.8); full-text review needed to reach the calibrated standard; broader than a typical Comment (6 dimensions vs human median 5).

needs review · Abstract only · CRIT-000016

Cultural bias and cultural alignment of large language models

Alignment

0.77

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Sociology — alignment against sociology’s own critique standard 0.75 (pooled 0.77)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.77 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).

needs review · User-supplied full text · CRIT-000003

The Cybernetic Teammate: A Field Experiment on Generative AI and Teamwork

Alignment

0.77

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.43 (pooled 0.77)

under-attends to the expert-emphasised lenses (alignment 0.77 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000029

Factors influencing the adoption of generative artificial intelligence into classroom teaching by university teachers: An empirical study using SPSS PROCESS macros

Alignment

0.77

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Education — alignment against education’s own critique standard 0.69 (pooled 0.77)

under-attends to the expert-emphasised lenses (alignment 0.77 < 0.8); breadth typical of the genre (5 dimensions).

needs review · Open-access full text · CRIT-000034

Exploring the acceptance of ChatGPT in higher education: a comprehensive quantitative study of university students and faculty

Alignment

0.76

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Education — alignment against education’s own critique standard 0.69 (pooled 0.76)

under-attends to the expert-emphasised lenses (alignment 0.76 < 0.8); breadth typical of the genre (5 dimensions).

needs review · Licensed full text · CRIT-000048

Real-time artificial intelligence sentiment feedback promotes self-moderation in contentious online discussion

Alignment

0.75

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media — alignment against communication & media’s own critique standard 0.67 (pooled 0.75)

under-attends to the expert-emphasised lenses (alignment 0.75 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Abstract only · CRIT-000018

Is it harmful or helpful? Examining the causes and consequences of generative AI usage among university students

Alignment

0.75

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Education — alignment against education’s own critique standard 0.77 (pooled 0.75)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.75 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).

needs review · User-supplied full text · CRIT-000008

AI meets politics: Examining the effects of different targeting strategies across 15 countries

Alignment

0.74

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.62 (pooled 0.74)

under-attends to the expert-emphasised lenses (alignment 0.74 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Licensed full text · CRIT-000049

Multimodal large language models can make context-sensitive hate speech evaluations aligned with human judgement

Alignment

0.74

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

under-attends to the expert-emphasised lenses (alignment 0.74 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

needs review · Abstract only · CRIT-000019

Student perspectives on the use of generative artificial intelligence technologies in higher education

Alignment

0.74

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Education — alignment against education’s own critique standard 0.77 (pooled 0.74)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.74 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000032

AI-determined similarity increases likability and trustworthiness of human voices

Alignment

0.73

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology — alignment against psychology’s own critique standard 0.59 (pooled 0.73)

under-attends to the expert-emphasised lenses (alignment 0.73 < 0.8); breadth typical of the genre (5 dimensions).

needs review · Abstract only · CRIT-GEN-algorithmic-responsibili

Algorithmic responsibility in PPC practice: Interpreting black boxes in digital advertising work

Alignment

0.73

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Sociology — alignment against sociology’s own critique standard 0.75 (pooled 0.73)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.73 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (5 dimensions).

needs review · Abstract only · CRIT-GEN-from-prompt-engineering-

From prompt engineering to prompt design: Research strategies for visual generative AI

Alignment

0.73

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media — alignment against communication & media’s own critique standard 0.84 (pooled 0.73)

needs review · User-supplied full text · CRIT-GEN-making-genai-valuable-be

Making GenAI valuable: Benchmarks, singularities, and the enrichment economy

Alignment

0.73

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Sociology — alignment against sociology’s own critique standard 0.75 (pooled 0.73)

under-attends to the expert-emphasised lenses (alignment 0.73 < 0.8); breadth typical of the genre (5 dimensions).

needs review · Open-access full text · CRIT-000038

Whether and When Could Generative AI Improve College Student Learning Engagement?

Alignment

0.73

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Education — alignment against education’s own critique standard 0.42 (pooled 0.73)

under-attends to the expert-emphasised lenses (alignment 0.73 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000027

Heterogeneous preferences and asymmetric insights for AI use among welfare claimants and non-claimants

Alignment

0.71

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Public policy & criminology — alignment against public policy & criminology’s own critique standard 0.72 (pooled 0.71)

under-attends to the expert-emphasised lenses (alignment 0.71 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000039

AI in education through the learners’ eyes: practical experience, perceptions, and challenges

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Education — alignment against education’s own critique standard 0.63 (pooled 0.69)

under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000030

AI tutoring outperforms in-class active learning: an RCT introducing a novel research-based design in an authentic educational setting

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Education — alignment against education’s own critique standard 0.77 (pooled 0.69)

under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000020

Inconsistent advice by ChatGPT influences decision making in various areas

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology — alignment against psychology’s own critique standard 0.56 (pooled 0.69)

under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000041

Political ideology shapes support for the use of AI in policy-making

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.71 (pooled 0.69)

under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Abstract only · CRIT-000015

Testing theory of mind in large language models and humans

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology — alignment against psychology’s own critique standard 0.56 (pooled 0.69)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.69 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).

needs review · User-supplied full text · CRIT-GEN-beyond-disruption-and-in

Beyond disruption and invisibility: Interactional continuity in everyday AI use in India

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media — alignment against communication & media’s own critique standard 0.81 (pooled 0.69)

under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Abstract only · CRIT-GEN-crafting-computer-vision

Crafting computer vision through human eyes: An AI laboratory ethnography

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Sociology — alignment against sociology’s own critique standard 0.70 (pooled 0.69)

needs review · User-supplied full text · CRIT-GEN-into-the-black-box-laype

Into the black box: Laypeople's folk theories about generative artificial intelligence chatbots

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media — alignment against communication & media’s own critique standard 0.81 (pooled 0.69)

under-attends to the expert-emphasised lenses (alignment 0.69 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Abstract only · CRIT-GEN-working-the-algorithm-co

Working the algorithm: Contextual skills of on-demand gig workers

Alignment

0.69

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Sociology — alignment against sociology’s own critique standard 0.70 (pooled 0.69)

needs review · Open-access full text · CRIT-000040

Understanding support for AI regulation: A Bayesian network perspective

Alignment

0.68

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.68 (pooled 0.68)

under-attends to the expert-emphasised lenses (alignment 0.68 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

needs review · Open-access full text · CRIT-000023

AI Self-preferencing in Algorithmic Hiring: Empirical Evidence and Insights

Alignment

0.68

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Economics & finance — alignment against economics & finance’s own critique standard 0.59 (pooled 0.68)

under-attends to the expert-emphasised lenses (alignment 0.68 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

needs review · Open-access full text · CRIT-000025

Local US officials' views on the impacts and governance of AI: Evidence from 2022 and 2023 survey waves

Alignment

0.65

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.65 (pooled 0.65)

under-attends to the expert-emphasised lenses (alignment 0.65 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

needs review · Open-access full text · CRIT-000021

Postgraduate students' perceptions of artificial intelligence integration in research: A cross-sectional study

Alignment

0.65

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Education — alignment against education’s own critique standard 0.49 (pooled 0.65)

under-attends to the expert-emphasised lenses (alignment 0.65 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000042

Positioning Political Texts with Large Language Models by Asking and Averaging

Alignment

0.64

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.71 (pooled 0.64)

under-attends to the expert-emphasised lenses (alignment 0.64 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Open-access full text · CRIT-000043

How human–AI feedback loops alter human perceptual, emotional and social judgements

Alignment

0.62

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology — alignment against psychology’s own critique standard 0.52 (pooled 0.62)

under-attends to the expert-emphasised lenses (alignment 0.62 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Abstract only · CRIT-000017

Generative AI enhances individual creativity but reduces the collective diversity of novel content

Alignment

0.61

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology — alignment against psychology’s own critique standard 0.42 (pooled 0.61)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.61 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (4 dimensions).

needs review · Abstract only · CRIT-GEN-backfiring-ai-ai-deploym

Backfiring AI? AI Deployment in Workplace

Alignment

0.60

cosine vs the emphasis profile (≥0.80)

Breadth

6 · Comprehensive

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.28 (pooled 0.60)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.60 < 0.8); full-text review needed to reach the calibrated standard; broader than a typical Comment (6 dimensions vs human median 5).

needs review · Open-access full text · CRIT-000022

When an AI Judges Your Work: The Hidden Costs of Algorithmic Assessment

Alignment

0.56

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology — alignment against psychology’s own critique standard 0.36 (pooled 0.56)

under-attends to the expert-emphasised lenses (alignment 0.56 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

needs review · Abstract only · CRIT-000006

Can ChatGPT Kill User-Generated Q&A Platforms?

Alignment

0.56

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing; alignment against management, IS & marketing’s own critique standard 0.50 (pooled 0.56)

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.56 < 0.8); full-text review needed to reach the calibrated standard; more focused than a typical critique (3 dimensions vs human median 5).

needs review · Abstract only · CRIT-000004

Artificial Collusion: Examining Supracompetitive Pricing by Q-Learning Algorithms

Alignment

0.56

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Economics & finance — alignment against economics & finance’s own critique standard 0.36 (pooled 0.56)

needs review · User-supplied full text · CRIT-000011

From rule of law to rule of algorithm: Generative Artificial Intelligence's threat to democracy

Alignment

0.56

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science — alignment against political science’s own critique standard 0.47 (pooled 0.56)

under-attends to the expert-emphasised lenses (alignment 0.56 < 0.8); breadth typical of the genre (4 dimensions).

needs review · Abstract only · CRIT-000005

Unraveling Generative AI from a Human Intelligence Perspective: A Battery of Experiments

Alignment

0.56

cosine vs the emphasis profile (≥0.80)

Breadth

4 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology. Alignment against the field’s own critique standard: 0.42 (pooled 0.56)

Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution

needs review · Licensed full text · CRIT-000046

Comparing the value of perceived human versus AI-generated empathy

Alignment

0.53

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology. Alignment against the field’s own critique standard: 0.32 (pooled 0.53)

Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution

under-attends to the expert-emphasised lenses (alignment 0.53 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

needs review · Abstract only · CRIT-GEN-charismatic-machines-on-

Charismatic machines: On the epistemic power of generative AI within platform convergence

Alignment

0.51

cosine vs the emphasis profile (≥0.80)

Breadth

5 · Typical

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Communication & media. Alignment against the field’s own critique standard: 0.60 (pooled 0.51)

Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.51 < 0.8); full-text review needed to reach the calibrated standard; breadth typical of the genre (5 dimensions).

needs review · Licensed full text · CRIT-000047

Reducing political polarization through conversations with artificial intelligence

Alignment

0.50

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Political science. Alignment against the field’s own critique standard: 0.44 (pooled 0.50)

Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution

under-attends to the expert-emphasised lenses (alignment 0.50 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

needs review · Abstract only · CRIT-000007

Made With AI: Consumer Engagement with Social Media Containing AI Disclosures

Alignment

0.49

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Management, IS & marketing. Alignment against the field’s own critique standard: 0.20 (pooled 0.49)

Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution

scope-limited by abstract-only access — cannot reach the methods/statistics/identification lenses experts emphasise (alignment 0.49 < 0.8); full-text review needed to reach the calibrated standard; more focused than a typical critique (3 dimensions vs human median 5).

needs review · Licensed full text · CRIT-000045

Cultural tendencies in generative AI

Alignment

0.41

cosine vs the emphasis profile (≥0.80)

Breadth

3 · Focused

vs human median 5

Discipline

pass

sourced · severity-capped · no motive

Grounding

100%

claims citing specific evidence

Field: Psychology. Alignment against the field’s own critique standard: 0.20 (pooled 0.41)

Methods / design · Causal identification · Statistics / inference · Data & code · Claim–evidence fit · Reproducibility · Overclaiming · Generalisation · Theory / framing · Novelty / contribution

under-attends to the expert-emphasised lenses (alignment 0.41 < 0.8); more focused than a typical critique (3 dimensions vs human median 5).

What this measures — and what it doesn’t

The reference distribution re-derives from the benchmark corpus; add a benchmark and the standard moves. Machine-readable at /critique/api/calibration.
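For readers who want to see how an alignment score of this kind could be reproduced, here is a minimal sketch in Python. It assumes a critique is represented as a binary vector over the ten lenses (1 if the critique exercises the lens) and that alignment is the cosine between that vector and the pooled emphasis profile listed under "The human standard". The exact vector encoding the app uses is not stated on this page, so the scores printed on the cards may differ. The function names (alignment, breadth_label, status) are hypothetical, and the "Broad" label for more than five dimensions is an assumption; only "Focused" (3) and "Typical" (4 to 5) appear on the cards.

```python
import math

# Pooled emphasis profile: share of the 68 benchmark critiques that
# exercise each lens (figures taken from "The human standard").
EMPHASIS = {
    "Methods / design": 0.91,
    "Claim–evidence fit": 0.91,
    "Statistics / inference": 0.72,
    "Reproducibility": 0.57,
    "Overclaiming": 0.46,
    "Generalisation": 0.44,
    "Causal identification": 0.37,
    "Data & code": 0.25,
    "Theory / framing": 0.16,
    "Novelty / contribution": 0.04,
}
THRESHOLD = 0.80  # calibration threshold shown on each card


def alignment(lenses):
    """Cosine between a critique's binary lens vector and the profile.

    ASSUMPTION: binary encoding (lens exercised or not); the app may
    weight lenses differently.
    """
    used = set(lenses)
    if not used:
        return 0.0
    dot = sum(EMPHASIS[name] for name in used)
    norm_critique = math.sqrt(len(used))
    norm_profile = math.sqrt(sum(v * v for v in EMPHASIS.values()))
    return dot / (norm_critique * norm_profile)


def breadth_label(n_dimensions):
    """Label breadth against the typical human range of 4 to 5 dimensions."""
    if n_dimensions < 4:
        return "Focused"
    if n_dimensions <= 5:
        return "Typical"
    return "Broad"  # ASSUMPTION: label not shown on this page


def status(score):
    """ASSUMPTION: status depends on the alignment threshold alone."""
    return "calibrated" if score >= THRESHOLD else "needs review"


if __name__ == "__main__":
    abstract_only = ["Overclaiming", "Generalisation", "Theory / framing"]
    score = alignment(abstract_only)
    print(f"{score:.2f}", breadth_label(len(abstract_only)), status(score))
```

Under this encoding, a critique confined to framing and stated-claim lenses scores well below 0.80, while one that covers the four most-emphasised lenses clears it. That matches the pattern the cards describe for abstract-only versus full-text access.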