Comment on "Investigating the impact of social media images on users' sentiments towards sociopolitical events based on deep artificial intelligence"

Item: Investigating the impact of social media images on users' sentiments towards sociopolitical events based on deep artificial intelligence
Author: Critical AI


Post-publication Comment · Critical AI


Critical AI · published 2026-07-01 · v1.0 · CRIT-000033

Concerning: Nafiseh Jabbari Tofighi, Reda Alhajj · PLOS ONE · 2025-07-30

Severity: High · Confidence: High · Tier exception · Off-list venue, peer-reviewed · Open-access full text · Empirical

Why this paper was selected

Autonomous production cycle (political_science deepening); OA full-text critique via the G119-improved producer + 3-lens convergence gate.

AI/AGI centrality 3/5 · societal relevance 3/5 · source-journal note (off-monitored): PLOS ONE is a peer-reviewed, gold open-access (CC BY) megajournal that is not on the monitored top-tier journal list; this critique is based on its verbatim open-access full text.

Summary

The paper scrapes 100 Instagram posts (20 per movement: Black Lives Matter, Women's March, Climate Change Protests, Anti-war), hand-labels each post's image as positive (100) or negative (0), computes each post's percentage of positive BERT-classified comments, and correlates the two per movement. It reports Pearson (PLCC) and Spearman (SROCC) coefficients of 0.531-0.723 and concludes that images "strongly influence" and "shape" public sentiment toward sociopolitical events. The core problem is a gap between the design (a contemporaneous, same-post cross-sectional correlation on only 20 posts per movement, with no significance test anywhere) and the claim (images causally shape public opinion). Three further problems compound this: posts were deliberately selected to span both sentiments, which distorts the correlation relative to a random sample; the reported coefficients carry no p-value, confidence interval, or standard error despite n=20; and the image measure is binary while the comment measure is continuous, so the "Pearson linear correlation" is really a point-biserial coefficient. The authors deserve credit for sharing code and data on GitHub and for openly disclosing manual-labeling subjectivity, binary-coding oversimplification, single-platform scope, and the possible presence of synthetic images.

Central claims & evidence map

Quoted span from the paper	Support	Overclaiming	Main weakness
"visual content strongly influence how social media users react to sociopolitical events"	Weak	Major	Causal framing of a contemporaneous, same-post correlation
"indicating the significant role of visual content in shaping the public opinion"	Weak	Major	No significance test, p-value, or confidence interval at n=20 per movement
"posts were selected to include both positive and negative sentiment variations to ensure balanced data distribution"	Moderate	Moderate	Posts selected partly on the sentiment dimension being correlated
"PLCC was used to determine the linear correlation between the sentiment scores of images and the positive sentiment percentages of the comments"	Moderate	Moderate	Point-biserial scaling of a binary image score against a continuous comment score

Per-claim assessment

CLAIM-001. The design is a single cross-sectional correlation between an image's manual label and the SAME post's comment-positivity percentage, with no temporal ordering, no manipulation, and no control for confounders. Yet the paper's headline conclusion frames the result causally: images "strongly influence how social media users react to sociopolitical events."
Section 4.3 describes only that 'PLCC was used to determine the linear correlation between the sentiment scores of images and the positive sentiment percentages of the comments' - a purely associational, contemporaneous measure. No experiment, no lagged or temporal design, and no confounder adjustment appears anywhere in the methodology. The abstract itself uses the hedged term 'correlation,' which is at odds with the conclusion's causal 'strongly influence' and exposes the identification gap: no design feature in the text establishes the direction of effect, so none can be cited in the paper's defence.
CLAIM-002. Every correlation is computed on only 20 posts per movement, yet the paper reports no significance test, p-value, confidence interval, or standard error anywhere, while calling the association 'significant.' At n=20 the reported coefficients carry wide uncertainty, so the ranking of movements by coefficient is asserted, not shown.
Section 3.3 states 'we extracted a total of 100 posts, with 20 posts selected per sociopolitical movement,' and the correlations in Table 3 are per-movement, so each rests on n=20. A full-text search finds no p-value, confidence interval, standard error, or hypothesis test anywhere in the paper; the word 'significant' is used only as informal emphasis, never backed by a test statistic.
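How wide that uncertainty is can be gauged without the raw data. The following is a minimal Python sketch (standard library only), assuming the Fisher z-approximation is adequate; it is only approximate here, because the binary image variable is not bivariate-normal. It uses just the two endpoints of the reported 0.531-0.723 range and, as a hypothetical, treats them as coefficients from two independent movements of n=20 each. `fisher_ci` and `compare_independent_r` are illustrative helper names, not code from the paper's repository.

```python
import math


def fisher_ci(r: float, n: int, z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate 95% CI for a Pearson r via the Fisher z-transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)  # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def compare_independent_r(r1: float, r2: float, n1: int, n2: int) -> float:
    """Two-sided p-value for H0: rho1 == rho2, assuming independent samples."""
    z = (math.atanh(r1) - math.atanh(r2)) / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


n = 20  # posts per movement, as reported in Section 3.3
for r in (0.531, 0.723):  # endpoints of the reported range (illustrative only)
    lo, hi = fisher_ci(r, n)
    print(f"r = {r:.3f}: 95% CI approx [{lo:.2f}, {hi:.2f}]")

print(f"p (0.723 vs 0.531) = {compare_independent_r(0.723, 0.531, n, n):.2f}")
```

Under these assumptions the intervals come out at roughly [0.12, 0.79] and [0.41, 0.88], and the test of equal correlations gives p of about 0.35. At this sample size, the endpoints of the reported range are not statistically distinguishable from each other.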
CLAIM-003. Posts were deliberately chosen to span both positive and negative sentiment 'to ensure balanced data distribution' - i.e., selected partly on the very sentiment dimension being correlated. Purposively engineering variation across the sentiment range distorts a correlation coefficient relative to what a random sample would give.
Section 3.3's 'Post Selection Process' lists this criterion alongside 'High Engagement-posts with the most likes, comments, and shares were prioritized.' Both are non-random selection rules; selecting to balance the sentiment dimension is a recognized source of correlation distortion, and the paper offers no re-weighting or random-sampling comparison to bound its effect.
CLAIM-004. The image variable is binary (positive=100, negative=0) while the comment variable is a continuous 0-100% positivity score. A Pearson 'linear correlation coefficient' between a binary and a continuous variable is a point-biserial correlation whose magnitude is mechanically bounded by each movement's split between positive and negative image labels, which also limits comparability across movements.
Section 4.2 states 'Positive images were assigned a value of 100, and negative images were assigned a value of 0. This binary classification enabled a straightforward quantification of the sentiment associated with visual content.' A Pearson correlation with a dichotomous input is by definition point-biserial. The disclosed limitation about binary coding concerns only sentiment nuance, not point-biserial scaling, so the authors do not address this measurement issue.

Scorecard

AI/AGI contribution: 3.0 / 5
Evidentiary support: 2.0 / 5
Methodological risk: 4.0 / 5
Overclaiming: 4.0 / 5
Reproducibility / auditability: 2.0 / 5
Societal-impact relevance: 3.0 / 5

Sub-scores are 0–5 editorial judgements on fixed scales (higher is better, except methodological risk and overclaiming where higher is worse). They are contestable and open to a severity challenge from authors.

Strongest critique

The paper's headline conclusion - that social media images "strongly influence how social media users react to sociopolitical events" - asserts causation that its design cannot support. The entire evidence base is a set of per-movement Pearson/Spearman correlations between an image's manual sentiment label and the SAME post's comment-positivity percentage. This is purely contemporaneous and same-post: there is no temporal ordering, no manipulation, and no confounder control anywhere in the methodology (Section 4.3 describes only that "PLCC was used to determine the linear correlation between the sentiment scores of images and the positive sentiment percentages of the comments"). Because the image and the comments both respond to the same underlying event at the same moment, the correlation is equally consistent with reverse causation or with a shared common cause (the event's valence driving both the chosen image and the comment tone). The abstract's own hedged noun "correlation" contradicts the conclusion's causal verb "strongly influence," and no design feature exists in the text to close that gap - so the causal claim is unearned regardless of how large the coefficients are.

Strongest fair defence

The paper is genuinely commendable on transparency and disclosure, and a fair critique must concede this. The authors provide a public GitHub repository with both code and dataset ("Code and dataset's download link is available here"), which materially aids reproducibility and lets a reader recover model details not given in the prose. They openly disclose several of their own limitations: the manual image-labeling process "can introduce subjectivity"; the "binary classification of images as either positive or negative may oversimplify the complexity of visual content"; the single-platform (Instagram-only) scope; and, notably, that "there may exist AI generated or synthetic images in the dataset" along with retrospective-collection recency-bias risk. The abstract and much of the body also hedge with the accurate term "correlation" rather than causation. Because of these disclosures, criticisms aimed at manual-labeling subjectivity, binary-coding coarseness, single-platform generalizability, or reproducibility of the classifier are largely pre-empted by the authors themselves and should not be treated as undisclosed failings - the reproducibility/IMDb-domain-shift concern in particular is substantially mitigated by the shared repository. The defensible targets are the specific over-claims the paper does NOT concede: the causal framing of a contemporaneous same-post correlation, the absence of any significance testing at n=20, selection of posts on the outcome dimension, and the point-biserial scaling of a binary-vs-continuous coefficient.

Conclusion

The paper reports a real, moderate-to-large descriptive association between hand-labeled image sentiment and comment positivity across four movements, and it is admirably transparent about its data, code, and several limitations. But its headline claim over-reaches on causal identification: a contemporaneous, same-post cross-sectional correlation cannot establish that images "strongly influence" user reactions when image and comments both respond to the same event, and the paper isolates no direction of effect. This is the single most defensible, hardest-to-refute flaw and it is not author-disclosed. Three secondary, span-grounded gaps compound it without being bundled into the headline: no significance test, p-value, or confidence interval is reported despite each coefficient resting on only n=20 posts (so the movement ranking is asserted, not shown); posts were selected partly on the sentiment dimension being correlated, distorting the coefficient relative to a random sample; and the "linear correlation" between a binary image score and a continuous comment percentage is really a point-biserial coefficient whose magnitude and cross-movement comparability are constrained by each movement's label split. The overall defect is a bounded but clear over-claim - causal and inferential language outrunning an associational, small, purposively selected design - on a paper whose finding of an association is otherwise real and whose limitations are partly disclosed.

Reply from the authors

Following the practice of Nature Matters Arising, Science Technical Comments and PNAS Letters, this Comment is published as one half of a Comment + Reply pair: the authors of the original article are invited to respond, and any reply is published here verbatim alongside the Comment as part of the record.

Reply: not yet invited. No reply has been received for publication.

The authors have a right of reply and no veto. A reply may request a factual correction, a methodological rebuttal, a clarification, a data/code update, or a severity challenge, and is published unedited. See the right-of-reply policy.

Source-grounding attestation

✓ Attested in-app · grounding: spans in app

✓ Verbatim source spans present in the critique: 4/4 provenance spans re-derived in the critique prose
✓ Passes the publication validator: no errors
✓ Zero fabricated citations: 0 fabricated
✓ Severity within the access-basis cap: severity "high" ≤ cap "high" for open_access

Every verbatim span the critique relies on is re-derived in the prose in-app; span-in-source is re-verifiable offline (the abstract is re-fetched, not stored, per the no-reproduce policy).

Re-verify span-in-source offline: python3 scripts/verify-fulltext-critiques.py

Version & correction history

Version	Date	Change
v1.0	2026-07-01	Initial publication

No silent substantive corrections — every change is versioned and visible.

How to cite this Comment

Critical AI. Comment on “Investigating the impact of social media images on users' sentiments towards sociopolitical events based on deep artificial intelligence” (Nafiseh Jabbari Tofighi et al., PLOS ONE, 2025). Critical AI; 2026. https://policywindow.org/critique/c/social-media-images-sociopolitical-sentiment

A registered DOI will replace the URL once minted; until then the canonical URL is the persistent identifier. Highwire/Dublin-Core citation tags and a schema.org Review record are embedded in this page for Google Scholar and reference managers.

Verify this Comment. Its checkable facts (target DOI, access-basis severity cap, zero fabricated citations) are served — as the app’s self-report — at /critique/api/critiques/social-media-images-sociopolitical-sentiment/verify; to confirm them independently of this site, re-derive the same checks (and resolve the target DOI) with npx tsx scripts/verify-critical-ai.ts --critique social-media-images-sociopolitical-sentiment --live.

Content fingerprint c6d14d5eb9250945 (v1.0) — this Comment’s substantive content is content-addressed; a silent post-publication edit would change it.