{"$schema":"https://policywindow.org/critique/api/schema","critique_id":"CRIT-000033","slug":"social-media-images-sociopolitical-sentiment","url":"https://policywindow.org/critique/c/social-media-images-sociopolitical-sentiment","doi":null,"status":"published","critique_type":"editorially_approved_ai_native_critique","publication_date":"2026-07-01","current_version":"1.0","target_paper":{"title":"Investigating the impact of social media images on users' sentiments towards sociopolitical events based on deep artificial intelligence","authors":["Nafiseh Jabbari Tofighi","Reda Alhajj"],"journal":"PLOS ONE","doi":"10.1371/journal.pone.0326936","url":"https://doi.org/10.1371/journal.pone.0326936","publicationDate":"2025-07-30","paperType":"empirical","accessBasis":"open_access","fullTextUsed":true,"fictional":false,"doi_url":"https://doi.org/10.1371/journal.pone.0326936"},"source_journal":{"tier":"exception","rankingSources":["resolved from the monitored-venue determination"],"rankingNote":"Off-monitored: PLOS ONE is a peer-reviewed, gold open-access (CC BY) megajournal not in the journal's monitored top-tier list; critiqued from its verbatim open-access full text."},"selection_provenance":{"id":"social-media-images-sociopolitical-sentiment","venue":"PLOS ONE","inMonitoredSet":false,"determinedTier":null,"recordedTier":"exception","effectiveTier":"exception","kind":"off_list","disclosed":true,"offListPeerReviewed":true},"selection":{"aiAgiCentralityScore":3,"societalRelevanceScore":3,"aiAgiCategories":[],"selectionReason":"Autonomous production cycle (political_science deepening); OA full-text critique via the G119-improved producer + 3-lens convergence gate."},"scores":{"aiAgiContribution":3,"evidentiarySupport":2,"methodologicalRisk":4,"overclaiming":4,"reproducibilityOrAuditability":2,"societalImpactRelevance":3,"severity":"high","confidence":"high"},"severity_cap_for_access_basis":"high","plain_language_summary":"The paper scrapes 100 Instagram posts (20 per 
movement, including Black Lives Matter, Women's March, Climate Change Protests, and Anti-war), hand-labels each post's image as positive (100) or negative (0), computes each post's percentage of positive BERT-classified comments, and correlates the two per movement. It reports Pearson (PLCC) and Spearman (SROCC) coefficients of 0.531-0.723 and concludes that images \"strongly influence\" and \"shape\" public sentiment toward sociopolitical events. The core problem is a gap between the design (a contemporaneous, same-post cross-sectional correlation on only 20 posts per movement, with no significance test anywhere) and the claim (images causally shape public opinion). Three descriptive choices compound this: posts were deliberately selected to span both sentiments, which mechanically alters the correlation relative to a random sample; the reported coefficients carry no p-value, confidence interval, or standard error despite n=20; and the image measure is binary while the comment measure is continuous, so the \"Pearson linear correlation\" is really a point-biserial coefficient. The authors deserve credit for sharing code and data on GitHub and for openly disclosing manual-labeling subjectivity, binary-coding oversimplification, single-platform scope, and the possible presence of synthetic images.","claims":[{"id":"CLAIM-001","text":"The design is a single cross-sectional correlation between an image's manual label and the SAME post's comment-positivity percentage, with no temporal ordering, no manipulation, and no control for confounders. 
Yet the paper's headline conclusion frames the result causally: images","type":"empirical","evidenceOffered":"visual content strongly influence how social media users react to sociopolitical events","support":"weak","overclaiming":"major","assessment":"Section 4.3 describes only that 'PLCC was used to determine the linear correlation between the sentiment scores of images and the positive sentiment percentages of the comments' - a purely associational, contemporaneous measure. No experiment, no lagged or temporal design, and no confounder adjustment appears anywhere in the methodology. The abstract itself uses the hedged term 'correlation,' which contradicts the conclusion's causal 'strongly influence,' exposing the identification gap; a refuter cannot cite any design feature that establishes direction because none exists in the text.","mainWeakness":"The design is a single cross-sectional correlation between an image's manual label and the SAME post's comment-positivity percentage, with no temporal ordering, no manipulation, and no control for confounders. Yet the paper's headline conclusion frames the result causally: images","confidence":"high"},{"id":"CLAIM-002","text":"Every correlation is computed on only 20 posts per movement, yet the paper reports no significance test, p-value, confidence interval, or standard error anywhere, while calling the association 'significant.' At n=20 the reported coefficients carry wide uncertainty and the five pe","type":"empirical","evidenceOffered":"indicating the significant role of visual content in shaping the public opinion","support":"weak","overclaiming":"major","assessment":"Section 3.3 states 'we extracted a total of 100 posts, with 20 posts selected per sociopolitical movement,' and the correlations in Table 3 are per-movement, so each rests on n=20. 
A full-text search finds no p-value, confidence interval, standard error, or hypothesis test anywhere in the paper; the word 'significant' is used only as informal emphasis, never backed by a test statistic.","mainWeakness":"Every correlation is computed on only 20 posts per movement, yet the paper reports no significance test, p-value, confidence interval, or standard error anywhere, while calling the association 'significant.' At n=20 the reported coefficients carry wide uncertainty and the five pe","confidence":"high"},{"id":"CLAIM-003","text":"Posts were deliberately chosen to span both positive and negative sentiment 'to ensure balanced data distribution' - i.e., selected partly on the very sentiment dimension being correlated. Purposively engineering variation across the sentiment range alters a correlation coefficie","type":"empirical","evidenceOffered":"posts were selected to include both positive and negative sentiment variations to ensure balanced data distribution","support":"moderate","overclaiming":"moderate","assessment":"Section 3.3's 'Post Selection Process' lists this criterion alongside 'High Engagement-posts with the most likes, comments, and shares were prioritized.' Both are non-random selection rules; selecting to balance the sentiment dimension is a recognized source of correlation distortion, and the paper offers no re-weighting or random-sampling comparison to bound its effect.","mainWeakness":"Posts were deliberately chosen to span both positive and negative sentiment 'to ensure balanced data distribution' - i.e., selected partly on the very sentiment dimension being correlated. Purposively engineering variation across the sentiment range alters a correlation coefficie","confidence":"high"},{"id":"CLAIM-004","text":"The image variable is binary (positive=100, negative=0) while the comment variable is a continuous 0-100% positivity score. 
A Pearson 'linear correlation coefficient' between a binary and a continuous variable is a point-biserial correlation whose magnitude is mechanically bounde","type":"empirical","evidenceOffered":"PLCC was used to determine the linear correlation between the sentiment scores of images and the positive sentiment percentages of the comments","support":"moderate","overclaiming":"moderate","assessment":"Section 4.2 states 'Positive images were assigned a value of 100, and negative images were assigned a value of 0. This binary classification enabled a straightforward quantification of the sentiment associated with visual content.' A Pearson correlation with a dichotomous input is by definition point-biserial; the disclosed limitation about binary coding concerns only sentiment nuance, not the point-biserial scaling issue, so this measurement point is not author-addressed.","mainWeakness":"The image variable is binary (positive=100, negative=0) while the comment variable is a continuous 0-100% positivity score. A Pearson 'linear correlation coefficient' between a binary and a continuous variable is a point-biserial correlation whose magnitude is mechanically bounde","confidence":"high"}],"sections":[],"strongest_critique":"The paper's headline conclusion - that social media images \"strongly influence how social media users react to sociopolitical events\" - asserts causation that its design cannot support. The entire evidence base is a set of per-movement Pearson/Spearman correlations between an image's manual sentiment label and the SAME post's comment-positivity percentage. This is purely contemporaneous and same-post: there is no temporal ordering, no manipulation, and no confounder control anywhere in the methodology (Section 4.3 describes only that \"PLCC was used to determine the linear correlation between the sentiment scores of images and the positive sentiment percentages of the comments\"). 
Because the image and the comments both respond to the same underlying event at the same moment, the correlation is equally consistent with reverse causation or with a shared common cause (the event's valence driving both the chosen image and the comment tone). The abstract's own hedged noun \"correlation\" contradicts the conclusion's causal verb \"strongly influence,\" and no design feature exists in the text to close that gap - so the causal claim is unearned regardless of how large the coefficients are.","strongest_fair_defence":"The paper is genuinely commendable on transparency and disclosure, and a fair critique must concede this. The authors provide a public GitHub repository with both code and dataset (\"Code and dataset's download link is available here\"), which materially aids reproducibility and lets a reader recover model details not given in the prose. They openly disclose several of their own limitations: the manual image-labeling process \"can introduce subjectivity\"; the \"binary classification of images as either positive or negative may oversimplify the complexity of visual content\"; the single-platform (Instagram-only) scope; and, notably, that \"there may exist AI generated or synthetic images in the dataset\" along with retrospective-collection recency-bias risk. The abstract and much of the body also hedge with the accurate term \"correlation\" rather than causation. Because of these disclosures, criticisms aimed at manual-labeling subjectivity, binary-coding coarseness, single-platform generalizability, or reproducibility of the classifier are largely pre-empted by the authors themselves and should not be treated as undisclosed failings - the reproducibility/IMDb-domain-shift concern in particular is substantially mitigated by the shared repository. 
The defensible targets are the specific over-claims the paper does NOT concede: the causal framing of a contemporaneous same-post correlation, the absence of any significance testing at n=20, selection of posts on the outcome dimension, and the point-biserial scaling of a binary-vs-continuous coefficient.","final_judgment":"The paper reports a real, moderate-to-large descriptive association between hand-labeled image sentiment and comment positivity across five movements, and it is admirably transparent about its data, code, and several limitations. But its headline claim over-reaches on causal identification: a contemporaneous, same-post cross-sectional correlation cannot establish that images \"strongly influence\" user reactions when image and comments both respond to the same event, and the paper isolates no direction of effect. This is the single most defensible, hardest-to-refute flaw and it is not author-disclosed. Three secondary, span-grounded gaps compound it without being bundled into the headline: no significance test, p-value, or confidence interval is reported despite each coefficient resting on only n=20 posts (so the movement ranking is asserted, not shown); posts were selected partly on the sentiment dimension being correlated, distorting the coefficient relative to a random sample; and the \"linear correlation\" between a binary image score and a continuous comment percentage is really a point-biserial coefficient whose magnitude and cross-movement comparability are constrained by each movement's label split. 
The overall defect is a bounded but clear over-claim - causal and inferential language outrunning an associational, small, purposively selected design - on a paper whose finding of an association is otherwise real and whose limitations are partly disclosed.","review_process":{"aiAgentsUsed":["produce","sharpen","gate_refute","gate_defender","gate_neutral"],"reviewRounds":1,"humanEditor":{"name":"","role":"","approvalDate":"","declaredConflict":"none"},"expertCertification":{"used":false}},"author_response":{"notified":false,"status":"not_yet_invited"},"versions":[{"version":"1.0","date":"2026-07-01","note":"","changeType":"initial"}],"transparency":{"modelCardUrl":"/critique/model-card","publicAuditSummary":"Critique produced by the autonomous production cycle (G119 producer → 3-lens convergence gate); staged pending the operator-gated promotion (--promote) which runs the full automated integrity gate. No human editor.","privateAuditRecordExists":true,"citationVerification":{"status":"complete","checkedSources":[],"fabricatedCitations":0},"riskReview":{"copyright":"completed","defamation":"completed","note":"PLOS ONE (gold open access, CC BY 4.0) quoted sparingly under criticism/review; critique targets claims, methods and inference only — never the authors."}}}