{"$schema":"https://policywindow.org/critique/api/schema","name":"Critical AI — critique benchmarks","description":"Real, published human-expert critiques (Comments, Replies, replications, critical commentaries, reanalyses) in top-tier social-science journals. Every DOI is independently Crossref-verified. Used as calibration benchmarks for Critical AI's AI-native critiques.","docs":"https://policywindow.org/critique/benchmarks","coverage":{"total":39,"aiRelated":4,"venues":28,"fields":33,"dimensions":10},"count":39,"benchmarks":[{"id":"herndon-ash-pollin-reinhart-rogoff","title":"Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff","authors":["Thomas Herndon","Michael Ash","Robert Pollin"],"venue":"Cambridge Journal of Economics","tier":"A","year":"2013","doi":"10.1093/cje/bet075","critiqueType":"replication","target":{"title":"Growth in a Time of Debt","authors":["Carmen M. Reinhart","Kenneth S. Rogoff"],"doi":"10.1257/aer.100.2.573","year":""},"whatItChallenges":"Replicating Reinhart and Rogoff's claim that public debt above 90% of GDP is associated with sharply lower growth, the authors find a spreadsheet coding error, selective exclusion of available country-year data, and unconventional weighting. Corrected, average real growth for high-debt countries is +2.2%, not the published -0.1%, eliminating the supposed debt threshold.","dimensions":["methods","data_code","reproducibility","statistics","claims","overclaiming"],"aiRelated":false,"field":"Macroeconomics / Public Finance","verifyNote":"DOI resolves in Crossref to this exact title in Cambridge Journal of Economics (2013), indexed as journal-article.","doi_url":"https://doi.org/10.1093/cje/bet075","target_doi_url":"https://doi.org/10.1257/aer.100.2.573"},{"id":"foote-goetz-abortion-crime","title":"The Impact of Legalized Abortion on Crime: Comment","authors":["Christopher L. Foote","Christopher F. 
Goetz"],"venue":"Quarterly Journal of Economics","tier":"S","year":"2008","doi":"10.1162/qjec.2008.123.1.407","critiqueType":"comment","target":{"title":"The Impact of Legalized Abortion on Crime","authors":["John J. Donohue III","Steven D. Levitt"],"doi":"10.1162/00335530151144050","year":""},"whatItChallenges":"The comment identifies a coding mistake in the within-state cohort regressions of Donohue and Levitt's abortion-crime paper and shows that correcting it and using a per-capita crime specification sharply weakens the results. It also shows the cross-state tests are not robust to allowing differential state trends.","dimensions":["methods","identification","data_code","statistics","claims","reproducibility"],"aiRelated":false,"field":"Applied Microeconomics / Crime","verifyNote":"DOI resolves in Crossref to this exact title in Quarterly Journal of Economics (2008), indexed as journal-article.","doi_url":"https://doi.org/10.1162/qjec.2008.123.1.407","target_doi_url":"https://doi.org/10.1162/00335530151144050"},{"id":"albouy-colonial-origins","title":"The Colonial Origins of Comparative Development: An Empirical Investigation: Comment","authors":["David Y. Albouy"],"venue":"American Economic Review","tier":"S","year":"2012","doi":"10.1257/aer.102.6.3059","critiqueType":"comment","target":{"title":"The Colonial Origins of Comparative Development: An Empirical Investigation","authors":["Daron Acemoglu","Simon Johnson","James A. Robinson"],"doi":"10.1257/aer.91.5.1369","year":""},"whatItChallenges":"Albouy shows that 36 of the 64 countries are assigned settler-mortality rates borrowed from other countries and that incomparable rates from laborers, bishops, and soldiers on campaign are combined in ways favorable to the institutions hypothesis. 
Once data problems are addressed, the mortality-expropriation relationship and the instrumental-variable estimates lose robustness, often yielding effectively infinite confidence intervals.","dimensions":["data_code","identification","methods","statistics","claims","generalisation"],"aiRelated":false,"field":"Development Economics / Institutions","verifyNote":"DOI resolves in Crossref to this exact title in American Economic Review (2012), indexed as journal-article.","doi_url":"https://doi.org/10.1257/aer.102.6.3059","target_doi_url":"https://doi.org/10.1257/aer.91.5.1369"},{"id":"gilbert-king-reproducibility-psychology","title":"Comment on \"Estimating the reproducibility of psychological science\"","authors":["Daniel T. Gilbert","Gary King","Stephen Pettigrew","Timothy D. Wilson"],"venue":"Science","tier":"S","year":"2016","doi":"10.1126/science.aad7243","critiqueType":"comment","target":{"title":"Estimating the reproducibility of psychological science","authors":["Open Science Collaboration"],"doi":"10.1126/science.aac4716","year":""},"whatItChallenges":"Argues the Open Science Collaboration's Reproducibility Project contains three statistical errors (low-power replications, non-representative study sampling, and misleading endorsement criteria) that bias the reported replication rate downward. Concludes the data are actually consistent with very high reproducibility, not the low rate the original claimed.","dimensions":["statistics","methods","identification","claims","reproducibility","generalisation"],"aiRelated":false,"field":"Psychology / metascience","verifyNote":"DOI resolves in Crossref to this exact title in Science (2016), indexed as journal-article.","doi_url":"https://doi.org/10.1126/science.aad7243","target_doi_url":"https://doi.org/10.1126/science.aac4716"},{"id":"hagger-ego-depletion-rrr","title":"A Multilab Preregistered Replication of the Ego-Depletion Effect","authors":["Martin S. Hagger","Nikos L. D. 
Chatzisarantis","Hugo Alberts","Calvin Octavianus Anggono"],"venue":"Perspectives on Psychological Science","tier":"A","year":"2016","doi":"10.1177/1745691616652873","critiqueType":"replication","target":{"title":"Methylphenidate Blocks Effort-Induced Depletion of Regulatory Control in Healthy Volunteers","authors":["Chandra Sripada","Daniel Kessler","John Jonides"],"doi":"10.1177/0956797614526415","year":""},"whatItChallenges":"A preregistered Registered Replication Report across 23 labs (~2000 participants) tested the sequential-task ego-depletion effect and found a meta-analytic effect indistinguishable from zero (d ≈ 0.04). Challenges the existence and robustness of the widely cited ego-depletion / limited-resource model of self-control.","dimensions":["reproducibility","methods","statistics","claims","generalisation"],"aiRelated":false,"field":"Social/cognitive psychology (self-control)","verifyNote":"DOI resolves in Crossref to this exact title in Perspectives on Psychological Science (2016), indexed as journal-article.","doi_url":"https://doi.org/10.1177/1745691616652873","target_doi_url":"https://doi.org/10.1177/0956797614526415"},{"id":"wagenmakers-facial-feedback-rrr","title":"Registered Replication Report: Strack, Martin, & Stepper (1988)","authors":["E.-J. Wagenmakers","Titia Beek","Laura Dijkhoff","Quentin F. Gronau"],"venue":"Perspectives on Psychological Science","tier":"A","year":"2016","doi":"10.1177/1745691616674458","critiqueType":"replication","target":{"title":"Inhibiting and facilitating conditions of the human smile: A nonobtrusive test of the facial feedback hypothesis","authors":["Fritz Strack","Leonard L. 
Martin","Sabine Stepper"],"doi":"10.1037/0022-3514.54.5.768","year":""},"whatItChallenges":"A Registered Replication Report of 17 direct replications of the classic pen-in-mouth facial-feedback study found a pooled effect of 0.03 rating units (95% CI -0.11 to 0.16) versus the original 0.82, failing to replicate the claim that induced smiling increases rated funniness. Challenges a textbook facial-feedback finding.","dimensions":["reproducibility","methods","statistics","claims","theory"],"aiRelated":false,"field":"Social psychology / emotion","verifyNote":"DOI resolves in Crossref to this exact title in Perspectives on Psychological Science (2016), indexed as journal-article.","doi_url":"https://doi.org/10.1177/1745691616674458","target_doi_url":"https://doi.org/10.1037/0022-3514.54.5.768"},{"id":"ranehill-power-posing","title":"Assessing the Robustness of Power Posing: No Effect on Hormones and Risk Tolerance in a Large Sample of Men and Women","authors":["Eva Ranehill","Anna Dreber","Magnus Johannesson","Susanne Leiberg","Sunhae Sul","Roberto A. Weber"],"venue":"Psychological Science","tier":"S","year":"2015","doi":"10.1177/0956797614553946","critiqueType":"replication","target":{"title":"Power Posing: Brief Nonverbal Displays Affect Neuroendocrine Levels and Risk Tolerance","authors":["Dana R. Carney","Amy J. C. Cuddy","Andy J. Yap"],"doi":"10.1177/0956797610383437","year":""},"whatItChallenges":"A larger, better-powered replication (N=200) of the power-posing study replicated only self-reported feelings of power but found no effect of expansive postures on testosterone, cortisol, or behavioral risk tolerance. 
Challenges the central physiological and behavioral claims of the original power-posing paper.","dimensions":["reproducibility","methods","statistics","claims","overclaiming"],"aiRelated":false,"field":"Social psychology / embodied cognition","verifyNote":"DOI resolves in Crossref to this exact title in Psychological Science (2015), indexed as journal-article.","doi_url":"https://doi.org/10.1177/0956797614553946","target_doi_url":"https://doi.org/10.1177/0956797610383437"},{"id":"klein-many-labs-2","title":"Many Labs 2: Investigating Variation in Replicability Across Samples and Settings","authors":["Richard A. Klein","Michelangelo Vianello","Fred Hasselman"],"venue":"Advances in Methods and Practices in Psychological Science","tier":"A","year":"2018","doi":"10.1177/2515245918810225","critiqueType":"replication","target":{"title":"28 classic and contemporary psychological findings (multi-target replication, e.g. Tversky & Kahneman framing, Schwarz heuristics, moral-judgment effects)","authors":["Various original authors"],"doi":null,"year":""},"whatItChallenges":"A large preregistered multi-site project replicated 28 published effects across 60+ samples and ~15,000 participants; only about half replicated robustly and variation across samples/settings was generally small, implying non-replication reflects original effects rather than hidden moderators. Challenges the robustness and breadth of numerous canonical findings.","dimensions":["reproducibility","methods","statistics","generalisation","claims"],"aiRelated":false,"field":"Psychology / metascience","verifyNote":"DOI resolves in Crossref to this exact title in Advances in Methods and Practices in Psychological Science (2018), indexed as journal-article.","doi_url":"https://doi.org/10.1177/2515245918810225","target_doi_url":null},{"id":"quigley-hambrick-ceo-effect-comment","title":"Reaffirming the CEO Effect Is Significant and Much Larger than Chance: A Comment on Fitza (2014)","authors":["Timothy J. 
Quigley","Scott D. Graffin"],"venue":"Strategic Management Journal","tier":"S","year":"2016","doi":"10.1002/smj.2503","critiqueType":"comment","target":{"title":"The use of variance decomposition in the investigation of CEO effects: How large must the CEO effect be to rule out chance?","authors":["Markus A. Fitza"],"doi":"10.1002/smj.2192","year":""},"whatItChallenges":"Challenges Fitza's (2014) claim that the estimated 'CEO effect' on firm performance is almost entirely an artifact of random chance, arguing his simulation/variance-decomposition approach mis-specifies the chance baseline. Using corrected methods, they conclude the CEO effect is statistically significant and substantively much larger than chance.","dimensions":["methods","identification","statistics","reproducibility"],"aiRelated":false,"field":"Strategic Management / Organization","verifyNote":"DOI resolves in Crossref to this exact title in Strategic Management Journal (2016), indexed as journal-article.","doi_url":"https://doi.org/10.1002/smj.2503","target_doi_url":"https://doi.org/10.1002/smj.2192"},{"id":"fitza-ceo-effect-rejoinder","title":"How Much Do CEOs Really Matter? Reaffirming That the CEO Effect Is Mostly Due to Chance","authors":["Markus A. 
Fitza"],"venue":"Strategic Management Journal","tier":"S","year":"2016","doi":"10.1002/smj.2597","critiqueType":"rejoinder","target":{"title":"Reaffirming the CEO Effect Is Significant and Much Larger than Chance: A Comment on Fitza (2014) (Quigley & Graffin 2017)","authors":[],"doi":"10.1002/smj.2503","year":""},"whatItChallenges":"Rejoinder defending the original conclusion against Quigley and Graffin's comment, arguing that once more realistic assumptions about how chance affects firm performance are imposed, the apparent CEO effect is statistically indistinguishable from chance regardless of the estimation methodology used.","dimensions":["methods","identification","statistics","claims"],"aiRelated":false,"field":"Strategic Management / Organization","verifyNote":"DOI resolves in Crossref to this exact title in Strategic Management Journal (2016), indexed as journal-article.","doi_url":"https://doi.org/10.1002/smj.2597","target_doi_url":"https://doi.org/10.1002/smj.2503"},{"id":"quigley-ceo-in-context-replication","title":"The \"CEO in Context\" Technique Revisited: A Replication and Extension of Hambrick and Quigley (2014)","authors":["Tobias Keller","Martin Glaum","Andreas Bausch","Thorsten Bunz"],"venue":"Strategic Management Journal","tier":"S","year":"2022","doi":"10.1002/smj.3453","critiqueType":"replication","target":{"title":"Toward More Accurate Contextualization of the CEO Effect on Firm Performance (Hambrick & Quigley 2014)","authors":[],"doi":"10.1002/smj.2108","year":""},"whatItChallenges":"Replicates and extends the 'CEO in Context' technique on a far larger sample (33,996 firm-years vs 4,866) and broadly CONFIRMS the original's high CEO effect — attributing about a third of the variance in firm performance (ROA) to the CEO — while showing the estimate shrinks under an adjusted-R² specification, a within-paper robustness nuance rather than an overturning of the headline 
finding.","dimensions":["methods","statistics","reproducibility","generalisation","identification"],"aiRelated":false,"field":"Strategic Management / Organization","verifyNote":"DOI resolves in Crossref to this exact title in Strategic Management Journal (2022), indexed as journal-article.","doi_url":"https://doi.org/10.1002/smj.3453","target_doi_url":"https://doi.org/10.1002/smj.2108"},{"id":"garip-fragile-families-prediction","title":"What failure to predict life outcomes can teach us","authors":["Filiz Garip"],"venue":"Proceedings of the National Academy of Sciences","tier":"S","year":"2020","doi":"10.1073/pnas.2003390117","critiqueType":"critical_commentary","target":{"title":"Measuring the predictability of life outcomes with a scientific mass collaboration","authors":[],"doi":"10.1073/pnas.1915006117","year":""},"whatItChallenges":"An invited PNAS commentary on Salganik et al.'s Fragile Families Challenge, arguing that the mass-collaboration finding that machine-learning models barely beat a simple benchmark exposes real limits of predictive ML in social science, and that the value lies in the common-task framework and out-of-sample testing rather than in any individual model's accuracy. 
It reframes the celebrated ML exercise as evidence of how little predictive purchase rich data plus ML actually buys for individual life outcomes.","dimensions":["methods","claims","overclaiming","generalisation","reproducibility"],"aiRelated":true,"field":"Sociology / computational social science","verifyNote":"DOI resolves in Crossref to this exact title in Proceedings of the National Academy of Sciences (2020), indexed as journal-article.","doi_url":"https://doi.org/10.1073/pnas.2003390117","target_doi_url":"https://doi.org/10.1073/pnas.1915006117"},{"id":"dressel-farid-compas","title":"The accuracy, fairness, and limits of predicting recidivism","authors":["Julia Dressel","Hany Farid"],"venue":"Science Advances","tier":"A","year":"2018","doi":"10.1126/sciadv.aao5580","critiqueType":"reanalysis","target":{"title":"Evaluating the predictive validity of the COMPAS Risk and Needs Assessment System (Northpointe/COMPAS recidivism risk tool)","authors":[],"doi":"10.1177/0093854808326545","year":""},"whatItChallenges":"A widely cited reanalysis showing that the commercial COMPAS recidivism risk algorithm (137 features) is no more accurate or fair than predictions from untrained humans on Mechanical Turk (62% vs 65%), and that a simple two-feature linear classifier matches COMPAS's accuracy. 
It directly challenges claims that proprietary ML risk-assessment tools provide superior, sophisticated predictive power over simple baselines.","dimensions":["statistics","methods","claims","overclaiming","novelty"],"aiRelated":true,"field":"Criminology / algorithmic risk assessment","verifyNote":"DOI resolves in Crossref to this exact title in Science Advances (2018), indexed as journal-article.","doi_url":"https://doi.org/10.1126/sciadv.aao5580","target_doi_url":"https://doi.org/10.1177/0093854808326545"},{"id":"kapoor-narayanan-ml-leakage","title":"Leakage and the reproducibility crisis in machine-learning-based science","authors":["Sayash Kapoor","Arvind Narayanan"],"venue":"Patterns","tier":"A","year":"2023","doi":"10.1016/j.patter.2023.100804","critiqueType":"reanalysis","target":{"title":"ML-based civil-war / armed-conflict prediction studies claiming complex ML outperforms logistic regression (e.g., Colaresi & Mahmood and the systematic review of conflict-forecasting papers)","authors":[],"doi":null,"year":""},"whatItChallenges":"A reproducibility audit identifying data leakage as a pervasive failure mode across 294 ML-based-science papers in 17 fields; its central social-science case study reproduces civil-war prediction papers and shows that, after correcting for leakage, complex ML models do not outperform decades-old logistic regression, overturning published claims of ML superiority. 
It challenges overclaimed ML performance and proposes model info sheets as a remedy.","dimensions":["reproducibility","data_code","methods","statistics","overclaiming","novelty"],"aiRelated":true,"field":"Political science / computational social science (ML methodology)","verifyNote":"DOI resolves in Crossref to this exact title in Patterns (2023), indexed as journal-article.","doi_url":"https://doi.org/10.1016/j.patter.2023.100804","target_doi_url":null},{"id":"green-palmquist-schickler-macropartisanship","title":"Macropartisanship: A Replication and Critique","authors":["Donald Green","Bradley Palmquist","Eric Schickler"],"venue":"American Political Science Review","tier":"S","year":"1998","doi":"10.2307/2586310","critiqueType":"replication","target":{"title":"Macropartisanship","authors":["Michael B. MacKuen","Robert S. Erikson","James A. Stimson"],"doi":"10.2307/1961661","year":""},"whatItChallenges":"Replicates MacKuen, Erikson, and Stimson's claim that aggregate party identification swings substantially in response to short-term shocks like consumer sentiment and presidential approval. Using more extensive survey data and correcting for measurement error, finds the short-term partisan movement is two to three times smaller than originally reported, supporting a stable, slow-adjusting view of partisanship.","dimensions":["statistics","methods","reproducibility","claims","overclaiming"],"aiRelated":false,"field":"Political Science (public opinion / political behavior)","verifyNote":"DOI resolves in Crossref to this exact title in American Political Science Review (1998), indexed as journal-article.","doi_url":"https://doi.org/10.2307/2586310","target_doi_url":"https://doi.org/10.2307/1961661"},{"id":"imai-gotv-turnout","title":"Do Get-Out-the-Vote Calls Reduce Turnout? 
The Importance of Statistical Methods for Field Experiments","authors":["Kosuke Imai"],"venue":"American Political Science Review","tier":"S","year":"2005","doi":"10.1017/s0003055405051658","critiqueType":"reanalysis","target":{"title":"The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment (Gerber & Green, 2000)","authors":[],"doi":"10.2307/2585837","year":""},"whatItChallenges":"Reanalyzes Gerber and Green's influential New Haven GOTV field experiment and argues the implemented treatment and control groups were not balanced as a randomized design requires; applying matching and corrected statistical methods, claims that phone calls in fact produced large positive turnout effects, contradicting the original null result and highlighting the consequences of statistical/computational choices in experiments.","dimensions":["identification","statistics","methods","data_code","reproducibility","claims"],"aiRelated":false,"field":"Political Science (experimental methods / voter mobilization)","verifyNote":"DOI resolves in Crossref to this exact title in American Political Science Review (2005), indexed as journal-article.","doi_url":"https://doi.org/10.1017/s0003055405051658","target_doi_url":"https://doi.org/10.2307/2585837"},{"id":"gerber-green-gotv-rejoinder","title":"Correction to Gerber and Green (2000), Replication of Disputed Findings, and Reply to Imai (2005)","authors":["Alan S. Gerber","Donald P. Green"],"venue":"American Political Science Review","tier":"S","year":"2005","doi":"10.1017/s000305540505166x","critiqueType":"rejoinder","target":{"title":"Do Get-Out-the-Vote Calls Reduce Turnout? 
The Importance of Statistical Methods for Field Experiments (Imai, 2005)","authors":[],"doi":"10.1017/s0003055405051658","year":""},"whatItChallenges":"Responds to Imai's (2005) reanalysis: acknowledges and repairs data-processing errors in the original 2000 article, then argues Imai's correction itself contains statistical, computational, and reporting errors that invalidate its conclusions. After fixes, the original substantive finding stands that brief phone calls do not meaningfully increase voter turnout.","dimensions":["statistics","methods","data_code","reproducibility","claims"],"aiRelated":false,"field":"Political Science (experimental methods / voter mobilization)","verifyNote":"DOI resolves in Crossref to this exact title in American Political Science Review (2005), indexed as journal-article.","doi_url":"https://doi.org/10.1017/s000305540505166x","target_doi_url":"https://doi.org/10.1017/s0003055405051658"},{"id":"petersen-koput-density-dependence","title":"Density Dependence in Organizational Mortality: Legitimacy or Unobserved Heterogeneity?","authors":["Trond Petersen","Kenneth W. Koput"],"venue":"American Sociological Review","tier":"S","year":"1991","doi":"10.2307/2096112","critiqueType":"critical_commentary","target":{"title":"Density Dependence in the Evolution of Populations of Newspaper Organizations","authors":["Glenn R. Carroll","Michael T. 
Hannan"],"doi":"10.2307/2095875","year":""},"whatItChallenges":"Challenges the standard interpretation of density-dependence tests in organizational ecology, arguing the observed negative first-order effect of organizational density on mortality rates is as consistent with unobserved heterogeneity (selection) as with the theorized legitimation process, undermining the causal-theoretical reading of Hannan and Carroll's models.","dimensions":["statistics","identification","methods","theory","claims"],"aiRelated":false,"field":"Sociology (organizational ecology)","verifyNote":"DOI resolves in Crossref to this exact title in American Sociological Review (1991), indexed as journal-article.","doi_url":"https://doi.org/10.2307/2096112","target_doi_url":"https://doi.org/10.2307/2095875"},{"id":"cheng-measurement-methods-divergent","title":"Measurement, methods, and divergent patterns: Reassessing the effects of same-sex parents","authors":["Simon Cheng","Brian Powell"],"venue":"Social Science Research","tier":"A","year":"2015","doi":"10.1016/j.ssresearch.2015.04.005","critiqueType":"reanalysis","target":{"title":"How different are the adult children of parents who have same-sex relationships? Findings from the New Family Structures Study","authors":[],"doi":"10.1016/j.ssresearch.2012.03.009","year":""},"whatItChallenges":"Reanalyzes Regnerus's (2012) New Family Structures Study and shows his negative findings for the adult children of parents who had a same-sex relationship are fragile. At least a third to two-fifths of the 236 same-sex-parent cases are misclassified, and the results further hinge on contested measurement and coding choices (outcome recoding, the comparison category, sociodemographic controls, multiple imputation). 
Correcting the misclassification and these choices renders most of the associations statistically insignificant.","dimensions":["data_code","methods","statistics","reproducibility","claims","overclaiming"],"aiRelated":false,"field":"Sociology (family/demography)","verifyNote":"DOI resolves in Crossref to this exact title in Social Science Research (2015), indexed as journal-article.","doi_url":"https://doi.org/10.1016/j.ssresearch.2015.04.005","target_doi_url":"https://doi.org/10.1016/j.ssresearch.2012.03.009"},{"id":"alba-commentary-kids-mostly","title":"Commentary: The Kids Are (Mostly) Alright: Second-Generation Assimilation: Comments on Haller, Portes and Lynch","authors":["Richard Alba","Philip Kasinitz","Mary C. Waters"],"venue":"Social Forces","tier":"A","year":"2011","doi":"10.1093/sf/89.3.763","critiqueType":"comment","target":{"title":"Dreams Fulfilled, Dreams Shattered: Determinants of Segmented Assimilation in the Second Generation","authors":[],"doi":"10.1353/sof.2011.0003","year":""},"whatItChallenges":"Challenges Haller, Portes and Lynch's pessimistic 'segmented assimilation / downward assimilation' thesis about the immigrant second generation, arguing that their data, model specification and interpretation overstate the prevalence and inevitability of downward mobility, and that the bulk of second-generation outcomes are in fact reasonably positive.","dimensions":["claims","theory","methods","generalisation","overclaiming"],"aiRelated":false,"field":"Sociology (immigration/assimilation)","verifyNote":"DOI resolves in Crossref to this exact title in Social Forces (2011), indexed as journal-article.","doi_url":"https://doi.org/10.1093/sf/89.3.763","target_doi_url":"https://doi.org/10.1353/sof.2011.0003"},{"id":"clampet-lundquist-neighborhood-effects-economic","title":"Neighborhood Effects on Economic Self-Sufficiency: A Reconsideration of the Moving to Opportunity Experiment","authors":["Susan Clampet-Lundquist","Douglas S. 
Massey"],"venue":"American Journal of Sociology","tier":"S","year":"2008","doi":"10.1086/588740","critiqueType":"reanalysis","target":{"title":"Moving to Opportunity / experimental analyses concluding null neighborhood effects on economic self-sufficiency (Kling, Liebman, and Katz; MTO interim impacts evaluation)","authors":[],"doi":null,"year":""},"whatItChallenges":"Reconsiders the influential Moving to Opportunity housing-voucher experiment's conclusion of null neighborhood effects on adult economic self-sufficiency, arguing that the intention-to-treat design and treatment definition mask real effects; using duration and quality of neighborhood exposure, the authors find evidence that sustained exposure to lower-poverty neighborhoods does improve economic outcomes.","dimensions":["identification","methods","claims","statistics","reproducibility"],"aiRelated":false,"field":"Sociology (neighborhood effects / urban poverty)","verifyNote":"DOI resolves in Crossref to this exact title in American Journal of Sociology (2008), indexed as journal-article.","doi_url":"https://doi.org/10.1086/588740","target_doi_url":null},{"id":"breznau-missing-main-effect","title":"The Missing Main Effect of Welfare State Regimes: A Replication of 'Social Policy Responsiveness in Developed Democracies' by Brooks and Manza","authors":["Nate Breznau"],"venue":"Sociological Science","tier":"A","year":"2015","doi":"10.15195/v2.a20","critiqueType":"replication","target":{"title":"Social Policy Responsiveness in Developed Democracies","authors":[],"doi":"10.1177/000312240607100306","year":""},"whatItChallenges":"Replicates Brooks and Manza's (2006, ASR) claim that public opinion drives welfare-state spending and finds it rests on a model specification error: they included an opinion-by-welfare-regime interaction while omitting the main effect of welfare regime; restoring the missing main effect across more than 800 model configurations eliminates the original finding in roughly 99.5% of 
cases.","dimensions":["statistics","methods","reproducibility","data_code","claims"],"aiRelated":false,"field":"Sociology (political sociology / welfare state)","verifyNote":"DOI resolves in Crossref to this exact title in Sociological Science (2015), indexed as journal-article.","doi_url":"https://doi.org/10.15195/v2.a20","target_doi_url":"https://doi.org/10.1177/000312240607100306"},{"id":"caughey-elections-regression-discontinuity","title":"Elections and the Regression Discontinuity Design: Lessons from Close U.S. House Races, 1942-2008","authors":["Devin Caughey","Jasjeet S. Sekhon"],"venue":"Political Analysis","tier":"A","year":"2011","doi":"10.1093/pan/mpr032","critiqueType":"reanalysis","target":{"title":"Randomized Experiments from Non-random Selection in U.S. House Elections","authors":[],"doi":"10.1016/j.jeconom.2007.05.004","year":""},"whatItChallenges":"Replicating close U.S. House races, it shows bare winners and bare losers differ markedly on pretreatment covariates (financial, experience, and incumbency advantages), undermining the as-if-random assumption underpinning Lee-style regression discontinuity designs for elections. It attributes the imbalance to sorting via activities on or before Election Day rather than post-election manipulation.","dimensions":["identification","methods","statistics","reproducibility","generalisation"],"aiRelated":false,"field":"Political science (political methodology / causal inference)","verifyNote":"DOI resolves in Crossref to this exact title in Political Analysis (2011), indexed as journal-article.","doi_url":"https://doi.org/10.1093/pan/mpr032","target_doi_url":"https://doi.org/10.1016/j.jeconom.2007.05.004"},{"id":"eggers-validity-regression-discontinuity","title":"On the Validity of the Regression Discontinuity Design for Estimating Electoral Effects: New Evidence from Over 40,000 Close Races","authors":["Andrew C. Eggers","Anthony Fowler","Jens Hainmueller","Andrew B. Hall","James M. 
Snyder Jr."],"venue":"American Journal of Political Science","tier":"S","year":"2014","doi":"10.1111/ajps.12127","critiqueType":"replication","target":{"title":"Elections and the Regression Discontinuity Design: Lessons from Close U.S. House Races, 1942-2008","authors":[],"doi":"10.1093/pan/mpr032","year":""},"whatItChallenges":"Assembling over 40,000 close races across many electoral settings, it finds no systematic evidence of strategic sorting or covariate imbalance at the threshold, arguing the close-election RD design is generally valid and that the Caughey-Sekhon/Snyder imbalance is largely specific to postwar U.S. House races rather than a general flaw. It reframes the earlier critique as an unusual case rather than evidence against RD broadly.","dimensions":["identification","methods","statistics","reproducibility","generalisation","claims"],"aiRelated":false,"field":"Political science (political methodology / causal inference)","verifyNote":"DOI resolves in Crossref to this exact title in American Journal of Political Science (2014), indexed as journal-article.","doi_url":"https://doi.org/10.1111/ajps.12127","target_doi_url":"https://doi.org/10.1093/pan/mpr032"},{"id":"abramson-what-do-we","title":"What Do We Learn about Voter Preferences from Conjoint Experiments?","authors":["Scott F. 
Abramson","Korhan Kocak","Asya Magazinnik"],"venue":"American Journal of Political Science","tier":"S","year":"2022","doi":"10.1111/ajps.12714","critiqueType":"critical_commentary","target":{"title":"Causal Inference in Conjoint Analysis: Understanding Multidimensional Choices via Stated Preference Experiments","authors":[],"doi":"10.1093/pan/mpt024","year":""},"whatItChallenges":"It shows that the average marginal component effect (AMCE), the central estimand of conjoint experiments popularized by Hainmueller, Hopkins, and Yamamoto, is not well defined in terms of majority preferences: even with rational subjects a positive AMCE can point opposite to the true majority preference, so AMCEs do not license common claims about what voters prefer. It argues the estimand conflates direction and intensity of preferences across respondents.","dimensions":["theory","methods","claims","overclaiming","statistics"],"aiRelated":false,"field":"Political science (political methodology / survey experiments)","verifyNote":"DOI resolves in Crossref to this exact title in American Journal of Political Science (2022), indexed as journal-article.","doi_url":"https://doi.org/10.1111/ajps.12714","target_doi_url":"https://doi.org/10.1093/pan/mpt024"},{"id":"fowler-do-shark-attacks","title":"Do Shark Attacks Influence Presidential Elections? Reassessing a Prominent Finding on Voter Competence","authors":["Anthony Fowler","Andrew B. 
Hall"],"venue":"The Journal of Politics","tier":"S","year":"2018","doi":"10.1086/699244","critiqueType":"reanalysis","target":{"title":"Blind Retrospection: Electoral Responses to Drought, Flu, and Shark Attacks (in Democracy for Realists)","authors":[],"doi":null,"year":""},"whatItChallenges":"Reanalyzing Achen and Bartels's claim that 1916 New Jersey shark attacks cost Woodrow Wilson roughly ten points in beach communities, it finds the county-level effect shrinks and weakens under alternative specifications and the town-level Ocean County result largely vanishes once coding errors are corrected. It concludes there is little compelling evidence that shark attacks influenced the election, casting doubt on this prominent 'blind retrospection' demonstration of voter incompetence.","dimensions":["statistics","data_code","claims","overclaiming","reproducibility"],"aiRelated":false,"field":"Political science (voting behavior / retrospective voting)","verifyNote":"DOI resolves in Crossref to this exact title in The Journal of Politics (2018), indexed as journal-article.","doi_url":"https://doi.org/10.1086/699244","target_doi_url":null},{"id":"prior-challenge-measuring-media","title":"The Challenge of Measuring Media Exposure: Reply to Dilliplane, Goldman, and Mutz","authors":["Markus Prior"],"venue":"Political Communication","tier":"A","year":"2013","doi":"10.1080/10584609.2013.819539","critiqueType":"reply","target":{"title":"Televised Exposure to Politics: New Measures for a Fragmented Media Environment","authors":[],"doi":"10.1111/j.1540-5907.2012.00600.x","year":""},"whatItChallenges":"Prior critiques the program-list measure of televised political exposure that Dilliplane, Goldman, and Mutz proposed (and which the ANES adopted), arguing it has low construct validity because it never measures the amount of exposure and shows poor convergent validity by several criteria. 
He contends the measure conflates recall/recognition with exposure and overstates the predictive payoff of the new instrument.","dimensions":["methods","statistics","claims","generalisation"],"aiRelated":false,"field":"Political communication / media exposure measurement","verifyNote":"DOI resolves in Crossref to this exact title in Political Communication (2013), indexed as journal-article.","doi_url":"https://doi.org/10.1080/10584609.2013.819539","target_doi_url":"https://doi.org/10.1111/j.1540-5907.2012.00600.x"},{"id":"burton-reconsidering-evidence-moral","title":"Reconsidering evidence of moral contagion in online social networks","authors":["Jason W. Burton","Nicole Cruz","Ulrike Hahn"],"venue":"Nature Human Behaviour","tier":"S","year":"2021","doi":"10.1038/s41562-021-01133-5","critiqueType":"replication","target":{"title":"Emotion shapes the diffusion of moralized content in social networks","authors":[],"doi":"10.1073/pnas.1618923114","year":""},"whatItChallenges":"Re-tests Brady et al.'s (2017) 'moral contagion' method on six new Twitter corpora rather than reanalysing their data, and finds via out-of-sample prediction, model comparison and specification-curve analysis that the moral-contagion model performs no better than a deliberately implausible 'XYZ contagion' placebo, challenging the strength of the original correlational claim while conceding moral contagion may still exist.","dimensions":["methods","statistics","identification","reproducibility","overclaiming","claims"],"aiRelated":true,"field":"Computational social science / social-media text analysis","verifyNote":"DOI resolves in Crossref to this exact title in Nature Human Behaviour (2021), indexed as journal-article.","doi_url":"https://doi.org/10.1038/s41562-021-01133-5","target_doi_url":"https://doi.org/10.1073/pnas.1618923114"},{"id":"pena-game-perspective-taking","title":"Game perspective-taking effects on willingness to help immigrants: A replication study with a Spanish
sample","authors":["Jorge Peña","Juan Francisco Hernández Pérez"],"venue":"New Media & Society","tier":"A","year":"2019","doi":"10.1177/1461444819874472","critiqueType":"replication","target":{"title":"Game Perspective-Taking Effects on Players' Behavioral Intention, Attitudes, Subjective Norms, and Self-Efficacy to Help Immigrants: The Case of 'Papers, Please'","authors":[],"doi":"10.1089/cyber.2018.0030","year":""},"whatItChallenges":"A replication of a perspective-taking game study on willingness to help immigrants. The original reported reductions in behavioural intention, subjective norms and self-efficacy (attitudes were unaffected); the Spanish-sample replication reproduced the intention effect but not the subjective-norms or self-efficacy effects, while finding an attitude effect the original did not — partly corroborating and partly diverging from the original.","dimensions":["reproducibility","generalisation","claims","methods"],"aiRelated":false,"field":"New media / games studies / media effects","verifyNote":"DOI resolves in Crossref to this exact title in New Media & Society (2019), indexed as journal-article.","doi_url":"https://doi.org/10.1177/1461444819874472","target_doi_url":"https://doi.org/10.1089/cyber.2018.0030"},{"id":"crede-what-shall-we","title":"What Shall We Do About Grit? 
A Critical Review of What We Know and What We Don't Know","authors":["Marcus Credé"],"venue":"Educational Researcher","tier":"A","year":"2018","doi":"10.3102/0013189X18801322","critiqueType":"critical_commentary","target":{"title":"Grit: Perseverance and Passion for Long-Term Goals (Duckworth, Peterson, Matthews, & Kelly)","authors":[],"doi":"10.1037/0022-3514.92.6.1087","year":""},"whatItChallenges":"Critically reviews the grit literature popularized by Angela Duckworth, arguing the empirical evidence does not justify combining passion and perseverance into a single construct, that grit predicts academic performance only weakly (and no better than conscientiousness, a jangle-fallacy concern), and that there is no evidence grit interventions work.","dimensions":["statistics","claims","overclaiming","theory","novelty"],"aiRelated":false,"field":"Education / Educational Psychology","verifyNote":"DOI resolves in Crossref to this exact title in Educational Researcher (2018), indexed as journal-article.","doi_url":"https://doi.org/10.3102/0013189X18801322","target_doi_url":"https://doi.org/10.1037/0022-3514.92.6.1087"},{"id":"skiba-risks-consequences-oversimplifying","title":"Risks and Consequences of Oversimplifying Educational Inequities: A Response to Morgan et al. (2015)","authors":["Russell J. Skiba","Alfredo J. Artiles","Elizabeth B. Kozleski","Daniel J. Losen","Elizabeth G. 
Harry"],"venue":"Educational Researcher","tier":"A","year":"2016","doi":"10.3102/0013189X16644606","critiqueType":"comment","target":{"title":"Minorities Are Disproportionately Underrepresented in Special Education: Longitudinal Evidence Across Five Disability Conditions","authors":[],"doi":"10.3102/0013189X15591157","year":""},"whatItChallenges":"Directly challenges Morgan et al.'s widely cited claim that racial/ethnic minorities are underrepresented (not overrepresented) in special education, arguing the conclusion is in error because of sampling and model-specification choices, heavy covariate adjustment that conditions away the inequities of interest, and a failure to engage the broader complexity of disproportionality.","dimensions":["methods","identification","statistics","claims","overclaiming","generalisation"],"aiRelated":false,"field":"Education / Special Education Policy","verifyNote":"DOI resolves in Crossref to this exact title in Educational Researcher (2016), indexed as journal-article.","doi_url":"https://doi.org/10.3102/0013189X16644606","target_doi_url":"https://doi.org/10.3102/0013189X15591157"},{"id":"benbow-rejoinder-critiques-national","title":"Rejoinder to the Critiques of the National Mathematics Advisory Panel Final Report","authors":["Camilla Persson Benbow","Larry R. Faulkner"],"venue":"Educational Researcher","tier":"A","year":"2008","doi":"10.3102/0013189X08329195","critiqueType":"rejoinder","target":{"title":"Critiques of the National Mathematics Advisory Panel Final Report (incl.
Boaler, 'When Politics Took the Place of Inquiry,' and Kelly, 'Reflections on the National Mathematics Advisory Panel Final Report')","authors":[],"doi":null,"year":""},"whatItChallenges":"The Panel chair and co-chair rebut a cluster of published critiques (notably Boaler and Kelly) that attacked the Panel's restriction to randomized/quasi-experimental evidence on math curricula, instruction, and learning, defending the evidentiary standards and contesting claims that the report's methodology was inappropriate for educational field research.","dimensions":["methods","identification","claims","generalisation","theory"],"aiRelated":false,"field":"Education / Mathematics Education Policy","verifyNote":"DOI resolves in Crossref to this exact title in Educational Researcher (2008), indexed as journal-article.","doi_url":"https://doi.org/10.3102/0013189X08329195","target_doi_url":null},{"id":"rothstein-measuring-impacts-teachers","title":"Measuring the Impacts of Teachers: Comment","authors":["Jesse Rothstein"],"venue":"American Economic Review","tier":"S","year":"2017","doi":"10.1257/aer.20141440","critiqueType":"comment","target":{"title":"Measuring the Impacts of Teachers I: Evaluating Bias in Teacher Value-Added Estimates","authors":["Raj Chetty","John N. Friedman","Jonah E. Rockoff"],"doi":"10.1257/aer.104.9.2593","year":""},"whatItChallenges":"Challenges Chetty, Friedman, and Rockoff's (2014) claim that the teacher-switching quasi-experiment shows student sorting creates negligible bias in teacher value-added (VA) scores. 
Rothstein shows teacher switching is correlated with changes in prior student preparedness, so the design is invalid; correcting for this reveals moderate VA bias (10-35% of teachers' causal effect variance) and shows long-run results are fragile to control choices.","dimensions":["identification","methods","statistics","reproducibility","claims"],"aiRelated":false,"field":"Economics (education / labor economics)","verifyNote":"DOI resolves in Crossref to this exact title in American Economic Review (2017), indexed as journal-article.","doi_url":"https://doi.org/10.1257/aer.20141440","target_doi_url":"https://doi.org/10.1257/aer.104.9.2593"},{"id":"neumark-minimum-wages-employment","title":"Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania: Comment","authors":["David Neumark","William Wascher"],"venue":"American Economic Review","tier":"S","year":"2000","doi":"10.1257/aer.90.5.1362","critiqueType":"comment","target":{"title":"Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania","authors":["David Card","Alan B. Krueger"],"doi":null,"year":""},"whatItChallenges":"Re-examines Card and Krueger's (1994) finding that a New Jersey minimum-wage increase did not reduce (and may have raised) fast-food employment. 
Using administrative payroll records rather than the original telephone-survey data, Neumark and Wascher find that employment fell after the minimum-wage rise, contradicting the original positive/zero estimate and attributing the discrepancy to measurement error in the survey data.","dimensions":["data_code","identification","methods","claims","reproducibility"],"aiRelated":false,"field":"Economics (labor economics)","verifyNote":"DOI resolves in Crossref to this exact title in American Economic Review (2000), indexed as journal-article.","doi_url":"https://doi.org/10.1257/aer.90.5.1362","target_doi_url":null},{"id":"andrikogiannopoulou-reassessing-false-discoveries","title":"Reassessing False Discoveries in Mutual Fund Performance: Skill, Luck, or Lack of Power?","authors":["Angie Andrikogiannopoulou","Filippos Papakonstantinou"],"venue":"The Journal of Finance","tier":"S","year":"2019","doi":"10.1111/jofi.12784","critiqueType":"reanalysis","target":{"title":"False Discoveries in Mutual Fund Performance: Measuring Luck in Estimated Alphas","authors":["Laurent Barras","Olivier Scaillet","Russ Wermers"],"doi":"10.1111/j.1540-6261.2009.01527.x","year":""},"whatItChallenges":"Reanalyzes the false-discovery-rate (FDR) method that Barras, Scaillet, and Wermers (2010) use to separate skilled, zero-alpha, and unskilled mutual funds. 
Andrikogiannopoulou and Papakonstantinou show via simulation that the FDR estimator is severely biased and underpowered at empirically relevant sample sizes, drastically overstating the fraction of zero-alpha funds and understating the proportion of skilled and unskilled funds.","dimensions":["statistics","methods","reproducibility","claims"],"aiRelated":false,"field":"Finance (asset pricing / mutual fund performance)","verifyNote":"DOI resolves in Crossref to this exact title in The Journal of Finance (2019), indexed as journal-article.","doi_url":"https://doi.org/10.1111/jofi.12784","target_doi_url":"https://doi.org/10.1111/j.1540-6261.2009.01527.x"},{"id":"messner-poverty-infant-mortality","title":"Poverty, Infant Mortality, and Homicide Rates in Cross-National Perspective: Assessments of Criterion and Construct Validity","authors":["Steven F. Messner","Lawrence E. Raffalovich","Gretchen M. Sutton"],"venue":"Criminology","tier":"S","year":"2010","doi":"10.1111/j.1745-9125.2010.00194.x","critiqueType":"critical_commentary","target":{"title":"A Methodological Addition to the Cross-National Empirical Literature on Social Structure and Homicide: A First Test of the Poverty-Homicide Thesis","authors":["William Alex Pridemore"],"doi":"10.1111/j.1745-9125.2008.00106.x","year":""},"whatItChallenges":"Responds to Pridemore's critique of using infant mortality as a proxy for poverty in cross-national homicide research. 
Rather than re-running his analysis, the authors assemble a new 16-nation panel (1993-2000) with direct income-based poverty measures and find infant mortality correlates more strongly with relative than absolute poverty, arguing disadvantage is best treated as a multidimensional construct — a qualified, collaborative response rather than a wholesale rejection of the proxy.","dimensions":["statistics","methods","claims"],"aiRelated":false,"field":"Criminology","verifyNote":"DOI resolves in Crossref to this exact title in Criminology (2010), indexed as journal-article.","doi_url":"https://doi.org/10.1111/j.1745-9125.2010.00194.x","target_doi_url":"https://doi.org/10.1111/j.1745-9125.2008.00106.x"},{"id":"sievert-replication-representative-bureaucracy","title":"A replication of \"Representative bureaucracy and the willingness to coproduce\"","authors":["Martin Sievert"],"venue":"Public Administration","tier":"A","year":"2021","doi":"10.1111/padm.12743","critiqueType":"replication","target":{"title":"Representative Bureaucracy and the Willingness to Coproduce: An Experimental Study","authors":["Norma M. Riccucci","Gregg G. 
Van Ryzin","Huafang Li"],"doi":"10.1111/puar.12401","year":""},"whatItChallenges":"A wide replication, on new data, of Riccucci, Van Ryzin and Li's (2016) survey experiment on representative bureaucracy and citizens' willingness to coproduce, testing whether the original's representation effects hold in a different national context rather than re-running the original data.","dimensions":["reproducibility","statistics","methods","generalisation","claims"],"aiRelated":false,"field":"Public administration","verifyNote":"DOI resolves in Crossref to this exact title in Public Administration (2021), indexed as journal-article.","doi_url":"https://doi.org/10.1111/padm.12743","target_doi_url":"https://doi.org/10.1111/puar.12401"},{"id":"kleck-impossible-policy-evaluations","title":"Impossible Policy Evaluations and Impossible Conclusions: A Comment on Koper and Roth","authors":["Gary Kleck"],"venue":"Journal of Quantitative Criminology","tier":"A","year":"2001","doi":"10.1023/a:1007574415289","critiqueType":"comment","target":{"title":"The Impact of the 1994 Federal Assault Weapon Ban on Gun Violence Outcomes: An Assessment of Multiple Outcome Measures and Some Lessons for Policy Evaluation","authors":["Christopher S. Koper","Jeffrey A. Roth"],"doi":"10.1023/a:1007522431219","year":""},"whatItChallenges":"Argues that Koper and Roth's evaluation of the 1994 federal assault weapons ban could not, by design, detect any plausible effect because assault weapons figure in so few homicides, so the data lack statistical power to support any conclusion. 
Contends their tentative inference that the ban may have reduced gun homicides is essentially impossible to sustain from the evidence presented.","dimensions":["statistics","identification","methods","claims","overclaiming"],"aiRelated":false,"field":"Quantitative criminology / public policy","verifyNote":"DOI resolves in Crossref to this exact title in Journal of Quantitative Criminology (2001), indexed as journal-article.","doi_url":"https://doi.org/10.1023/a:1007574415289","target_doi_url":"https://doi.org/10.1023/a:1007522431219"},{"id":"greenberg-long-term-trends","title":"Long-Term Trends in Crimes of Violence (Comment on Cooney, 2003)","authors":["David F. Greenberg"],"venue":"Criminology","tier":"S","year":"2003","doi":"10.1111/j.1745-9125.2003.tb01024.x","critiqueType":"comment","target":{"title":"The Privatization of Violence","authors":["Mark Cooney"],"doi":"10.1111/j.1745-9125.2003.tb01023.x","year":""},"whatItChallenges":"Challenges Cooney's (2003) historical thesis that violence has been 'privatized' (shifting from elite/collective to marginal/individual actors), arguing the long-term empirical evidence on violent-crime trends and the social characteristics of offenders does not support the claimed qualitative transformation. Questions the selectivity and interpretation of the historical and anthropological evidence Cooney marshals.","dimensions":["theory","claims","methods","generalisation","data_code"],"aiRelated":false,"field":"Criminology","verifyNote":"DOI resolves in Crossref to this exact title in Criminology (2003), indexed as journal-article.","doi_url":"https://doi.org/10.1111/j.1745-9125.2003.tb01024.x","target_doi_url":"https://doi.org/10.1111/j.1745-9125.2003.tb01023.x"}]}