{"$schema":"https://policywindow.org/critique/api/schema","name":"Critical AI — state of AI social science integrity","description":"A living, versioned institutional report. The journal as a standing research program: every figure re-derives from the live benchmark modules (integrity ledger, coverage atlas, correctness, head-to-head + audit, critique-vs-fulltext coverage, structure/literature, the self-improvement engine), and the engine ranks its OWN next highest-value targets — boosting candidates that fill a coverage white-space — rather than working a hand-picked list. Records and synthesises; mutates nothing.","docs":"https://policywindow.org/critique/state","version":"1.0","as_of":"2026-06-25","headline":{"critiques":32,"pieces":32,"fabricated_citations":0,"domains_covered":3,"total_domains":8,"white_space_domains":5},"findings":[{"key":"zero_fabrication","claim":"Zero fabricated citations across the journal — the load-bearing integrity invariant.","metric":"0 fabricated of 32 pieces","source":"journalIntegrity()"},{"key":"abstract_access_ceiling","claim":"Abstract-based critiques cover a minority of what the full papers warrant; the blind spots are the full-text-only dimensions (disclosed limitations, sample/data, reproducibility, statistical inference).","metric":"39% overall coverage (lower bound); 5/8 dimensions below 50%","source":"critiqueCoverageSummary() + G78 audit"},{"key":"correctness","claim":"Reading only abstracts and blind to the expert verdict, the engine surfaces a majority of the abstract-detectable flaws authoritative human Comments later established — audit-confirmed.","metric":"63% confirmed recall (17/27)","source":"correctnessSummary()"},{"key":"head_to_head","claim":"Competitive with human and machine critics on abstract-detectable substance, and the best-calibrated arm; an audit showed the comparison understates the journal (a generic baseline's specificity edge was largely leakage it avoids).","metric":"vs human 5-2-1; calibration 2.96 vs 2.29; baseline leakage 26/102","source":"headToHeadSummary() + HEAD_TO_HEAD_AUDIT"},{"key":"self_improvement","claim":"A compounding self-improvement engine produced a validated, fabrication-clean, baseline-competitive methodology gain — gated on a held-out quality check AND a source-grounded fabrication check.","metric":"improvementActive() = true; specificity +0.39, leakage 4%","source":"improvementSummary()"},{"key":"structure_parsimony","claim":"The critique structure is already near-parsimonious (an adversarial re-check refuted bulk-consolidation), but the literature grounding is thin and only partially closed.","metric":"0/108 sections verified-redundant; 14/22 un-cited critiques newly grounded (incl. 3 grey)","source":"critiqueStructureSummary() + literatureGroundingSummary()"},{"key":"coverage","claim":"Cross-domain coverage with named white-space the agenda targets next.","metric":"3/8 domains critiqued; 5 white-space domains","source":"domainCoverage()"}],"program_status":"The selection queue is near exhaustion (3 ranked targets remain), and 5 of 8 domains are white-space with no critique yet. The program's self-directed next move is therefore EXPANSION — sourcing new high-impact candidates in the white-space domains — not just clearing the residual queue. This is the journal honestly reporting its own coverage frontier.","research_agenda":[{"id":"leveraging-generative-artificial-intelligence-to-c","title":"Leveraging Generative Artificial Intelligence to Create Visual Content in Digital Advertising","venue":"Marketing Science","field":"Marketing","domain":"management","priority":65.63,"impact":0.76,"fillsWhiteSpace":false,"rationale":"High-impact target in Management, IS & marketing (impact 0.76)."},{"id":"the-pragmatic-frames-of-spurious-correlations-in-m","title":"The pragmatic frames of spurious correlations in machine learning: Interpreting how and why they matter","venue":"Big Data & Society","field":"Science & technology studies","domain":"other","priority":65.39,"impact":0.64,"fillsWhiteSpace":true,"rationale":"High-impact target that fills a coverage gap: the journal has no critique in other yet."},{"id":"markovian-search-with-ex-ante-constraints-theory-a","title":"Markovian Search with Ex Ante Constraints: Theory and Applications to Socially Aware Algorithmic Hiring","venue":"Management Science","field":"Management & organisation","domain":"management","priority":58,"impact":0.6,"fillsWhiteSpace":false,"rationale":"High-impact target in Management, IS & marketing (impact 0.6)."}],"white_space":[{"domain":"political_science","label":"Political science"},{"domain":"psychology","label":"Psychology"},{"domain":"sociology","label":"Sociology"},{"domain":"public_policy","label":"Public policy & criminology"},{"domain":"education","label":"Education"}],"autonomous_loop":{"description":"The whole stack as one self-running loop (G87): agenda -> source -> critique -> gate -> publish -> self-audit -> self-improve -> report. The loop genuinely ran (cycle 1 = the live G85->G86->G87 sequence). Two safety switches keep a self-publishing engine honest: every critique passes the refute-by-default automated gate, and STANDING unattended scheduled operation is an operator-gated switch that is OFF by default — the orchestrator runs a cycle on demand, it does not run itself.","phases":[{"phase":"agenda","capability":"researchAgenda()","builtBy":"G85","autonomous":true,"gated":false,"description":"The program ranks its own next highest-value targets (selection score x white-space bonus)."},{"phase":"source","capability":"OpenAlex discovery + verification","builtBy":"G86 / G58","autonomous":true,"gated":false,"description":"Sources real candidates (verified DOI + abstract) in the agenda's domains."},{"phase":"critique","capability":"validated G84 engine (sharpen + grounding guard)","builtBy":"G84","autonomous":true,"gated":false,"description":"Generates span-grounded critiques with the held-out-validated, fabrication-clean directive."},{"phase":"gate","capability":"validateCritique + attestCritique + checkCritique","builtBy":"G44 / G50 / G80","autonomous":true,"gated":false,"description":"Refute-by-default automated integrity gate; refuses anything fabricated, over-severe, or un-grounded."},{"phase":"publish","capability":"registry + 4 integrity ledgers","builtBy":"G86","autonomous":false,"gated":true,"description":"Promotion to the live registry. Each critique passes the gate; STANDING unattended publication is operator-gated."},{"phase":"self_audit","capability":"adversarial re-verification (distinct lenses, refute-by-default)","builtBy":"G83 / G87","autonomous":true,"gated":false,"description":"Re-checks the just-published critiques for misreadings the in-loop panel missed (an all-pass panel under-fires)."},{"phase":"self_improve","capability":"compounding improvement engine (two-gate, held-out)","builtBy":"G82 / G84","autonomous":true,"gated":false,"description":"Benchmarks the engine, diagnoses a gap, validates a methodology change held-out before activating it."},{"phase":"report","capability":"living state-of-integrity report","builtBy":"G85","autonomous":true,"gated":false,"description":"Re-derives the institutional state so the loop's effect is visible and cannot go stale."}],"publish_gated":true,"schedule_enabled":false,"cycle_1":{"cycle":1,"agendaTopTarget":"Leveraging Generative Artificial Intelligence to Create Visual Content in Digital Advertising","sourced":3,"published":3,"selfAudit":{"runDate":"2026-06-25","critiquesAudited":3,"lensesPerCritique":3,"checks":9,"issuesFound":2,"corrected":2,"verdict":"The self-audit's adversarial core-claim lens caught what the in-loop faithfulness panel (which had returned all-faithful — the classic under-firing tell) MISSED: 2 of the 3 published critiques OVER-REACHED in their strongest critique against well-hedged abstracts (theory-of-mind faulted the abstract for lacking a mechanism it explicitly names; genai-creativity refuted an external-validity claim the abstract never made). The third held. The loop then SELF-CORRECTED: both were revised (v1.1) to their defensible calibration cores, re-gated, and re-published. The self-audit phase is load-bearing, and the institution corrected its own live output."},"selfImproveActive":true,"note":"Cycle 1 ran end-to-end: the agenda (G85) selected the psychology white-space, the engine sourced + critiqued + gated + published 3 targets (G86), the self-audit re-verified them (G87), and the self-improvement engine is validated-active (G84). Standing scheduled operation is operator-gated."},"headline":"An autonomous research institution, demonstrated end-to-end and self-correcting. Cycle 1: the agenda selected the psychology white-space, the engine sourced + critiqued + gated + published 3 targets, and the self-audit then caught what the in-loop faithfulness panel missed — 2 of 3 critiques over-reached their strongest critique — so the loop SELF-CORRECTED both (2 revised to v1.1) and re-gated. Self-improvement is validated-active. Standing scheduled auto-publication of critiques of named scholars stays operator-gated (schedule_enabled=false)."}}