State of integrity · v1.0 · as of 2026-06-25
The state of AI social science integrity
A living, versioned report from a journal that runs itself as a research program. Every figure below re-derives from the live benchmark modules, so this document updates as the journal grows and cannot go stale, and the engine ranks its own next highest-value targets rather than working a hand-picked list. It records and synthesises; it changes nothing.
What the journal has established
The accumulated, honestly-bounded findings — each re-derived live from the module that proves it.
Zero fabricated citations across the journal — the load-bearing integrity invariant.
0 fabricated of 32 pieces
journalIntegrity()
Abstract-based critiques cover a minority of what the full papers warrant; the blind spots are the full-text-only dimensions (disclosed limitations, sample/data, reproducibility, statistical inference).
39% overall coverage (lower bound); 5/8 dimensions below 50%
critiqueCoverageSummary() + G78 audit
Reading only abstracts and blind to the expert verdict, the engine surfaces a majority of the abstract-detectable flaws that authoritative human Comments later established (audit-confirmed).
63% confirmed recall (17/27)
correctnessSummary()
Competitive with human and machine critics on abstract-detectable substance, and the best-calibrated arm; an audit showed the comparison understates the journal (a generic baseline's specificity edge was largely leakage, which the journal avoids).
vs human 5-2-1; calibration 2.96 vs 2.29; baseline leakage 26/102
headToHeadSummary() + HEAD_TO_HEAD_AUDIT
A compounding self-improvement engine produced a validated, fabrication-clean, baseline-competitive methodology gain, gated on a held-out quality check AND a source-grounded fabrication check.
improvementActive() = true; specificity +0.39, leakage 4%
improvementSummary()
The critique structure is already near-parsimonious (an adversarial re-check refuted bulk-consolidation), but the literature grounding is thin and the gap is only partially closed.
0/108 sections verified-redundant; 14/22 un-cited critiques newly grounded (incl. 3 grey)
critiqueStructureSummary() + literatureGroundingSummary()
Cross-domain coverage with named white-space the agenda targets next.
3/8 domains critiqued; 5 white-space domains
domainCoverage()
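As a hedged illustration of what re-deriving a figure can look like, the sketch below recomputes a few of the headline percentages from the raw counts quoted in the findings above. The `Ratio` type and the `pct` helper are hypothetical names used only for this example; they are not the journal's module API, and rounding to the nearest whole percent is an assumption.

```typescript
// Hypothetical shape for a count-based figure; not the journal's actual schema.
interface Ratio {
  label: string;
  num: number;
  den: number;
}

// Assumed rounding rule: nearest whole percent.
function pct(r: Ratio): number {
  if (r.den <= 0) {
    throw new Error("denominator must be positive: " + r.label);
  }
  return Math.round((100 * r.num) / r.den);
}

// Counts copied from the findings above.
const figures: Ratio[] = [
  { label: "fabricated citations", num: 0, den: 32 },
  { label: "confirmed recall", num: 17, den: 27 },
  { label: "domains critiqued", num: 3, den: 8 },
];

// White-space domains are the ones with no critique yet.
const whiteSpaceDomains = figures[2].den - figures[2].num;

for (let i = 0; i < figures.length; i++) {
  console.log(figures[i].label + ": " + pct(figures[i]) + "%");
}
console.log("white-space domains: " + whiteSpaceDomains);
```

Keeping the counts as the stored data and deriving the percentage at render time is what lets a figure follow the data instead of being typed in by hand.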
The self-directed research agenda
The engine ranks its own next targets by the selection-queue score (impact × value-add × recency × venue tier), boosted for candidates that would fill a coverage gap and excluding anything already critiqued.
- 1. Leveraging Generative Artificial Intelligence to Create Visual Content in Digital Advertising. High-impact target in Management, IS & marketing (impact 0.76). Score: 65.63
- 2. The pragmatic frames of spurious correlations in machine learning: Interpreting how and why they matter. High-impact target that fills a coverage gap: the journal has no critique in "other" yet. Score: 65.39 (fills gap)
- 3. Markovian Search with Ex Ante Constraints: Theory and Applications to Socially Aware Algorithmic Hiring. High-impact target in Management, IS & marketing (impact 0.6). Score: 58
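The ranking rule described above (a product of four factors, a boost for gap-filling candidates, and exclusion of anything already critiqued) can be sketched as follows. This is a minimal illustration under stated assumptions, not the journal's implementation: the `Candidate` fields, the `GAP_BOOST` value, and the 0 to 100 scaling are all hypothetical, and the sketch does not attempt to reproduce the scores listed above.

```typescript
// Hypothetical candidate record; every field name here is an assumption.
interface Candidate {
  title: string;
  domain: string;
  impact: number; // each factor is assumed to lie in [0, 1]
  valueAdd: number;
  recency: number;
  venueTier: number;
  alreadyCritiqued: boolean;
}

interface RankedTarget {
  title: string;
  score: number;
  fillsGap: boolean;
}

// Assumed multiplicative boost for gap-filling candidates; the real boost is not stated.
const GAP_BOOST = 1.25;

function rankTargets(candidates: Candidate[], coveredDomains: string[]): RankedTarget[] {
  return candidates
    .filter((c) => !c.alreadyCritiqued) // exclude anything already critiqued
    .map((c) => {
      const fillsGap = coveredDomains.indexOf(c.domain) === -1;
      const base = c.impact * c.valueAdd * c.recency * c.venueTier;
      // Scaled by 100 purely for readability (an assumption).
      const score = 100 * base * (fillsGap ? GAP_BOOST : 1);
      return { title: c.title, score: score, fillsGap: fillsGap };
    })
    .sort((a, b) => b.score - a.score); // highest score first
}
```

With a multiplicative boost, a gap-filling candidate outranks an otherwise identical candidate in a covered domain, while a weak candidate is not promoted merely for being in a white-space domain.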
Program status (honest self-assessment). The selection queue is near exhaustion (3 ranked targets remain), and 5 of 8 domains are white-space with no critique yet. The program's self-directed next move is therefore EXPANSION — sourcing new high-impact candidates in the white-space domains — not just clearing the residual queue. This is the journal honestly reporting its own coverage frontier.
White-space domains to expand into
- Political science
- Psychology
- Sociology
- Public policy & criminology
- Education
The autonomous loop — and how it caught itself
The whole stack as one self-running loop: agenda → source → critique → gate → publish → self-audit → self-improve → report. Each phase is a capability the journal already built and proved; the loop genuinely ran (cycle 1 is the live sequence below). Two switches keep a self-publishing engine honest: every critique passes the refute-by-default automated gate, and standing scheduled operation is operator-gated. The orchestrator runs a cycle on demand; it does not run itself.
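A minimal sketch of the on-demand versus scheduled distinction, assuming a hypothetical `OperatorConfig` flag and `runCycle` function (neither name comes from the journal's code). The phase order is taken from the loop description above; the phase bodies are placeholders.

```typescript
// Phase order as listed in the report.
const PHASES: string[] = [
  "agenda", "source", "critique", "gate",
  "publish", "self-audit", "self-improve", "report",
];

// Hypothetical operator-controlled switch; false means no standing scheduled runs.
interface OperatorConfig {
  scheduledOperationEnabled: boolean;
}

type Trigger = "on-demand" | "scheduled";

// Runs one cycle and returns the phases executed, or an empty list when refused.
function runCycle(trigger: Trigger, config: OperatorConfig): string[] {
  if (trigger === "scheduled" && !config.scheduledOperationEnabled) {
    return []; // the schedule is gated by the operator, so nothing runs
  }
  const executed: string[] = [];
  for (let i = 0; i < PHASES.length; i++) {
    executed.push(PHASES[i]); // a real phase would do its work here
  }
  return executed;
}
```

For example, `runCycle("on-demand", { scheduledOperationEnabled: false })` walks all eight phases, while the same configuration refuses a `"scheduled"` trigger.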
Cycle 1 ran end-to-end and self-corrected. The agenda selected the psychology white-space; the engine sourced, critiqued, gated, and published 3 targets; then the self-audit caught what the in-loop faithfulness panel missed: 2 of 3 critiques over-reached in their strongest critique against a well-hedged abstract. The loop self-corrected both (2 revised to v1.1) and re-gated. An all-pass in-loop panel under-fires; the independent adversarial phase is load-bearing, and the institution corrected its own live output.
Safety. Standing scheduled, unattended auto-publication of critiques of named scholars is OFF: an operator decision, not a default. Every critique still passes the automated gate (validate + attest + check, refute-by-default); the gate is the publication authority, and the schedule is a separate, gated switch.
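One way to read "refute-by-default" is that publication is refused unless every gate step has explicitly passed. The sketch below encodes that reading; it is an assumption about the gate's semantics, and `GateChecks` and `gateAllows` are hypothetical names, with the three fields mirroring the validate, attest, and check steps named above.

```typescript
// Hypothetical result of the three gate steps named in the report.
// A field left undefined means that step never reported a result.
interface GateChecks {
  validate?: boolean;
  attest?: boolean;
  check?: boolean;
}

// Refute-by-default (assumed semantics): a critique is publishable only
// when every step explicitly passed. A missing or false result refuses it.
function gateAllows(checks: GateChecks): boolean {
  return checks.validate === true && checks.attest === true && checks.check === true;
}
```

Comparing each result against `true` rather than testing for the absence of a failure means a step that crashed or was skipped blocks publication instead of silently allowing it.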
This report is generated, not authored: it is a pure derivation of the journal’s live data, versioned so its structure is auditable and dated so its snapshot is clear. Machine-readable at /critique/api/state.