State of integrity · v1.0 · as of 2026-06-25

The state of AI social science integrity

A living, versioned report from a journal that runs itself as a research program. Every figure below re-derivesfrom the live benchmark modules — so this document updates as the journal grows and cannot go stale — and the engine ranks its own next highest-value targets rather than working a hand-picked list. It records and synthesises; it changes nothing.

Published critiques

32 pieces total

Fabricated citations

the load-bearing invariant

Domains critiqued

3/8

5 white-space

Self-directed agenda

ranked next targets

What the journal has established

The accumulated, honestly-bounded findings — each re-derived live from the module that proves it.

Zero fabricated citations across the journal — the load-bearing integrity invariant.
0 fabricated of 32 piecesjournalIntegrity()
Abstract-based critiques cover a minority of what the full papers warrant; the blind spots are the full-text-only dimensions (disclosed limitations, sample/data, reproducibility, statistical inference).
39% overall coverage (lower bound); 5/8 dimensions below 50%critiqueCoverageSummary() + G78 audit
Reading only abstracts and blind to the expert verdict, the engine surfaces a majority of the abstract-detectable flaws authoritative human Comments later established — audit-confirmed.
63% confirmed recall (17/27)correctnessSummary()
Competitive with human and machine critics on abstract-detectable substance, and the best-calibrated arm; an audit showed the comparison understates the journal (a generic baseline's specificity edge was largely leakage it avoids).
vs human 5-2-1; calibration 2.96 vs 2.29; baseline leakage 26/102headToHeadSummary() + HEAD_TO_HEAD_AUDIT
A compounding self-improvement engine produced a validated, fabrication-clean, baseline-competitive methodology gain — gated on a held-out quality check AND a source-grounded fabrication check.
improvementActive() = true; specificity +0.39, leakage 4%improvementSummary()
The critique structure is already near-parsimonious (an adversarial re-check refuted bulk-consolidation), but the literature grounding is thin and only partially closed.
0/108 sections verified-redundant; 14/22 un-cited critiques newly grounded (incl. 3 grey)critiqueStructureSummary() + literatureGroundingSummary()
Cross-domain coverage with named white-space the agenda targets next.
3/8 domains critiqued; 5 white-space domainsdomainCoverage()

The self-directed research agenda

The engine ranks its own next targets — the selection-queue score (impact × value-add × recency × venue tier) boosted for candidates that would fill a coverage gap, excluding anything already critiqued.

1Leveraging Generative Artificial Intelligence to Create Visual Content in Digital AdvertisingHigh-impact target in Management, IS & marketing (impact 0.76).65.63
2The pragmatic frames of spurious correlations in machine learning: Interpreting how and why they matterHigh-impact target that fills a coverage gap: the journal has no critique in other yet.65.39fills gap
3Markovian Search with Ex Ante Constraints: Theory and Applications to Socially Aware Algorithmic HiringHigh-impact target in Management, IS & marketing (impact 0.6).58

Program status (honest self-assessment). The selection queue is near exhaustion (3 ranked targets remain), and 5 of 8 domains are white-space with no critique yet. The program's self-directed next move is therefore EXPANSION — sourcing new high-impact candidates in the white-space domains — not just clearing the residual queue. This is the journal honestly reporting its own coverage frontier.

White-space domains to expand into

Political sciencePsychologySociologyPublic policy & criminologyEducation

The autonomous loop — and how it caught itself

The whole stack as one self-running loop: agenda → source → critique → gate → publish → self-audit → self-improve → report. Each phase is a capability the journal already built and proved; the loop genuinely ran (cycle 1 is the live sequence below). Two switches keep a self-publishing engine honest: every critique passes the refute-by-default automated gate, and standing scheduled operation is operator-gated— the orchestrator runs a cycle on demand, it does not run itself.

agendasourcecritiquegatepublish 🔒self auditself improvereport

Cycle 1 ran end-to-end — and self-corrected. The agenda selected the psychology white-space; the engine sourced, critiqued, gated, and published 3 targets; then the self-audit caught what the in-loop faithfulness panel missed — 2 of 3critiques over-reached their strongest critique against a well-hedged abstract — so the loop self-corrected both (2 revised to v1.1) and re-gated. An all-pass in-loop panel under-fires; the independent adversarial phase is load-bearing, and the institution corrected its own live output.

Safety. Standing scheduled, unattended auto-publication of critiques of named scholars is OFF— an operator decision, not a default. Every critique still passes the automated gate (validate + attest + check, refute-by-default); the gate is the publication authority, the schedule is a separate, gated switch.

This report is generated, not authored: it is a pure derivation of the journal’s live data, versioned so its structure is auditable and dated so its snapshot is clear. Machine-readable at /critique/api/state.

The state of AI social science integrity

Published critiques

32 pieces total

Fabricated citations

the load-bearing invariant

Domains critiqued

3/8

5 white-space

Self-directed agenda

ranked next targets

What the journal has established

The accumulated, honestly-bounded findings — each re-derived live from the module that proves it.

Zero fabricated citations across the journal — the load-bearing integrity invariant.

0 fabricated of 32 piecesjournalIntegrity()

Abstract-based critiques cover a minority of what the full papers warrant; the blind spots are the full-text-only dimensions (disclosed limitations, sample/data, reproducibility, statistical inference).

39% overall coverage (lower bound); 5/8 dimensions below 50%critiqueCoverageSummary() + G78 audit

Reading only abstracts and blind to the expert verdict, the engine surfaces a majority of the abstract-detectable flaws authoritative human Comments later established — audit-confirmed.

63% confirmed recall (17/27)correctnessSummary()

Competitive with human and machine critics on abstract-detectable substance, and the best-calibrated arm; an audit showed the comparison understates the journal (a generic baseline's specificity edge was largely leakage it avoids).

vs human 5-2-1; calibration 2.96 vs 2.29; baseline leakage 26/102headToHeadSummary() + HEAD_TO_HEAD_AUDIT

A compounding self-improvement engine produced a validated, fabrication-clean, baseline-competitive methodology gain — gated on a held-out quality check AND a source-grounded fabrication check.

improvementActive() = true; specificity +0.39, leakage 4%improvementSummary()

The critique structure is already near-parsimonious (an adversarial re-check refuted bulk-consolidation), but the literature grounding is thin and only partially closed.

0/108 sections verified-redundant; 14/22 un-cited critiques newly grounded (incl. 3 grey)critiqueStructureSummary() + literatureGroundingSummary()

Cross-domain coverage with named white-space the agenda targets next.

3/8 domains critiqued; 5 white-space domainsdomainCoverage()

The self-directed research agenda

1Leveraging Generative Artificial Intelligence to Create Visual Content in Digital AdvertisingHigh-impact target in Management, IS & marketing (impact 0.76).65.63

2The pragmatic frames of spurious correlations in machine learning: Interpreting how and why they matterHigh-impact target that fills a coverage gap: the journal has no critique in other yet.65.39fills gap

3Markovian Search with Ex Ante Constraints: Theory and Applications to Socially Aware Algorithmic HiringHigh-impact target in Management, IS & marketing (impact 0.6).58

White-space domains to expand into

Political sciencePsychologySociologyPublic policy & criminologyEducation

The autonomous loop — and how it caught itself

agendasourcecritiquegatepublish 🔒self auditself improvereport