{"$schema":"https://policywindow.org/critique/api/schema","critique_id":"CRIT-000003","slug":"the-cybernetic-teammate-a-field-experiment-on-gene","url":"https://policywindow.org/critique/c/the-cybernetic-teammate-a-field-experiment-on-gene","doi":null,"status":"published","critique_type":"editorially_approved_ai_native_critique","publication_date":"2026-06-15","current_version":"1.0","target_paper":{"title":"The Cybernetic Teammate: A Field Experiment on Generative AI and Teamwork","authors":["Fabrizio Dell’Acqua","Charles Ayoubi","Hila Lifshitz‐Assaf","Raffaella Sadun","Ethan Mollick","Lilach Mollick","Yi Han","Jeff Goldman"],"journal":"Organization Science","doi":"10.1287/orsc.2025.20702","url":"https://doi.org/10.1287/orsc.2025.20702","publicationDate":"2026-06-12","paperType":"empirical","accessBasis":"abstract_only","fullTextUsed":false,"fictional":false,"doi_url":"https://doi.org/10.1287/orsc.2025.20702"},"source_journal":{"tier":"S","rankingSources":["https://doi.org/10.1287/orsc.2025.20702","https://openalex.org/W7164527973"],"rankingNote":"Organization Science (INFORMS) is a top-tier, FT50 management and organisation-theory journal. Tier S."},"selection":{"aiAgiCentralityScore":5,"societalRelevanceScore":5,"aiAgiCategories":["labour_markets","human_AI_interaction","innovation_productivity_competition"],"selectionReason":"A widely-discussed preregistered field experiment on generative AI and teamwork in a major firm; its results are already cited in debates about AI replacing collaboration, making the generalisation step worth scrutinising."},"scores":{"aiAgiContribution":5,"evidentiarySupport":4,"methodologicalRisk":2,"overclaiming":2,"reproducibilityOrAuditability":2,"societalImpactRelevance":5,"severity":"low","confidence":"medium"},"severity_cap_for_access_basis":"moderate","plain_language_summary":"This paper runs a randomised field experiment inside one large company to ask what generative AI does to teamwork. 
Roughly 791 professionals at Procter & Gamble worked on real product-development problems, randomly assigned to work with or without an AI tool and alone or in pairs. The headline is striking: a person working with AI did about as well as a two-person team working without it, AI nudged specialists toward more balanced (less siloed) proposals, and people reported feeling better about the work. The design is a genuine strength: random assignment in a real workplace is hard to get. Our caution is narrower: it concerns how far the result is stretched. The study covers one firm and one kind of task (consumer-goods product innovation), and its emotional findings are self-reported, yet the paper reaches a broad conclusion about 'knowledge work' in general. Read on the abstract alone, the experiment supports a strong claim about this setting and a more tentative one about knowledge work everywhere.","claims":[{"id":"C1","text":"Working with AI let a single professional match the output of a two-person team working without AI.","type":"causal","evidenceOffered":"A preregistered field experiment with random assignment; the abstract states \"individuals with AI matched the performance of teams without AI\".","support":"moderate","overclaiming":"minor","assessment":"For this firm and task the claim is credibly identified by random assignment, a real strength. 
On the abstract alone the size and durability of the effect cannot be independently judged, but the design supports a causal reading within the studied setting.","mainWeakness":"The equivalence is specific to one firm and one task family; the abstract gives no basis to extend 'one person with AI = a team' to other kinds of knowledge work.","confidence":"medium"},{"id":"C2","text":"The paper generalises from a single-firm, single-task experiment to knowledge work in general.","type":"descriptive","evidenceOffered":"The abstract moves from the P&G setting to: \"More generally, our results suggest that AI adoption in knowledge work affects not only performance but also how expertise and sociality appear within teams\".","support":"weak","overclaiming":"minor","assessment":"This is the critique's main point. The paper hedges the step with 'suggest', so the overclaiming is mild — but the move from one consumer-goods firm's product-innovation tasks to 'knowledge work' is still an external-validity gesture the single-site design does not substantiate; routine, regulated, or adversarial knowledge tasks may behave differently.","mainWeakness":"Single-site, single-task-family design cannot license a population-level claim about knowledge work; replication across firms and task types is needed.","confidence":"medium"},{"id":"C3","text":"AI partly fills the social/motivational role of a human teammate.","type":"descriptive","evidenceOffered":"The supporting evidence is attitudinal: the abstract reports \"more positive self-reported emotional responses among participants\".","support":"weak","overclaiming":"minor","assessment":"A plausible and interesting finding, but self-reported emotion measured shortly after a novel tool is introduced is susceptible to novelty and demand effects; it is weaker evidence than the performance result and should be read as suggestive.","mainWeakness":"Self-report cannot distinguish a durable motivational substitute from short-run novelty 
enthusiasm.","confidence":"medium"}],"sections":[{"id":"what","title":"What the paper does","body":"A preregistered field experiment randomly assigns ~791 Procter & Gamble professionals to work with or without a generative-AI tool, and alone or in pairs, on real product-development tasks. Random assignment in a live workplace is a genuine methodological strength."},{"id":"generalisation","title":"Where the claim outruns the design","body":"The results are reported for one firm and one task family, but the abstract concludes 'more generally' about 'knowledge work'. That is an external-validity step the single-site design does not support: the balance of generation versus evaluation, and of individual versus team value, may differ sharply in regulated, routine, or adversarial knowledge tasks."},{"id":"measures","title":"Self-reported social effects","body":"The social and motivational claim rests on self-reported emotional responses. Such measures are useful but vulnerable to novelty and demand effects when a salient new tool is introduced, so the 'AI as social teammate' reading is weaker than the performance result."}],"strongest_critique":"The paper's broad framing — that AI reshapes 'knowledge work' and can stand in for a teammate — is carried by a single-firm, single-task experiment plus self-reported emotion, so the most quotable conclusions are the least supported by the design's actual scope.","strongest_fair_defence":"The core performance comparison is established by random assignment in a real workplace, which is exactly the design needed for a credible causal claim; the authors restrict the strong evidence to performance and frame the broader organisational implications as suggestions, not proofs.","final_judgment":"A well-designed field experiment whose internal result is credibly identified for its setting; the principal caution, visible from the abstract alone, is the generalisation from one firm and one task family to knowledge work in general, plus 
reliance on self-report for the social claim. Severity low; the design is sound and the over-reach is in framing, not method.","review_process":{"aiAgentsUsed":["claim_extraction","ai_agi_relevance","overclaiming","adversarial","author_defence","citation_integrity","legal_risk","plain_language","meta_review"],"reviewRounds":1,"humanEditor":{"name":"Founding editorial review (Policy Window)","role":"Editor-in-chief (founding)","approvalDate":"2026-06-15","declaredConflict":"none"},"expertCertification":{"used":false}},"author_response":{"notified":false,"status":"not_yet_invited","editorialActionAfterResponse":"Founding pilot: authors will be invited to reply once the standing board is ratified; this critique addresses claims, framing and generalisation only, never the authors."},"versions":[{"version":"1.0","date":"2026-06-15","note":"Initial publication.","changeType":"initial"}],"transparency":{"modelCardUrl":"/critique/model-card","publicAuditSummary":"Abstract-only critique: the target's abstract was reconstructed from the OpenAlex record and every verbatim span the critique relies on was checked to be an exact substring of it. The bibliographic record (DOI) was independently confirmed via Crossref. Severity is capped to the abstract-only access basis; the critique engages the paper's framing and stated claims only, not internal validity that the full text would be needed to assess.","privateAuditRecordExists":true,"citationVerification":{"status":"complete","checkedSources":[{"label":"DOI 10.1287/orsc.2025.20702","url":"https://doi.org/10.1287/orsc.2025.20702","verified":true},{"label":"OpenAlex work record (abstract source)","url":"https://openalex.org/W7164527973","verified":true}],"fabricatedCitations":0},"riskReview":{"copyright":"completed","defamation":"completed","note":"Abstract quoted sparingly under criticism/review. Critique targets the paper's claims, framing and generalisation only — never the authors."}}}