Post-publication Comment · Critical AI
Comment on “When Influencers Delegate Replies: How Social AI Agents Shape User Engagement”
Critical AI · published 2026-06-20 · v1.0 · CRIT-GEN-when-influencers-delegat
Concerning: Maggie Mengqing Zhang, Yang Gao, Jingjing Li, Steven L. Johnson · Information Systems Research · 2026-05-08
Why this paper was selected
Critique generated in-session via produce-and-publish (with faithfulness self-check), grounded in the verified OpenAlex abstract of the ISR staggered-DiD study on social AI agents + influencer engagement. Severity capped moderate; claims-not-motives; fabricatedCitations=0.
AI/AGI centrality 3/5 · societal relevance 4/5 · source-journal note: Tier A per the determination; ingested from an AGISS critique artifact.
Summary
This study asks whether letting an AI agent reply to fans on an influencer's behalf helps or hurts engagement. It uses a credible design — a real platform feature rollout analyzed with a staggered difference-in-differences method — and finds that users who got an AI reply comment and repost more on later posts, especially when the replies are relevant, on-brand, and timely. The design is a genuine quasi-experiment, not just a correlation, which is a real strength. The main caveats: who actually receives an AI reply is probably not random (the delegated task is replying to comments, so receipt likely follows from commenting, and agents may target engaged users — though the abstract does not spell out the trigger), the abstract shows no pre-trend evidence and names no bias-robust estimator, the 'social presence' mechanism is inferred from reply traits rather than measured, engagement is used as a stand-in for 'relationship quality' that is never directly assessed, and everything comes from a single platform. The conclusions are appropriately hedged, so this reads as a solid but bounded contribution rather than an overclaim.
Central claims & evidence map
| Claim | Type | Evidence offered | Support | Overclaiming | Main weakness |
|---|---|---|---|---|---|
| Receiving an AI reply significantly increases users' commenting on the influencer's subsequent posts, identified via a staggered difference-in-differences design comparing users who received an AI reply against those who did not, leveraging the rollout of the AI agent feature on a major platform. | Causal | A staggered DiD design exploiting the platform rollout, with a treated/control contrast at the user level: "compare engagement behaviors between users who received an AI reply (i.e., a reply from an influencer's social AI agent) and those who did not." | Moderate | Minor | On the reading that reply receipt is conditional on the user's own commenting and/or targeted to engaged users, selection into the treatment ('receiving a reply') is distinct from, and not resolved by, the exogenous rollout timing the abstract leads with; the abstract does not specify the trigger mechanism either way. |
| The positive engagement effect is concentrated when AI replies amplify the influencer's 'social presence,' operationalized through content relevance, stylistic alignment, and reply timeliness; on these dimensions weaker replies show an attenuated boost. | | A moderation pattern: the effect is larger "particularly when AI replies amplify an influencer's social presence, as reflected in content relevance, stylistic alignment, and reply timeliness." | Weak | Moderate | The moderators are post-treatment, non-exogenous reply attributes, so the pattern is consistent with selection rather than a clean social-presence channel. |
| The effect is heterogeneous across relationship characteristics: engagement gains are stronger among loyal followers and weaker for commercialized influencers and those in the technology domain. | | Subgroup contrasts reported in the abstract: "engagement gains are stronger among loyal followers but weaker for commercialized influencers and those in the technology domain." | Moderate | Minor | The loyalty result is observationally equivalent under both a relationship-theory reading and a treatment-selection reading; subgroup definitions and multiplicity controls are unstated. |
| Reply scarcity amplifies the effect: engagement increases more when an influencer rarely replied previously, or when fewer AI replies appear under the focal post. | | "reply scarcity amplifies the effect: engagement increases more when influencers rarely replied previously or when fewer AI replies appear under the focal post." | Weak | Minor | Scarcity amplification is consistent with low-baseline mechanics and novelty/salience, not uniquely with the social-presence mechanism. |
| The engagement boost generalizes across post types, extending to both sponsored and nonsponsored posts. | | "The engagement boost extends to both sponsored and nonsponsored posts." | Moderate | Minor | Establishes within-platform robustness across post types but not transportability or comparable effect magnitude. |
| The treatment effect extends beyond commenting to user reposting behavior, indicating a broader engagement response rather than a single-metric effect. | | "The engagement boost extends to ... as well as user reposting behavior." | Moderate | Minor | Broadens the outcome set but still measures behavior, not the relationship quality the paper frames. |
| After adopting AI agents, influencers themselves post more frequently, suggesting a supply-side behavioral change accompanying delegation. | Causal | "influencers themselves also post more frequently after adopting AI agents." | Weak | Moderate | Adoption is self-selected (different identification than the reply-level DiD), and increased posting is a candidate post-treatment confound for the engagement outcome. |
| Delegating social interaction tasks to a social AI agent need not weaken the influencer-user relationship and can enhance user engagement, conditional on the agent functioning as an effective social delegate, contrary to the concern that automation degrades relationship quality. | Normative | The framed prior concern that automation "may weaken the influencer-user relationship if the agents fail to serve as effective social delegates," answered by the conditional thesis that delegation can "enhance user engagement." | Weak | Moderate | Engagement is substituted for the relationship construct, and, on the reading that 'effective social delegate' is operationalized via the same moderators, the 'conditional on effectiveness' framing risks circularity; the degradation case is not demonstrated. |
| The paper contributes to the literatures on AI delegation and influencer engagement by specifying when and how delegating social relationship management to social AI agents enhances user engagement. | | Stated contribution: "highlighting when and how delegating social relationship management to social AI agents can enhance user engagement." | Moderate | Minor | The mechanism ('how') is inferred, not measured, and the contribution is framed for 'social AI agents' broadly while the abstract reports evidence from a single platform ('a major social media platform'). |
| Social presence is advanced as the mechanism through which AI replies drive engagement, inferred from the pattern that relevance, style, and timeliness moderate the effect. | Theoretical | Mechanism inferred from a moderation pattern: the effect concentrates "when AI replies amplify an influencer's social presence, as reflected in content relevance, stylistic alignment, and reply timeliness." | Weak | Moderate | The named mechanism is labeled and inferred from endogenous proxies, not directly measured or discriminated from rival channels such as reciprocity and novelty (and, on the critic's reading, algorithmic amplification, which the abstract does not itself invoke). |
Per-claim assessment
c1. Receiving an AI reply significantly increases users' commenting on the influencer's subsequent posts, identified via a staggered difference-in-differences design comparing users who received an AI reply against those who did not, leveraging the rollout of the AI agent feature on a major platform.
The rollout is a credible, real source of exogenous variation, and DiD differences out time-invariant user-level engagement propensity, so the design is genuinely more than a correlation. The central limitation, on the critic's reading, is a conflation of two levels of variation: the rollout makes the feature available (plausibly exogenous), but who within a treated audience actually 'received an AI reply' is plausibly endogenous. The abstract states only that influencers delegate 'replying to comments'; that reply receipt is therefore conditional on the user's own commenting, or targeted to engaged users, is an inference, not a stated property. The abstract does not state whether the estimand is the influencer-adoption cohort or the individual reply event, nor how any user-level selection into the reply is purged.
c2. The positive engagement effect is concentrated when AI replies amplify the influencer's 'social presence,' operationalized through content relevance, stylistic alignment, and reply timeliness; on these dimensions weaker replies show an attenuated boost.
The three moderators are properties of the realized reply and are themselves plausibly endogenous to the same influencer-user fit and user engagement that drive the outcome. Conditioning on a post-treatment characteristic of the reply risks a bad-control/collider issue; more 'relevant' or 'stylistically aligned' replies may concentrate precisely where engagement was already likely. The abstract reports the effect is 'attenuated,' not reversed, so the downside case is not demonstrated.
c3. The effect is heterogeneous across relationship characteristics: engagement gains are stronger among loyal followers and weaker for commercialized influencers and those in the technology domain.
The pattern is theoretically coherent (delegation helps where a genuine bond is plausible, less where interactions read as transactional), which is somewhat reassuring against a generic confound. However, 'loyal followers' is plausibly the subgroup whose reply-receipt and subsequent engagement are most confounded with latent engagement propensity, so this cut also coheres with a selection story. Construct definitions (loyalty, commercialization, domain) are invisible at abstract level and the abstract does not mention multiple-comparison correction or pre-specification across the many cuts.
c4. Reply scarcity amplifies the effect: engagement increases more when an influencer rarely replied previously, or when fewer AI replies appear under the focal post.
The authors argue scarcity disciplines a content-free 'any reply' account. But the pattern is also consistent with mechanical/baseline explanations (users of rarely-replying influencers may have low baseline engagement, yielding larger proportional gains) and with a salience/novelty channel rather than social presence specifically. Without baseline rates or a stated estimand, scarcity does not by itself adjudicate between social presence and novelty.
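The low-baseline point can be illustrated with simple arithmetic. The sketch below uses hypothetical baseline rates and a hypothetical absolute gain (none of these numbers come from the paper) to show how an identical absolute increase in commenting looks much larger in proportional terms where baseline engagement is low.

```python
def proportional_gain(baseline, absolute_gain):
    """Gain expressed as a share of the pre-treatment baseline rate."""
    return absolute_gain / baseline

# Hypothetical values: the same absolute gain of 0.5 comments per user.
absolute_gain = 0.5
rare_replier = proportional_gain(baseline=1.0, absolute_gain=absolute_gain)
frequent_replier = proportional_gain(baseline=5.0, absolute_gain=absolute_gain)

print(f"rarely-replying influencer: {rare_replier:.0%}")
print(f"frequently-replying influencer: {frequent_replier:.0%}")
```

Whether the reported amplification is of this mechanical kind depends on whether the estimand is an absolute or a proportional change, which the abstract does not state.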
c5. The engagement boost generalizes across post types, extending to both sponsored and nonsponsored posts.
Robustness across content categories is a legitimate internal-generalization claim and reduces the chance the effect is an artifact of one post type. It does not, however, address the cross-platform external-validity gap, and the abstract reports no magnitudes to judge whether the effect is comparable in size across types or merely present in both.
c6. The treatment effect extends beyond commenting to user reposting behavior, indicating a broader engagement response rather than a single-metric effect.
Two distinct engagement behaviors moving together makes a comment-UI-specific or single-metric artifact less likely, which strengthens the 'engagement response' interpretation. It does not, however, bridge the gap to the relationship-quality construct the framing invokes; reposting can be transactional or incentivized and remains a behavioral, not relational, measure.
c7. After adopting AI agents, influencers themselves post more frequently, suggesting a supply-side behavioral change accompanying delegation.
This is a distinct design from the user-level DiD: influencers self-select into adoption, so post-adoption posting changes can be confounded with whatever motivated adoption. It also raises a post-treatment concern for the main result: if subsequent posts are more numerous and different in mix, then engagement measured 'on subsequent posts' is partly measured on a quantity that adoption itself changes. The abstract presents this finding alongside the user-level effect without distinguishing its weaker identification.
c8. Delegating social interaction tasks to a social AI agent need not weaken the influencer-user relationship and can enhance user engagement, conditional on the agent functioning as an effective social delegate, contrary to the concern that automation degrades relationship quality.
The conclusion is appropriately hedged ('need not,' 'can,' 'conditional on'), which is the right altitude. But it answers a relationship-quality question with engagement outcomes, and the conditioning variable ('effective social delegate') is, on the critic's reading, defined by the same post hoc moderators (relevance/style/timeliness), risking circularity: the effect is positive where the agent was 'effective,' and effectiveness is inferred from where the effect was positive. The framed downside (relationship degradation) is left largely untested since the abstract reports attenuation, not reversal or harm.
c9. The paper contributes to the literatures on AI delegation and influencer engagement by specifying when and how delegating social relationship management to social AI agents enhances user engagement.
The 'when and how' framing is the appropriate, modest altitude for a single-platform quasi-experiment and pre-empts a blanket-benefit overclaim. The 'how' (mechanism) rests on inferred social presence rather than direct measurement. The contribution is stated for 'social AI agents' broadly while the evidence is, on the abstract's own terms, 'a major social media platform' (one platform), so the scope of the 'when and how' generalization is somewhat broader than the single setting warrants. (The agent-implementation and rollout count are inferable but not enumerated in the abstract.)
c10. Social presence is advanced as the mechanism through which AI replies drive engagement, inferred from the pattern that relevance, style, and timeliness moderate the effect.
Social presence is never directly measured in the abstract (no user perceptions, survey, or validated construct reported); it is inferred from three reply attributes moving the effect. A moderation pattern is consistent with social presence but does not identify it against rival channels the critic can raise — reciprocity/obligation toward a personalized reply, or novelty/salience of being noticed. A further rival, algorithmic amplification of threads containing replies, is a candidate the critic raises rather than a mechanism the abstract describes (the abstract does not mention a recommender or feed algorithm). The construct-validity gap between 'three attributes moderate the effect' and 'social presence is the mechanism' is the key concern, compounded because the moderators may be markers of reply quality/effort rather than a latent social-presence construct.
Scorecard
Sub-scores are 0–5 editorial judgements on fixed scales (higher is better, except methodological risk and overclaiming where higher is worse). They are contestable and open to a severity challenge from authors.
Design and identification
The paper's principal strength is a genuine quasi-experiment rather than a correlational story. By "Leveraging the rollout of a social AI agent feature on a major social media platform" and using a "staggered difference-in-differences design," the authors exploit a real, differentially-timed feature deployment, so the headline estimate is not hostage to a single platform-wide before/after break. A blanket 'no causal identification' complaint would be unwarranted. The deeper question is the level of variation. The abstract motivates identification with the rollout (supply-side availability) but defines treatment as the individual event of having "received an AI reply (i.e., a reply from an influencer's social AI agent)." On the critic's reading these are not the same identifying variation: the rollout makes the agent available, while reply receipt may be the product of influencer/agent targeting and the user's own commenting. Note this is an inference — the abstract states only that influencers delegate 'replying to comments,' and does not say a comment is required to trigger a reply or that agents target engaged users. If, as seems likely given that the delegated task is replying to comments, reply receipt is conditional on prior commenting, the stated contrast — users "who received an AI reply ... and those who did not" — is exposed to selection into treatment that the rollout timing does not, by itself, resolve. DiD does difference out time-invariant user engagement propensity (an always-more-engaged user is not, on its own, a bias), so the residual threat is the narrower, testable one of a treatment-timing-correlated trend. The abstract should state whether the cohort is the user, the influencer's adoption date, or the reply event, since each implies a different estimand and threat model.
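The difference between a fixed engagement gap (which DiD removes) and a timing-correlated trend (which it does not) can be made concrete with a toy 2x2 calculation. This is a minimal sketch under stated assumptions: the group means, the `true_effect` of 2, and the function name `did_estimate` are hypothetical values chosen for the arithmetic, not quantities from the paper or its abstract.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Classic 2x2 DiD: change among treated minus change among controls."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical mean comments per user; not taken from the paper.
true_effect = 2.0
common_trend = 1.0

# Case A: replied-to users are always more engaged (level 8 vs. 3),
# but both groups share the same underlying trend.
case_a = did_estimate(
    treated_pre=8.0,
    treated_post=8.0 + common_trend + true_effect,
    control_pre=3.0,
    control_post=3.0 + common_trend,
)

# Case B: replied-to users were already ramping up (extra trend of 2),
# for example because rising commenting is what triggers the reply.
extra_trend = 2.0
case_b = did_estimate(
    treated_pre=8.0,
    treated_post=8.0 + common_trend + extra_trend + true_effect,
    control_pre=3.0,
    control_post=3.0 + common_trend,
)

print(case_a)  # fixed level gap cancels; estimate equals true_effect
print(case_b)  # differential trend is absorbed into the estimate
```

In Case A the estimate recovers the assumed effect despite the large level gap; in Case B it overstates it by exactly the assumed extra trend, which is the threat that pre-trend evidence would speak to.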
Parallel trends and the staggered estimator
A staggered DiD identifies the ATT only under (conditional) parallel trends, yet the abstract offers no event-study, leads-and-lags plot, no-anticipation check, or statement of common pre-treatment trajectories. If reply receipt is conditional on commenting (a critic inference, not stated), anticipation and reverse causality are live: users may ramp up commenting before a reply, which would be part of what triggers it. Separately, the abstract claims both heterogeneous effects (c3) and dynamic responses on "subsequent" posts (c1) — precisely the conditions under which naive two-way fixed-effects staggered DiD can be biased by 'forbidden comparisons.' The abstract does not name a heterogeneity-robust estimator (Callaway-Sant'Anna, de Chaisemartin-D'Haultfoeuille, Sun-Abraham, or stacked DiD). Whether the ATT is credible turns materially on this, and a referee should require event-time estimates and the estimator used.
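The 'forbidden comparisons' concern can be sketched on a toy panel. The example below is illustrative only and makes no claim about the estimator the authors used: it assumes a balanced panel with one early-adopting and one late-adopting cohort, no never-treated units, no noise, and a hypothetical effect that grows by one unit per period after adoption. It computes the static two-way fixed-effects coefficient via the within transformation and compares it with the average effect over treated cells.

```python
# Toy balanced panel: one "early" and one "late" cohort, six periods.
# All numbers are hypothetical; effects grow with time since adoption.
PERIODS = range(6)
ADOPTION = {"early": 2, "late": 4}  # first treated period per cohort

def treated(cohort, t):
    return 1.0 if t >= ADOPTION[cohort] else 0.0

def effect(cohort, t):
    # Dynamic effect: 1 in the adoption period, +1 for each period after.
    return float(t - ADOPTION[cohort] + 1) if t >= ADOPTION[cohort] else 0.0

# Unit and time effects are set to zero; TWFE would absorb them anyway.
D = {(c, t): treated(c, t) for c in ADOPTION for t in PERIODS}
Y = {(c, t): effect(c, t) for c in ADOPTION for t in PERIODS}

def two_way_demean(x):
    """Within transformation for a balanced panel:
    x - unit mean - time mean + grand mean."""
    n_units, n_periods = len(ADOPTION), len(PERIODS)
    unit_mean = {c: sum(x[c, t] for t in PERIODS) / n_periods for c in ADOPTION}
    time_mean = {t: sum(x[c, t] for c in ADOPTION) / n_units for t in PERIODS}
    grand = sum(x.values()) / (n_units * n_periods)
    return {(c, t): x[c, t] - unit_mean[c] - time_mean[t] + grand for (c, t) in x}

d_tilde, y_tilde = two_way_demean(D), two_way_demean(Y)

# Static TWFE coefficient on the treatment dummy.
twfe = sum(d_tilde[k] * y_tilde[k] for k in D) / sum(v * v for v in d_tilde.values())

# Average effect over all treated unit-period cells.
treated_cells = [k for k in D if D[k] == 1.0]
true_att = sum(Y[k] for k in treated_cells) / len(treated_cells)

print(round(twfe, 3), round(true_att, 3))
```

In this toy panel the static coefficient is 0.5 while the average effect over treated cells is about 2.17. The gap arises because, once both cohorts are treated, the early cohort effectively serves as a control for the late cohort while its own effect is still growing. Event-time estimates and a heterogeneity-robust estimator are the standard ways to rule this out.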
Mechanism and constructs
"Social presence" is the named mechanism but is inferred, not measured. It is operationalized through three reply attributes — "content relevance, stylistic alignment, and reply timeliness" — that are themselves post-treatment properties of the realized reply and plausibly endogenous to the same influencer-user fit and user engagement driving the outcome. Conditioning the effect on these moderators (c2, c10) risks a bad-control/collider problem and cannot, from the abstract, be distinguished from reply quality/effort or from rival channels the critic can raise: reciprocity toward a personalized reply, or novelty/salience of being noticed. A further rival — algorithmic amplification of threads containing replies — is a possibility the critic introduces, not a mechanism the abstract mentions; the abstract says nothing about a recommender or ranking algorithm. The dose-response framing is consistent with social presence but does not identify it. Other constructs — 'loyal followers,' 'commercialized influencers,' 'reply scarcity' — are doing heavy theoretical work with no visible operationalization at abstract level.
Outcomes, scope, and the relationship claim
The motivation is whether delegation "may weaken the influencer-user relationship," but every outcome is behavioral engagement: commenting, "user reposting behavior," and influencer posting frequency. That commenting and reposting move together (c6) does guard against a single-metric artifact, and robustness across "both sponsored and nonsponsored posts" (c5) guards against a one-category artifact; these are fair internal-generalization claims. But more comments can reflect curiosity, contestation, or testing whether the reply is a bot, and the abstract offers no measure of sentiment, trust, retention, or unfollows. Concluding that delegation "need not weaken the relationship and can enhance user engagement" (c8) substitutes engagement for the relationship construct, and the 'conditional on the agent functioning as an effective social delegate' framing risks circularity with the post hoc moderators if, as seems likely, effectiveness is read off those same moderators. Evidence is from 'a major social media platform,' that is, a single platform; the contribution claim for 'social AI agents' broadly (c9) is wider than that setting warrants, especially since AI-reply detectability and disclosure norms may differ across platforms. (The single-agent and single-rollout characterization is a reasonable inference but is not separately enumerated in the abstract.)
Supply-side finding and multiplicity
The result that "influencers themselves also post more frequently after adopting AI agents" (c7) is a different design: adoption is self-selected, so it should not carry the same identification credibility as the user-level reply effect, and it doubles as a candidate post-treatment confound — engagement measured on 'subsequent posts' is measured on a quantity that treatment itself changes. Finally, the abstract reports a dense layer of subgroup and outcome contrasts (loyalty, commercialization, domain, two scarcity measures, sponsored/nonsponsored, reposting) with no mention of multiple-comparison correction or pre-specification, so the coherence of the heterogeneity pattern, while suggestive, cannot be assessed for selective reporting from the abstract alone.
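To show what a multiplicity adjustment would do to a family of contrasts of this size, the sketch below applies the Holm step-down procedure to seven p-values. The contrast names mirror the cuts listed above, but every p-value is hypothetical: the abstract reports none, and nothing here reflects the paper's actual results or any correction the authors may have applied.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure: return the names of hypotheses rejected
    while controlling the family-wise error rate at alpha."""
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    rejected = set()
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p <= alpha / (m - rank):
            rejected.add(name)
        else:
            break  # once one test fails, all larger p-values fail too
    return rejected

# Hypothetical p-values for the seven contrasts named in the text.
p_values = {
    "loyalty": 0.001,
    "commercialization": 0.02,
    "technology_domain": 0.04,
    "prior_reply_scarcity": 0.03,
    "focal_post_scarcity": 0.2,
    "sponsored_vs_nonsponsored": 0.008,
    "reposting": 0.049,
}

naive = {name for name, p in p_values.items() if p < 0.05}
adjusted = holm_reject(p_values)

print(len(naive), "significant unadjusted;", len(adjusted), "after Holm")
print(sorted(adjusted))
```

With these made-up inputs, six contrasts clear the unadjusted 0.05 threshold but only two survive the Holm adjustment, which is why the absence of any stated correction or pre-specification matters for reading the heterogeneity pattern.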
Reproducibility and reporting
Appraisal is limited by abstract-only reporting and almost certainly proprietary single-platform data. The abstract reports directions ("significantly increases," "stronger," "weaker," "amplifies") but no effect sizes, baseline commenting rates, sample sizes, treatment window, number of treated cohorts, control-group construction, or clustering choices (influencer- vs. user-level matters if treatment is assigned through influencers). For a 'when and how' contribution, a reviewer cannot yet judge whether effects are economically meaningful or whether the comparison is a credible counterfactual. The paper should commit to releasing estimating equations, full event-study output, magnitudes with inference details, and a data-availability statement.
Strongest critique
The treatment — "received an AI reply" — is, on the most natural reading of a paper whose delegated task is "replying to comments," plausibly endogenous to the very behavior being measured: if a comment is what prompts the reply, and if agents target already-engaged users, then the exogenous rollout timing the abstract leads with does not by itself identify the user-level treated/control contrast it actually runs. The abstract does not state the trigger mechanism, so this is a structural concern about the stated design rather than a documented flaw. Layered on this, the abstract provides no parallel-trends/event-study evidence and names no heterogeneity-robust staggered-DiD estimator despite explicitly claiming heterogeneous and dynamic effects — exactly the conditions under which two-way fixed-effects DiD is most fragile. The proposed mechanism compounds the problem: 'social presence' is never measured, only inferred from three post-treatment reply attributes (relevance, style, timeliness) that are themselves plausibly endogenous to engagement, leaving the channel hard to distinguish from reciprocity or novelty.
Strongest fair defence
This is a credibly identified empirical study of a timely question, and it is scoped with appropriate modesty. The variation comes from a real staggered rollout, and DiD differences out fixed user-level engagement propensity, so the threat is the narrower, testable one of a timing-correlated trend rather than 'engaged users engage more.' The authors do not claim randomized replies, a directly measured psychological construct, or universal external validity; every headline claim is hedged ('particularly when,' 'need not,' 'can,' 'conditional on'). Two distinct outcomes (commenting and reposting) and robustness across sponsored and nonsponsored posts argue against a single-metric or single-category artifact, and the theoretically patterned heterogeneity (stronger for loyal followers, weaker where interactions are transactional) is hard to reproduce from a generic confound. As a single-platform 'when and how' contribution, the altitude of the claims matches the evidence.
Conclusion
A credibly identified empirical contribution that engages a timely question with a real rollout and a defensible staggered-DiD design, and that scopes its claims with commendable hedging. Its weaknesses are concentrated and addressable rather than fatal: on the reading that reply receipt is conditional on a user's own commenting (the abstract does not specify the trigger), selection into who 'receives an AI reply' is distinct from the exogenous rollout; the abstract shows no pre-trend evidence and names no heterogeneity-robust estimator; the social-presence mechanism is labeled and inferred rather than measured; engagement stands in for an unmeasured relationship construct; and the evidence comes from a single platform. None impugns the design's core legitimacy, but together they bound what the abstract can presently support. Severity: moderate.
Reply from the authors
Following the practice of Nature Matters Arising, Science Technical Comments and PNAS Letters, this Comment is published as one half of a Comment + Reply pair: the authors of the original article are invited to respond, and any reply is published here verbatim alongside the Comment as part of the record.
Reply: not yet invited. No reply has been received for publication.
The authors have a right of reply and no veto. A reply may request a factual correction, a methodological rebuttal, a clarification, a data/code update, or a severity challenge, and is published unedited. See the right-of-reply policy.
References
Every external source this Comment cites, each with a verified link. 0 fabricated.
Source-grounding attestation
- ✓ Verbatim source spans present in the critique: 8/8 provenance spans re-derived in the critique prose
- ✓ Passes the publication validator: no errors
- ✓ Zero fabricated citations: 0 fabricated
- ✓ Severity within the access-basis cap: severity "moderate" ≤ cap "moderate" for abstract_only
Every verbatim span the critique relies on is re-derived in the prose in-app; span-in-source is re-verifiable offline (the abstract is re-fetched, not stored, per the no-reproduce policy).
Re-verify span-in-source offline: python3 scripts/verify-queue-critiques.py
Independent faithfulness review
A refute-by-default adversarial panel (two independent reviewers — an overreach lens and a mischaracterization lens — that fetched the real source) tried to prove this critique misread the paper. This is an AI adversarial review recorded with its reasoning, not a deterministic check.
Both refuters (the overreach reviewer and the mischaracterization reviewer) returned clean reports, each with an empty list of sustained misreadings and a "faithful" verdict. Neither flagged an overreach, an unlabelled inference presented as a finding, or a misstatement of any result in the abstract. With zero sustained objections from either lane, the critique is adjudicated faithful: no refuter identified a real misreading that would warrant a contested or unfaithful verdict. One transparency caveat for the reader: the verified-abstract artifact referenced for this audit could not be located in the working directory at review time, so this adjudication relies on the two refuter reports rather than a fresh independent re-check against the primary text. Given that both independent reviewers converged on "faithful" with no defects to disclose, the verdict stands, but it should be read as endorsing the refuters' concurrence rather than as a re-verification from source.
Version & correction history
| Version | Date | Change |
|---|---|---|
| v1.0 | 2026-06-20 | |
No silent substantive corrections — every change is versioned and visible.
How to cite this Comment
Critical AI. Comment on “When Influencers Delegate Replies: How Social AI Agents Shape User Engagement” (Maggie Mengqing Zhang et al., Information Systems Research, 2026). Critical AI; 2026. https://policywindow.org/critique/c/when-influencers-delegate-replies-how-social-ai-ag
A registered DOI will replace the URL once minted; until then the canonical URL is the persistent identifier. Highwire/Dublin-Core citation tags and a schema.org Review record are embedded in this page for Google Scholar and reference managers.
Verify this Comment. Its checkable facts (target DOI, access-basis severity cap, zero fabricated citations) are served — as the app’s self-report — at /critique/api/critiques/when-influencers-delegate-replies-how-social-ai-ag/verify; to confirm them independently of this site, re-derive the same checks (and resolve the target DOI) with npx tsx scripts/verify-critical-ai.ts --critique when-influencers-delegate-replies-how-social-ai-ag --live.
Content fingerprint 9f25adc41df76f10 (v1.0) — this Comment’s substantive content is content-addressed; a silent post-publication edit would change it.
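The idea of a content-addressed fingerprint can be sketched as follows. This is an assumption-laden illustration, not the site's actual scheme: the page does not say which hash function, text normalization, or truncation length produces its 16-character fingerprint, so SHA-256, whitespace normalization, and a 16-hex-digit prefix are all hypothetical choices, as is the function name `content_fingerprint`.

```python
import hashlib

def content_fingerprint(text, length=16):
    """Hypothetical fingerprint: SHA-256 of whitespace-normalized text,
    truncated to `length` hex characters."""
    normalized = " ".join(text.split())  # collapse runs of whitespace
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]

original = "Severity: moderate."
edited = "Severity: minor."

fp_original = content_fingerprint(original)
fp_edited = content_fingerprint(edited)

# Any substantive change to the text yields a different fingerprint,
# while a whitespace-only change does not under this normalization.
print(fp_original != fp_edited)
print(content_fingerprint("Severity:   moderate.") == fp_original)
```

Under any scheme of this kind, a reader who saved the fingerprint at publication time can detect a later unversioned edit by recomputing it over the current text.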