Post-publication Comment · Critical AI
Comment on “From prompt engineering to prompt design: Research strategies for visual generative AI”
Critical AI · published 2026-06-21 · v1.0 · CRIT-GEN-from-prompt-engineering-
Concerning: Gabriele Colombo, Sabine Niederer, Carlo De Gaetano · Big Data & Society · 2026-05-19
Why this paper was selected
Selected via the production queue; critique generated by the AGISS engine.
AI/AGI centrality 2/5 · societal relevance 3/5 · source-journal note: Tier exception per the determination; ingested from an AGISS critique artifact.
Summary
This conceptual "demo" proposes "prompt design" as a way to study image-generating AI for social and cultural research, contrasting it with "prompt engineering" aimed at polished outputs. It offers five prompting strategies (ambiguous, comparative, evocative, provocative, reverse-engineered) and illustrates them with biodiversity-themed experiments across models and over 2023-2024, reporting idealized imagery and durable model "house styles." It also tries using LLMs as supervised research assistants to analyze images. As a methods-proposal it is clear and appropriately hedged in places, but a few strong words do more work than the abstract supports: "reveal" overstates what a single-concept case shows, persistence over time rests on a short two-year window, and "reshapes generative AI systems" claims intervention the described activity (prompting and interpreting) does not demonstrate. Using a generative model to read generative images also raises an unaddressed shared-bias circularity concern.
Central claims & evidence map
| Claim | Type | Evidence offered | Support | Overclaiming | Main weakness |
|---|---|---|---|---|---|
| The piece introduces prompt design as a research approach to studying visual generative AI, distinguishing it from prompt engineering practices that focus on producing aesthetically pleasing or technically polished outputs. | | "This demo introduces prompt design as a research approach to studying visual generative AI, distinguishing it from prompt engineering practices that focus on producing aesthetically pleasing or technically polished outputs." | Moderate | Minor | The contrast rests on a possibly thin characterization of the 'prompt engineering' it distinguishes itself from; the demarcation's force depends on that foil. |
| Drawing on the query design framework from digital methods, the piece outlines several strategies for cultural and social research, including ambiguous, comparative, evocative, provocative, and reverse-engineered prompting. | | "Drawing on the query design framework from digital methods, we outline several strategies for cultural and social research: ambiguous prompting for bias research; comparative prompting...; evocative prompting...; provocative prompting...; and reverse-engineered prompting for machine critique." | Moderate | None | The one-to-one strategy-to-target pairings are asserted; the abstract gives no criterion for why a given prompting mode is the right instrument for its named research aim. |
| Experiments using biodiversity as a case study reveal recurring patterns in AI-generated imagery, including idealized, aesthetically driven depictions and the persistence of distinct model house styles over time (2023-2024). | | "These experiments reveal recurring patterns in AI-generated imagery, including idealized, aesthetically driven depictions and the persistence of distinct model “house styles” over time." | Weak | Moderate | A single-concept case (biodiversity) is used to ground patterns stated about AI-generated imagery in general; the scope of the generalization exceeds the case. |
| The piece explores using large language models as research assistants for analyzing AI-generated images, a process characterized by iterative, supervised collaboration rather than full automation. | | "We further explore the use of large language models as research assistants for analyzing AI-generated images, a process characterized by iterative, supervised collaboration rather than full automation." | Moderate | Minor | No evidence in the abstract bears on the reliability of the LLM-assisted analysis, and the shared-model-bias circularity risk in using LLMs to read generative images is unaddressed. |
| The piece positions prompt design as a critical and interventionist method that not only audits but also engages with and reshapes generative AI systems. | Normative | "we position prompt design as a critical and interventionist method that not only audits but also engages with and reshapes generative AI systems, foregrounding their extractive dynamics..." | Weak | Moderate | 'Reshapes generative AI systems' outruns the abstract's described activity (prompting and interpreting outputs); the interventionist claim is aspirational, not evidenced. |
| Prompt design foregrounds the extractive dynamics of generative AI systems and opens possibilities for more reflexive and participatory forms of AI research. | Normative | "foregrounding their extractive dynamics and opening possibilities for more reflexive and participatory forms of AI research." | Weak | Minor | The 'extractive dynamics' and 'participatory' framing is asserted as a contribution without specifying the mechanism by which prompt design surfaces or enables it beyond existing critical methods. |
| Comparative prompting supports cross-model and cross-term analysis, and the experiments examine representation across models, geographical contexts, and time. | | "comparative prompting for cross-model and cross-term analysis... examining how visual generative AI represents this concept across models, geographical contexts, and time (2023–2024)." | Weak | Minor | 'Persistence of house styles over time' is inferred from a two-year window (2023-2024) with unspecified sampling points; the durability claim exceeds the temporal evidence a short window supports. |
Per-claim assessment
c1. The piece introduces prompt design as a research approach to studying visual generative AI, distinguishing it from prompt engineering practices that focus on producing aesthetically pleasing or technically polished outputs.
As a conceptual contribution, the distinction is stated clearly and is plausible within the digital-methods tradition the abstract cites. The contrast is drawn against a somewhat narrow rendering of prompt engineering (reduced to aesthetic/technical polish); on the critic's reading, prompt engineering in practice also includes reliability and reproducibility goals, so the foil may be drawn favourably to sharpen the novelty of 'prompt design.' Still, the abstract presents this as a framing move, which is appropriate for a demo.
c2. Drawing on the query design framework from digital methods, the piece outlines several strategies for cultural and social research, including ambiguous, comparative, evocative, provocative, and reverse-engineered prompting.
This is a taxonomy/strategy enumeration, the core of a methods demo, and is presented as such. The abstract pairs each strategy with a target (bias, cross-model comparison, model logic, content moderation, machine critique). The mapping is asserted rather than validated; on the critic's reading, whether each strategy reliably probes its named target is not demonstrated in the abstract, but a demo abstract is not obligated to validate a typology, only to propose it.
c3. Experiments using biodiversity as a case study reveal recurring patterns in AI-generated imagery, including idealized, aesthetically driven depictions and the persistence of distinct model house styles over time (2023-2024).
The verb 'reveal' is strong for what the abstract describes as illustrative experiments built on a single concept (biodiversity) over a two-year window. The specific design-invited limitation is single-case scope: patterns observed for 'biodiversity' may not generalize to other concepts, so 'recurring patterns in AI-generated imagery' (stated at the level of AI imagery generally) is broader than a biodiversity case can license. On the critic's reading, 'idealized, aesthetically driven depictions' is an interpretive coding of images; the abstract gives no coding protocol or inter-rater basis, so the pattern's robustness is asserted, not shown.
c4. The piece explores using large language models as research assistants for analyzing AI-generated images, a process characterized by iterative, supervised collaboration rather than full automation.
This is candidly hedged: 'explore' and 'rather than full automation' acknowledge the method is supervised and partial, which is appropriate. The characterization 'iterative, supervised collaboration' is, on the critic's reading, a description of practice rather than an evaluated claim about validity or reliability of LLM-assisted image analysis; the abstract offers no accuracy or agreement evidence for the LLM-as-assistant step, which matters because using a generative model to analyze another generative model's outputs invites circularity (the analyst-model may share biases with the generator).
c5. The piece positions prompt design as a critical and interventionist method that not only audits but also engages with and reshapes generative AI systems.
The claim that the method 'reshapes generative AI systems' is the strongest verb in the abstract and is not substantiated by anything the abstract reports; the described activity is prompting closed models and interpreting outputs, which on the critic's reading audits or probes systems without evidence of altering them. 'Position' signals this is a programmatic stance rather than a demonstrated result, which softens it, but 'reshapes' still asserts more than 'audits.' The normative framing ('reflexive and participatory') is presented as an opening of possibilities, appropriately hedged.
c6. Prompt design foregrounds the extractive dynamics of generative AI systems and opens possibilities for more reflexive and participatory forms of AI research.
This is a forward-looking normative/agenda claim, hedged by 'opening possibilities for.' As such it is modest in form. The substantive term 'extractive dynamics' is invoked but the abstract does not specify what is extracted or how prompt design uniquely foregrounds it relative to existing critical-AI methods; on the critic's reading the claim functions as positioning rather than a demonstrated property. Judged by essay/agenda-setting norms, this is acceptable; the hedge prevents it from being a hard overclaim.
c7. Comparative prompting supports cross-model and cross-term analysis, and the experiments examine representation across models, geographical contexts, and time.
The abstract claims comparison across models, geographies, and time but names no models, no number of models, and no geographies, and the temporal window is only two years. On the critic's reading, conclusions about 'persistence... over time' rest on a short window with an unspecified and possibly very small number of sampling points, so 'persistence' is a strong word for a 2023-2024 span; because rapidly updated closed models are non-stationary, observed stability or change across so few points could reflect version churn rather than an established, durable style. This is an illustrative demo, so the thinness is partly genre-appropriate, but the temporal claim is the most exposed.
Scorecard
Sub-scores are 0–5 editorial judgements on fixed scales (higher is better, except methodological risk and overclaiming where higher is worse). They are contestable and open to a severity challenge from authors.
Genre and what to judge
This is explicitly framed as a "demo" that "introduces prompt design as a research approach," drawing on "the query design framework from digital methods." Judged by the norms of interpretive methods-building rather than empirical hypothesis testing, the appropriate standards are conceptual clarity, the coherence of the proposed strategy typology, transparency of case selection, and scope discipline on the claims drawn from illustrative experiments. The abstract is candid in several respects: it calls itself a demo, uses "illustrate," and describes the LLM-assisted step as "iterative, supervised collaboration rather than full automation" rather than a validated pipeline. These hedges are to its credit and lower the critique's severity. The main tension is between this modest framing and a few strong verbs ("reveal," "reshapes") that import an evidential or interventionist register the demo format does not, on its own, support.
Scope of the empirical claims
The experiments "reveal recurring patterns in AI-generated imagery" using "biodiversity as a case study." The specific exposure here is single-case scope: patterns observed for one concept are stated at the level of AI-generated imagery in general. The substantive risk is not a generic confounds list but a concrete generalization gap: "idealized, aesthetically driven depictions" may be a property of how models render an aesthetically charged nature concept like biodiversity, not a property of generative imagery as such. Likewise, "the persistence of distinct model 'house styles' over time" is inferred from a 2023-2024 window with unspecified sampling points; for rapidly updated closed models, a short window makes "persistence" a strong durability claim. The abstract also names comparison "across models, geographical contexts, and time (2023–2024)" without naming models or geographies, so the breadth is asserted rather than scoped.
The LLM-as-analyst circularity risk
The piece explores "the use of large language models as research assistants for analyzing AI-generated images." The candid framing ("iterative, supervised collaboration rather than full automation") is appropriate, but the abstract offers no evidence bearing on the reliability of this analytic step. The design-specific concern is circularity: using a generative model to interpret another generative model's outputs invites shared-bias contamination, where the analyst-model's training priors echo the generator's, making "recurring patterns" partly an artifact of the reading instrument rather than the read object. The abstract does not state inter-rater procedures, agreement with human coding, or how the supervised loop guards against this. This does not invalidate the approach, but it is the load-bearing methodological question the abstract leaves open.
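One way the "agreement with human coding" mentioned above could be quantified is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The following is a minimal sketch, not the authors' procedure: the function name, the two code categories, and the label lists are hypothetical and invented purely for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("label sequences must be non-empty and the same length")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of each coder's marginal label frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes for six images (assumption: not data from the paper).
human_codes = ["idealized", "idealized", "documentary",
               "idealized", "documentary", "documentary"]
llm_codes = ["idealized", "idealized", "idealized",
             "idealized", "documentary", "documentary"]

kappa = cohens_kappa(human_codes, llm_codes)
print(f"kappa = {kappa:.3f}")
```

A statistic like this only addresses the circularity concern if the human codes are produced without sight of the LLM's output; agreement between two coders who share the same priors would not separate the reading instrument from the read object.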
Programmatic verbs vs demonstrated activity
The strongest claim is that prompt design is positioned as a method that "not only audits but also engages with and reshapes generative AI systems." On the critic's reading, the described activity is prompting closed systems and interpreting their outputs, which audits or probes systems but does not obviously alter them; "reshapes generative AI systems" outruns what the abstract reports. The verb "position" signals this is a stance rather than a result, which softens it, but the headline still asserts intervention beyond the evidence. Similarly, "foregrounding their extractive dynamics" invokes a critical-AI vocabulary without specifying what is extracted or how prompt design surfaces it more than existing critical methods. These are agenda-setting moves, legitimate in the genre, but they should be read as aspirations, not findings.
Reproducibility and auditability
Because access is abstract-only, the abstract itself is the unit of assessment, and on its face it lists no models, no prompt corpora, no number of images, no geographies, and no sampling cadence within 2023-2024. For a method whose whole proposition is that prompts are the research instrument, the reproducibility of the demonstrations turns on prompt and model disclosure, which the abstract does not preview. This is partly a genre allowance (a demo previews strategies rather than a full protocol), but it caps how far a reader can independently judge "recurring patterns" or house styles. The strategy typology is, by contrast, the most portable and reusable contribution: the five prompting modes are stated clearly enough to be adopted and tested by others, which is the demo's strongest reproducible output.
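To make the disclosure point concrete, the sketch below shows one possible shape for a per-run record that would let a reader re-issue a prompt against a named model. It is an illustration under stated assumptions: the `PromptRun` class, its field names, and every example value are hypothetical, and the abstract discloses none of them. Only the five strategy names come from the abstract.

```python
import json
from dataclasses import asdict, dataclass

# The five strategy names are taken from the abstract; everything else is assumed.
STRATEGIES = {"ambiguous", "comparative", "evocative",
              "provocative", "reverse-engineered"}


@dataclass(frozen=True)
class PromptRun:
    """Hypothetical disclosure record for one prompting run."""
    strategy: str        # which of the five prompting strategies was used
    prompt: str          # exact prompt text sent to the model
    model: str           # model name (placeholder below)
    model_version: str   # version or release label, if the vendor exposes one
    generated_on: str    # ISO 8601 date of generation
    n_images: int        # number of images collected for this prompt


def to_manifest_line(run):
    """Serialize a run as one JSON line, rejecting unknown strategies."""
    if run.strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {run.strategy!r}")
    return json.dumps(asdict(run), sort_keys=True)


# Placeholder values only (assumption: not taken from the paper).
example = PromptRun(
    strategy="comparative",
    prompt="biodiversity",
    model="example-model",
    model_version="unknown",
    generated_on="2024-01-01",
    n_images=4,
)
line = to_manifest_line(example)
print(line)
```

Recording the generation date alongside the model version matters for the temporal claim in particular, since a closed model can change between sampling points without a visible version bump.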
Strongest critique
The headline empirical descriptors carry more weight than the abstract's stated design supports. The experiments are said to "reveal recurring patterns in AI-generated imagery" but are grounded in a single concept (biodiversity), so the generalization to AI imagery in general appears, on the critic's reading, to exceed the case; and "the persistence of distinct model 'house styles' over time" is inferred from a 2023-2024 window with unspecified sampling points, which makes persistence a strong durability claim for a short span of rapidly updated closed models. Compounding this, the analysis partly relies on "the use of large language models as research assistants for analyzing AI-generated images" without any abstract-level evidence of reliability, leaving open a circularity risk where the analyst-model's priors may echo the generator's. None of these is fatal for a demo, but they mean the reported patterns are, on the critic's reading, asserted rather than established.
Strongest fair defence
The piece is explicitly a "demo" that "introduces" a method and illustrates strategies, not an empirical study claiming generalizable findings, so demands for sample sizes, identification, or inter-rater reliability are partly genre-mismatched. Judged as interpretive methods-building in the digital-methods tradition, its core contribution, a clear five-part prompting typology tied to distinct research aims, is well specified and reusable. It is candid where it matters: it frames the LLM-assisted step as "iterative, supervised collaboration rather than full automation," and uses "position" and "opening possibilities for" to mark its more ambitious claims as agenda-setting rather than proven. The biodiversity case is offered as illustration, not proof, and surfacing house styles and idealized depictions is a plausible, useful prompt for further work.
Conclusion
A clear and largely candid conceptual demo whose principal contribution, a reusable five-part prompting-strategy typology grounded in the query design framework, is genuine and appropriately framed. The critique is moderate and centers on scope discipline rather than method legitimacy: the empirical descriptors lean on strong verbs ("reveal," persistence over time) that a single-concept case (biodiversity) and a short 2023-2024 window do not fully license, and the programmatic claim that the method "reshapes generative AI systems" appears, on the critic's reading, to outrun the described activity of prompting and interpreting outputs. The LLM-as-analyst step is honestly hedged but leaves an unaddressed shared-bias circularity risk. Severity is capped at moderate given abstract-only access and the work's self-described demo genre; read as an agenda-setting methods proposal rather than an evidential study, it is a reasonable contribution whose headline claims should be treated as illustrative and aspirational.
Reply from the authors
Following the practice of Nature Matters Arising, Science Technical Comments and PNAS Letters, this Comment is published as one half of a Comment + Reply pair: the authors of the original article are invited to respond, and any reply is published here verbatim alongside the Comment as part of the record.
Reply: not yet invited. No reply has been received for publication.
The authors have a right of reply and no veto. A reply may request a factual correction, a methodological rebuttal, a clarification, a data/code update, or a severity challenge, and is published unedited. See the right-of-reply policy.
Source-grounding attestation
- ✓Verbatim source spans present in the critique — 8/8 provenance spans re-derived in the critique prose
- ✓Passes the publication validator — no errors
- ✓Zero fabricated citations — 0 fabricated
- ✓Severity within the access-basis cap — severity "moderate" ≤ cap "moderate" for abstract_only
Every verbatim span the critique relies on is re-derived in the prose in-app; span-in-source is re-verifiable offline (the abstract is re-fetched, not stored, per the no-reproduce policy).
Re-verify span-in-source offline: python3 scripts/verify-queue-critiques.py
Independent faithfulness review
A refute-by-default adversarial panel (two independent reviewers — an overreach lens and a mischaracterization lens — that fetched the real source) tried to prove this critique misread the paper. This is an AI adversarial review recorded with its reasoning, not a deterministic check.
All seven critique claims track the abstract closely, and every substantive worry is properly hedged with "on the critic's reading," which keeps inferences from being passed off as the abstract's own statements. Spot-check of the load-bearing claims:
- c1: Accurately quotes the prompt-design/prompt-engineering distinction. The observation that the prompt-engineering foil is "somewhat narrow" is explicitly flagged as the critic's reading, not attributed to the paper. Faithful.
- c2: The five-strategy taxonomy and each strategy-to-target mapping (bias, cross-model, model logic, content moderation, machine critique) are reproduced exactly as the abstract lists them. The "asserted rather than validated" point is correctly hedged and is a fair genre observation. Faithful.
- c3: This is the most aggressive claim, but it does not overreach. The abstract genuinely uses "reveal recurring patterns in AI-generated imagery," and the critique's point that a single-concept (biodiversity) case may not license a claim "at the level of AI imagery generally" is a legitimate scope reading, hedged appropriately. The critique does not deny that biodiversity is the stated case or that the experiments are "illustrative"; it acknowledges both. No mischaracterization.
- c4: Faithfully quotes "research assistants," "iterative, supervised collaboration rather than full automation," and credits the abstract for candidly hedging with "explore." The circularity concern is clearly marked as the critic's reading. Faithful.
- c5: Correctly identifies "reshapes generative AI systems" as the abstract's wording and notes "position" softens it. The claim that the described activity is "prompting closed models and interpreting outputs" is a reasonable reconstruction, hedged as the critic's reading. Note the abstract does say the method "engages with and reshapes" systems and frames it as "interventionist," so calling "reshapes" the strongest verb is fair, not a strengthening of the paper. Faithful.
- c6: Accurately quotes "foregrounding their extractive dynamics" and "opening possibilities for more reflexive and participatory forms," and credits the "opening possibilities for" hedge. Faithful.
- c7: Correctly notes the abstract specifies comparison "across models, geographical contexts, and time (2023-2024)" without naming models or geographies. The "persistence... over time" durability concern over a short window is a fair, hedged reading. Faithful.

The strongest-critique and final-judgment summaries also stay within bounds: they repeatedly cap severity at moderate, credit the typology as a genuine contribution, and frame the empirical descriptors as illustrative/aspirational consistent with the demo genre. No claim treats an inference as the abstract's own assertion, and none strengthens or narrows the paper's claims. The critique is disciplined and faithful throughout.
Version & correction history
| Version | Date | Change |
|---|---|---|
| v1.0 | 2026-06-21 | |
No silent substantive corrections — every change is versioned and visible.
How to cite this Comment
Critical AI. Comment on “From prompt engineering to prompt design: Research strategies for visual generative AI” (Gabriele Colombo et al., Big Data & Society, 2026). Critical AI; 2026. https://policywindow.org/critique/c/from-prompt-engineering-to-prompt-design-research
A registered DOI will replace the URL once minted; until then the canonical URL is the persistent identifier. Highwire/Dublin-Core citation tags and a schema.org Review record are embedded in this page for Google Scholar and reference managers.
Verify this Comment. Its checkable facts (target DOI, access-basis severity cap, zero fabricated citations) are served — as the app’s self-report — at /critique/api/critiques/from-prompt-engineering-to-prompt-design-research/verify; to confirm them independently of this site, re-derive the same checks (and resolve the target DOI) with npx tsx scripts/verify-critical-ai.ts --critique from-prompt-engineering-to-prompt-design-research --live.
Content fingerprint 58183f6682cc22c3 (v1.0) — this Comment’s substantive content is content-addressed; a silent post-publication edit would change it.
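
The idea behind a content-addressed fingerprint can be sketched as follows. This is an illustration only: the site's actual scheme is not documented here, so the choice of SHA-256, the whitespace normalization, the 16-character truncation (chosen to match the visible length of the fingerprint above), and the function name are all assumptions.

```python
import hashlib


def content_fingerprint(text, length=16):
    """Hypothetical fingerprint: truncated SHA-256 of whitespace-normalized text."""
    # Collapse runs of whitespace so layout-only changes do not alter the hash.
    normalized = " ".join(text.split())
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return digest[:length]


# Placeholder sentences (assumption: not the text actually hashed by the site).
original = "Severity is capped at moderate."
edited = "Severity is capped at minor."

fp_original = content_fingerprint(original)
fp_edited = content_fingerprint(edited)
print(fp_original, fp_edited, fp_original != fp_edited)
```

Under a scheme of this kind, a reader who holds the published text and the published fingerprint can recompute the hash and detect any substantive edit, while reflowed whitespace leaves the value unchanged.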