{"$schema":"https://policywindow.org/critique/api/schema","critique_id":"CRIT-GEN-from-prompt-engineering-","slug":"from-prompt-engineering-to-prompt-design-research","url":"https://policywindow.org/critique/c/from-prompt-engineering-to-prompt-design-research","doi":null,"status":"published","critique_type":"editorially_approved_ai_native_critique","publication_date":"2026-06-21","current_version":"1.0","target_paper":{"title":"From prompt engineering to prompt design: Research strategies for visual generative AI","authors":["Gabriele Colombo","Sabine Niederer","Carlo De Gaetano"],"journal":"Big Data & Society","doi":"10.1177/20539517261451462","url":"https://doi.org/10.1177/20539517261451462","publicationDate":"2026-05-19","paperType":"conceptual","accessBasis":"abstract_only","fullTextUsed":false,"fictional":false,"doi_url":"https://doi.org/10.1177/20539517261451462"},"source_journal":{"tier":"exception","rankingSources":["resolved from the monitored-venue determination"],"rankingNote":"Tier exception per the determination; ingested from an AGISS critique artifact."},"selection_provenance":{"id":"from-prompt-engineering-to-prompt-design-research","venue":"Big Data & Society","inMonitoredSet":true,"determinedTier":"exception","recordedTier":"exception","effectiveTier":"exception","kind":"monitored","disclosed":true},"selection":{"aiAgiCentralityScore":2,"societalRelevanceScore":3,"aiAgiCategories":[],"selectionReason":"Selected via the production queue; critique generated by the AGISS engine."},"scores":{"aiAgiContribution":2,"evidentiarySupport":2,"methodologicalRisk":3,"overclaiming":2,"reproducibilityOrAuditability":2,"societalImpactRelevance":3,"severity":"moderate","confidence":"medium"},"severity_cap_for_access_basis":"moderate","plain_language_summary":"This conceptual \"demo\" proposes \"prompt design\" as a way to study image-generating AI for social and cultural research, contrasting it with \"prompt engineering\" aimed at polished outputs. 
It offers five prompting strategies (ambiguous, comparative, evocative, provocative, reverse-engineered) and illustrates them with biodiversity-themed experiments across models and over 2023-2024, reporting idealized imagery and durable model \"house styles.\" It also trials LLMs as supervised research assistants for analyzing images. As a methods proposal, it is clear and appropriately hedged in places, but a few strong words do more work than the abstract supports: \"reveal\" overstates what a single-concept case shows, persistence over time rests on a short two-year window, and \"reshapes generative AI systems\" claims an intervention that the described activity (prompting and interpreting) does not demonstrate. Using a generative model to read generative images also raises an unaddressed shared-bias circularity concern.","claims":[{"id":"c1","text":"The piece introduces prompt design as a research approach to studying visual generative AI, distinguishing it from prompt engineering practices that focus on producing aesthetically pleasing or technically polished outputs.","type":"conceptual","evidenceOffered":"\"This demo introduces prompt design as a research approach to studying visual generative AI, distinguishing it from prompt engineering practices that focus on producing aesthetically pleasing or technically polished outputs.\"","support":"moderate","overclaiming":"minor","assessment":"As a conceptual contribution, the distinction is stated clearly and is plausible within the digital-methods tradition the abstract cites. The contrast is drawn against a somewhat narrow rendering of prompt engineering (reduced to aesthetic/technical polish); on the critic's reading, prompt engineering in practice also includes reliability and reproducibility goals, so the foil may be drawn favourably to sharpen the novelty of 'prompt design.' 
Still, the abstract presents this as a framing move, which is appropriate for a demo.","mainWeakness":"The contrast rests on a possibly thin characterization of the 'prompt engineering' it distinguishes itself from; the demarcation's force depends on that foil.","confidence":"high"},{"id":"c2","text":"Drawing on the query design framework from digital methods, the piece outlines several strategies for cultural and social research, including ambiguous, comparative, evocative, provocative, and reverse-engineered prompting.","type":"conceptual","evidenceOffered":"\"Drawing on the query design framework from digital methods, we outline several strategies for cultural and social research: ambiguous prompting for bias research; comparative prompting...; evocative prompting...; provocative prompting...; and reverse-engineered prompting for machine critique.\"","support":"moderate","overclaiming":"none","assessment":"This is a taxonomy/strategy enumeration, the core of a methods demo, and is presented as such. The abstract pairs each strategy with a target (bias, cross-model comparison, model logic, content moderation, machine critique). 
The mapping is asserted rather than validated; on the critic's reading, whether each strategy reliably probes its named target is not demonstrated in the abstract, but a demo abstract is not obligated to validate a typology, only to propose it.","mainWeakness":"The one-to-one strategy-to-target pairings are asserted; the abstract gives no criterion for why a given prompting mode is the right instrument for its named research aim.","confidence":"high"},{"id":"c3","text":"Experiments using biodiversity as a case study reveal recurring patterns in AI-generated imagery, including idealized, aesthetically driven depictions and the persistence of distinct model house styles over time (2023-2024).","type":"empirical","evidenceOffered":"\"These experiments reveal recurring patterns in AI-generated imagery, including idealized, aesthetically driven depictions and the persistence of distinct model “house styles” over time.\"","support":"weak","overclaiming":"moderate","assessment":"The verb 'reveal' is strong for what the abstract describes as illustrative experiments built on a single concept (biodiversity) over a two-year window. The main limitation the design invites is single-case scope: patterns observed for 'biodiversity' may not generalize to other concepts, so 'recurring patterns in AI-generated imagery' (stated at the level of AI imagery generally) is broader than a biodiversity case can license. 
On the critic's reading, 'idealized, aesthetically driven depictions' is an interpretive coding of images; the abstract gives no coding protocol or inter-rater basis, so the pattern's robustness is asserted, not shown.","mainWeakness":"A single-concept case (biodiversity) is used to ground patterns stated about AI-generated imagery in general; the scope of the generalization exceeds the case.","confidence":"medium"},{"id":"c4","text":"The piece explores using large language models as research assistants for analyzing AI-generated images, a process characterized by iterative, supervised collaboration rather than full automation.","type":"conceptual","evidenceOffered":"\"We further explore the use of large language models as research assistants for analyzing AI-generated images, a process characterized by iterative, supervised collaboration rather than full automation.\"","support":"moderate","overclaiming":"minor","assessment":"This is candidly hedged: 'explore' and 'rather than full automation' acknowledge the method is supervised and partial, which is appropriate. 
The characterization 'iterative, supervised collaboration' is, on the critic's reading, a description of practice rather than an evaluated claim about validity or reliability of LLM-assisted image analysis; the abstract offers no accuracy or agreement evidence for the LLM-as-assistant step, which matters because using a generative model to analyze another generative model's outputs invites circularity (the analyst-model may share biases with the generator).","mainWeakness":"No evidence in the abstract bears on the reliability of the LLM-assisted analysis, and the shared-model-bias circularity risk in using LLMs to read generative images is unaddressed.","confidence":"medium"},{"id":"c5","text":"The piece positions prompt design as a critical and interventionist method that not only audits but also engages with and reshapes generative AI systems.","type":"normative","evidenceOffered":"\"we position prompt design as a critical and interventionist method that not only audits but also engages with and reshapes generative AI systems, foregrounding their extractive dynamics...\"","support":"weak","overclaiming":"moderate","assessment":"The claim that the method 'reshapes generative AI systems' is the strongest verb in the abstract and is not substantiated by anything the abstract reports; the described activity is prompting closed models and interpreting outputs, which on the critic's reading audits or probes systems without evidence of altering them. 'Position' signals this is a programmatic stance rather than a demonstrated result, which softens it, but 'reshapes' still asserts more than 'audits.' 
The normative framing ('reflexive and participatory') is presented as an opening of possibilities, appropriately hedged.","mainWeakness":"'Reshapes generative AI systems' outruns the abstract's described activity (prompting and interpreting outputs); the interventionist claim is aspirational, not evidenced.","confidence":"medium"},{"id":"c6","text":"Prompt design foregrounds the extractive dynamics of generative AI systems and opens possibilities for more reflexive and participatory forms of AI research.","type":"normative","evidenceOffered":"\"foregrounding their extractive dynamics and opening possibilities for more reflexive and participatory forms of AI research.\"","support":"weak","overclaiming":"minor","assessment":"This is a forward-looking normative/agenda claim, hedged by 'opening possibilities for.' As such it is modest in form. The substantive term 'extractive dynamics' is invoked but the abstract does not specify what is extracted or how prompt design uniquely foregrounds it relative to existing critical-AI methods; on the critic's reading the claim functions as positioning rather than a demonstrated property. Judged by essay/agenda-setting norms, this is acceptable; the hedge prevents it from being a hard overclaim.","mainWeakness":"The 'extractive dynamics' and 'participatory' framing is asserted as a contribution without specifying the mechanism by which prompt design surfaces or enables it beyond existing critical methods.","confidence":"medium"},{"id":"c7","text":"Comparative prompting supports cross-model and cross-term analysis, and the experiments examine representation across models, geographical contexts, and time.","type":"empirical","evidenceOffered":"\"comparative prompting for cross-model and cross-term analysis... 
examining how visual generative AI represents this concept across models, geographical contexts, and time (2023–2024).\"","support":"weak","overclaiming":"minor","assessment":"The abstract claims comparison across models, geographies, and time but names no models, no number of models, and no geographies, and the temporal window is only two years. On the critic's reading, conclusions about 'persistence... over time' rest on a short window that may contain only two or a few sampling points, so 'persistence' is a strong word for a 2023-2024 span; because closed models are updated frequently, apparent stability or change across the window could reflect model updates rather than durable house styles, and the abstract does not establish which. This is an illustrative demo, so the thinness is partly genre-appropriate, but the temporal claim is the most exposed.","mainWeakness":"'Persistence of house styles over time' is inferred from a two-year window (2023-2024) with unspecified sampling points; the durability claim exceeds the temporal evidence a short window supports.","confidence":"medium"}],"sections":[{"id":"s1","title":"Genre and what to judge","body":"This is explicitly framed as a \"demo\" that \"introduces prompt design as a research approach,\" drawing on \"the query design framework from digital methods.\" Judged by the norms of interpretive methods-building rather than empirical hypothesis testing, the appropriate standards are conceptual clarity, the coherence of the proposed strategy typology, transparency of case selection, and scope discipline on the claims drawn from illustrative experiments. The abstract is candid in several respects: it calls itself a demo, uses \"illustrate,\" and describes the LLM-assisted step as \"iterative, supervised collaboration rather than full automation\" rather than a validated pipeline. These hedges are to its credit and lower the critique's severity. 
The main tension is between this modest framing and a few strong verbs (\"reveal,\" \"reshapes\") that import an evidential or interventionist register the demo format does not, on its own, support."},{"id":"s2","title":"Scope of the empirical claims","body":"The experiments \"reveal recurring patterns in AI-generated imagery\" using \"biodiversity as a case study.\" The specific weakness here is single-case scope: patterns observed for one concept are stated at the level of AI-generated imagery in general. The concern is not a generic list of confounds but a concrete generalization gap: \"idealized, aesthetically driven depictions\" may be a property of how models render an aesthetically charged nature concept like biodiversity, not a property of generative imagery as such. Likewise, \"the persistence of distinct model 'house styles' over time\" is inferred from a 2023-2024 window with unspecified sampling points; for rapidly updated closed models, a short window makes \"persistence\" a strong durability claim. The abstract also claims comparison \"across models, geographical contexts, and time (2023–2024)\" without naming models or geographies, so the breadth is asserted rather than scoped."},{"id":"s3","title":"The LLM-as-analyst circularity risk","body":"The piece explores \"the use of large language models as research assistants for analyzing AI-generated images.\" The candid framing (\"iterative, supervised collaboration rather than full automation\") is appropriate, but the abstract offers no evidence bearing on the reliability of this analytic step. The design-specific concern is circularity: using a generative model to interpret another generative model's outputs invites shared-bias contamination, where the analyst-model's training priors echo the generator's, making \"recurring patterns\" partly an artifact of the reading instrument rather than of the object being read. 
The abstract does not state inter-rater procedures, agreement with human coding, or how the supervised loop guards against this. This does not invalidate the approach, but it is the load-bearing methodological question the abstract leaves open."},{"id":"s4","title":"Programmatic verbs vs demonstrated activity","body":"The strongest claim is that prompt design is positioned as a method that \"not only audits but also engages with and reshapes generative AI systems.\" On the critic's reading, the described activity is prompting closed systems and interpreting their outputs, which audits or probes systems but does not obviously alter them; \"reshapes generative AI systems\" outruns what the abstract reports. The verb \"position\" signals this is a stance rather than a result, which softens it, but the headline still asserts intervention beyond the evidence. Similarly, \"foregrounding their extractive dynamics\" invokes a critical-AI vocabulary without specifying what is extracted or how prompt design surfaces it more than existing critical methods. These are agenda-setting moves, legitimate in the genre, but they should be read as aspirations, not findings."},{"id":"s5","title":"Reproducibility and auditability","body":"Because access is abstract-only, the abstract itself is the unit of assessment, and on its face it lists no models, no prompt corpora, no number of images, no geographies, and no sampling cadence within 2023-2024. For a method whose whole proposition is that prompts are the research instrument, the reproducibility of the demonstrations turns on prompt and model disclosure, which the abstract does not preview. This is partly a genre allowance (a demo previews strategies rather than a full protocol), but it caps how far a reader can independently judge \"recurring patterns\" or house styles. 
The strategy typology is, by contrast, the most portable and reusable contribution: the five prompting modes are stated clearly enough to be adopted and tested by others, which is the demo's strongest reproducible output."}],"strongest_critique":"The headline empirical descriptors carry more weight than the abstract's stated design supports. The experiments are said to \"reveal recurring patterns in AI-generated imagery\" but are grounded in a single concept (biodiversity), so the generalization to AI imagery in general appears, on the critic's reading, to exceed the case; and \"the persistence of distinct model 'house styles' over time\" is inferred from a 2023-2024 window with unspecified sampling points, which makes persistence a strong durability claim for a short span of rapidly updated closed models. Compounding this, the analysis partly relies on \"the use of large language models as research assistants for analyzing AI-generated images\" without any abstract-level evidence of reliability, leaving open a circularity risk where the analyst-model's priors may echo the generator's. None of these are fatal for a demo, but they mean the reported patterns are, on the critic's reading, asserted rather than established.","strongest_fair_defence":"The piece is explicitly a \"demo\" that \"introduces\" a method and illustrates strategies, not an empirical study claiming generalizable findings, so demands for sample sizes, identification, or inter-rater reliability are partly genre-mismatched. Judged as interpretive methods-building in the digital-methods tradition, its core contribution, a clear five-part prompting typology tied to distinct research aims, is well specified and reusable. 
It is candid where it matters: it frames the LLM-assisted step as \"iterative, supervised collaboration rather than full automation,\" and uses \"position\" and \"opening possibilities for\" to mark its more ambitious claims as agenda-setting rather than proven. The biodiversity case is offered as illustration, not proof, and surfacing house styles and idealized depictions is a plausible, useful prompt for further work.","final_judgment":"A clear and largely candid conceptual demo whose principal contribution, a reusable five-part prompting-strategy typology grounded in the query design framework, is genuine and appropriately framed. The critique is moderate and centers on scope discipline rather than method legitimacy: the empirical descriptors lean on strong terms (\"reveal,\" \"persistence\" over time) that a single-concept case (biodiversity) and a short 2023-2024 window do not fully license, and the programmatic claim that the method \"reshapes generative AI systems\" appears, on the critic's reading, to outrun the described activity of prompting and interpreting outputs. The LLM-as-analyst step is honestly hedged but leaves an unaddressed shared-bias circularity risk. 
Severity is capped at moderate given abstract-only access and the work's self-described demo genre; read as an agenda-setting methods proposal rather than an evidential study, it is a reasonable contribution whose headline claims should be treated as illustrative and aspirational.","review_process":{"aiAgentsUsed":["claim_extraction","ai_agi_relevance","adversarial","author_defence","citation_integrity","legal_risk","meta_review"],"reviewRounds":1,"humanEditor":{"name":"","role":"","approvalDate":"","declaredConflict":"none"},"expertCertification":{"used":false}},"author_response":{"notified":false,"status":"not_yet_invited"},"versions":[{"version":"1.0","date":"2026-06-21","note":"","changeType":"initial"}],"transparency":{"modelCardUrl":"/critique/model-card","publicAuditSummary":"Critique generated by the AGI Social Scientist engine; ingested as a staged draft pending the automated integrity gate (no human editor).","privateAuditRecordExists":true,"citationVerification":{"status":"complete","checkedSources":[],"fabricatedCitations":0},"riskReview":{"copyright":"completed","defamation":"completed","note":"Abstract-only critique: no reproduction beyond sparse criticism/review quotation; critiques claims/methods/evidence not motives (motive-scan clean); no false statements of fact about persons."}}}