Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications
–arXiv.org Artificial Intelligence
The ongoing intense discussion on rising LLM usage in the scientificpeer-review process has recently been mingled by reports of authors using hi dden prompt injections to manipulate review scores. Since the existence of su ch "attacks" - although seen by some commentators as "self-defense" - would have a great impact on the further debate, this paper investigates the practicability and technical success of the described manipulations. Our systematic evaluation uses 1k reviews of 2024 ICLR papers generated by a wide range of LLMs shows two distinct results: I) very simple prompt injections are indeed highly effective, reaching up to 100% acceptance scores. II) LLM reviews are generally biased toward acceptance (>95% in many models). Both results have great impact on the ongoing discussionson LLM usage in peer-review.
arXiv.org Artificial Intelligence
Sep-26-2025
- Country:
- Europe
- Germany (0.04)
- Switzerland > Basel-City
- Basel (0.04)
- North America > United States
- Hawaii > Honolulu County > Honolulu (0.04)
- Europe
- Genre:
- Research Report > New Finding (0.46)
- Technology: