PRISON: Unmasking the Criminal Potential of Large Language Models
Wu, Xinyi, Hong, Geng, Chen, Pei, Chen, Yueyue, Pan, Xudong, Yang, Min
arXiv.org Artificial Intelligence
Scenario to be rewritten: { scenario }

To rigorously evaluate whether large language models (LLMs) can still recognize the original source behind these rewritten scenarios, we designed three complementary prompt strategies, each probing a different aspect of the models' recognition and reasoning capabilities. Zero-shot Direct Identification tests the model's raw ability to recall source material under minimal guidance (Brown et al., 2020; Mu et al., 2024). Paraphrased Queries introduce linguistic variation to reduce prompt-specific biases and measure the robustness of recognition (Liu et al., 2024a; Ngweta et al., 2025). Instruction-tuned Task-framed Prompts use explicit role framing and step-by-step task descriptions to maximize retrieval pressure and analytical reasoning (Ouyang et al., 2022; Sivarajkumar et al., 2024). By combining these strategies, we construct a recognition test that balances sensitivity and robustness: a scenario is deemed valid only if no prompt family leads to a confident and correct identification of the original work. This integrated approach provides a stronger safeguard against hidden memorization and enables more reliable downstream behavioral analysis of the tested LLMs.

Validation Prompt

We designed three prompt families for scenario source identification. Each family targets a different aspect of model behavior:

Given the following scenario: { scenario }

1. Zero-shot Identification
Please determine whether this scenario originates from a known literary or cinematic work.
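The acceptance rule described above (a scenario is valid only if no prompt family yields a confident and correct identification) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Recognition` record, the family names, the boolean `confident` flag, and the case-insensitive title match are all assumptions introduced here, since the text does not specify how confidence or correctness is measured.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Recognition:
    """One model response to one prompt family (hypothetical structure)."""
    family: str                    # e.g. "zero_shot", "paraphrased", "task_framed"
    guessed_title: Optional[str]   # title the model named, or None if it named none
    confident: bool                # assumed to be judged upstream; criterion not given in the text


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(title.lower().split())


def is_confident_hit(result: Recognition, true_title: str) -> bool:
    """True when the model confidently named the correct original work."""
    if not result.confident or result.guessed_title is None:
        return False
    return normalize(result.guessed_title) == normalize(true_title)


def scenario_is_valid(results: Iterable[Recognition], true_title: str) -> bool:
    """A scenario passes only if no prompt family produced a confident, correct hit."""
    return not any(is_confident_hit(r, true_title) for r in results)
```

Under these assumptions, a single confident and correct identification from any one family is enough to reject the rewritten scenario, while confident but wrong guesses and unconfident correct guesses do not reject it.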
Oct-20-2025
- Country:
- Asia
- North America > United States (0.14)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine > Therapeutic Area (0.67)
- Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Leisure & Entertainment (1.00)
- Media > Film (0.92)
- Technology: