PRISON: Unmasking the Criminal Potential of Large Language Models

Wu, Xinyi, Hong, Geng, Chen, Pei, Chen, Yueyue, Pan, Xudong, Yang, Min

Oct-20-2025–arXiv.org Artificial Intelligence

Scenario to be rewritten: { scenario }

To rigorously evaluate whether large language models (LLMs) could still recognize the original source behind these rewritten scenarios, we designed three complementary prompt strategies, each probing different aspects of the models' recognition and reasoning capabilities. Zero-shot Direct Identification focuses on testing the model's raw ability to recall source material under minimal guidance (Brown et al., 2020; Mu et al., 2024). Paraphrased Queries introduce linguistic variation to reduce prompt-specific biases and measure the robustness of recognition (Liu et al., 2024a; Ngweta et al., 2025). Instruction-tuned Task-framed Prompts leverage explicit role framing and step-by-step task descriptions to maximize retrieval pressure and analytical reasoning (Ouyang et al., 2022; Sivarajkumar et al., 2024).

By combining these strategies, we construct a comprehensive recognition test that balances sensitivity and robustness, ensuring that a scenario is only deemed valid if no prompt family leads to a confident and correct identification of the original work. This integrated approach provides a stronger safeguard against hidden memorization and enables more reliable downstream behavioral analysis of the tested LLMs.

Validation Prompt

We designed three prompt families for scenario source identification. Each family targets a different aspect of model behavior:

Given the following scenario: { scenario }

1. Zero-shot Identification
Please determine whether this scenario originates from a known literary or cinematic work.
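
The validity rule stated above (a scenario is kept only if no prompt family produces a confident and correct identification of the original work) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Identification` record, the family names, the numeric `confidence` field, the 0.5 threshold, and exact title matching after normalization are all assumptions introduced here, since the excerpt does not say how confidence or correctness is measured.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical labels for the three prompt families named in the text.
PROMPT_FAMILIES = ("zero_shot", "paraphrased", "task_framed")


@dataclass(frozen=True)
class Identification:
    """One model answer to one prompt family (hypothetical record layout)."""
    family: str                    # which prompt family produced this answer
    guessed_title: Optional[str]   # None if the model named no source work
    confidence: float              # assumed to lie in [0, 1]


def _normalize(title: str) -> str:
    # Case-insensitive comparison with collapsed whitespace (an assumption).
    return " ".join(title.lower().split())


def is_confident_hit(result: Identification, original_title: str,
                     threshold: float = 0.5) -> bool:
    """True if the answer names the original work with enough confidence."""
    if result.guessed_title is None:
        return False
    correct = _normalize(result.guessed_title) == _normalize(original_title)
    return correct and result.confidence >= threshold


def scenario_is_valid(results: Iterable[Identification], original_title: str,
                      threshold: float = 0.5) -> bool:
    """Keep a scenario only if no prompt family recognizes its source."""
    results = list(results)
    missing = set(PROMPT_FAMILIES) - {r.family for r in results}
    if missing:
        # Refuse to validate unless every family was actually queried.
        raise ValueError(f"missing prompt families: {sorted(missing)}")
    return not any(is_confident_hit(r, original_title, threshold)
                   for r in results)
```

For example, a scenario whose source is named correctly but only with low confidence would still pass under this sketch, while a single confident and correct answer from any one family would reject it. Requiring results from all three families before returning a verdict reflects the "no prompt family" wording of the rule.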

large language model, machine learning, natural language

Genre:
- Research Report > New Finding (1.00)

Industry:
- Leisure & Entertainment (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Media > Film (0.92)
- Health & Medicine > Therapeutic Area (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.49)
