Evil twins are not that evil: Qualitative insights into machine-generated prompts

Rakotonirina, Nathanaël Carraz; Kervadec, Corentin; Franzon, Francesca; Baroni, Marco

Dec-11-2024, arXiv.org Artificial Intelligence

It has been widely observed that language models (LMs) respond in predictable ways to algorithmically generated prompts that are seemingly unintelligible. This is both a sign that we lack a full understanding of how LMs work and a practical challenge, because opacity can be exploited for harmful uses of LMs, such as jailbreaking. We present the first thorough analysis of opaque machine-generated prompts, or autoprompts, for three LMs of different sizes and families. We find that machine-generated prompts are characterized by a last token that is often intelligible and strongly affects the generation. A small but consistent proportion of the preceding tokens are fillers that probably appear in the prompt as a by-product of the optimization process fixing the number of tokens. The remaining tokens tend to have at least a loose semantic relation with the generation, although they do not engage in well-formed syntactic relations with it. Moreover, we find that some of the ablations we applied to machine-generated prompts can also be applied to natural language sequences, leading to similar behavior, which suggests that autoprompts are a direct consequence of the way in which LMs process linguistic inputs in general.
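The kind of token ablation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure: the function `ablate_positions`, the stand-in model `toy_generate`, and the example prompt are all hypothetical, and a real study would call an actual LM instead of the toy rule used here. The sketch drops one token at a time and records whether the continuation changes.

```python
from typing import Callable, List, Sequence


def ablate_positions(
    tokens: Sequence[str],
    generate: Callable[[Sequence[str]], str],
) -> List[bool]:
    """For each position, drop that token and report whether the
    continuation differs from the one produced by the full prompt."""
    baseline = generate(tokens)
    changed = []
    for i in range(len(tokens)):
        ablated = list(tokens[:i]) + list(tokens[i + 1:])
        changed.append(generate(ablated) != baseline)
    return changed


def toy_generate(tokens: Sequence[str]) -> str:
    """Hypothetical stand-in for an LM (an assumption, not a real model):
    the continuation depends on the last token and on one content token."""
    if tokens and tokens[-1] == "capital":
        return "Paris" if "France" in tokens else "city"
    return "unknown"


# Hypothetical autoprompt-like sequence: two filler-like tokens,
# one semantically related token, and an intelligible last token.
prompt = ["xq", "##ly", "France", "capital"]
result = ablate_positions(prompt, toy_generate)
print(result)
```

Under this toy rule, removing either of the first two tokens leaves the continuation unchanged, while removing the content token or the last token alters it. With a real LM, `generate` would wrap the model's decoding call, and the comparison could be relaxed from exact match to a similarity score.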

artificial intelligence, machine learning, natural language

Country:
- North America
  - Jamaica (0.04)
  - United States
    - California (0.04)
    - Pennsylvania > Philadelphia County
      - Philadelphia (0.04)
    - New Mexico > Santa Fe County
      - Santa Fe (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Florida > Miami-Dade County
      - Miami (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Kosovo (0.04)
  - France (0.04)
  - Germany > Baden-Württemberg
    - Stuttgart Region > Stuttgart (0.04)
  - Croatia > Dubrovnik-Neretva County
    - Dubrovnik (0.04)
- Asia
  - Singapore (0.04)
  - Middle East > UAE (0.04)
  - China > Hong Kong (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
- Africa > Rwanda
  - Kigali > Kigali (0.04)

Genre:
- Research Report (1.00)

Industry:
- Government (0.68)
- Leisure & Entertainment (0.68)
- Transportation > Air (0.46)
- Media > Film (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.93)
