Semantic-guided Prompt Organization for Universal Goal Hijacking against LLMs

Huang, Yihao, Wang, Chong, Jia, Xiaojun, Guo, Qing, Juefei-Xu, Felix, Zhang, Jian, Pu, Geguang, Liu, Yang

May-23-2024–arXiv.org Artificial Intelligence

Abstract: With the rising popularity of Large Language Models (LLMs), assessing their trustworthiness through security tasks has become critically important. For the new task of universal goal hijacking, previous work has concentrated solely on optimization algorithms and overlooked the crucial role of the prompt. To fill this gap, we propose POUGH, a universal goal hijacking method that incorporates semantic-guided prompt processing strategies. Specifically, the method first applies a sampling strategy to select representative prompts from a candidate pool, then a ranking strategy to prioritize them. Once the prompts are organized sequentially, the method uses an iterative optimization algorithm to generate a universal fixed suffix for the prompts. Experiments on four popular LLMs and ten types of target responses verify the effectiveness of our method.
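The sample-then-rank organization described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual sampling or ranking criteria, so everything below is an assumption: greedy farthest-point selection stands in for "representative" sampling, cosine similarity to a reference vector stands in for the ranking score, the descending order is arbitrary, and the two-dimensional vectors are toy placeholders for real semantic embeddings. The function names are hypothetical, and the suffix optimization step is omitted entirely.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sample_representative(embeddings, k):
    """Assumed stand-in for the sampling step: greedy farthest-point selection.

    Starts from index 0 and repeatedly adds the candidate whose closest
    already-chosen neighbour is least similar, so the picks spread out.
    """
    if not embeddings or k <= 0:
        return []
    chosen = [0]
    while len(chosen) < min(k, len(embeddings)):
        best_idx, best_score = None, None
        for i in range(len(embeddings)):
            if i in chosen:
                continue
            # Similarity to the nearest item already selected.
            nearest = max(cosine(embeddings[i], embeddings[j]) for j in chosen)
            if best_score is None or nearest < best_score:
                best_idx, best_score = i, nearest
        chosen.append(best_idx)
    return chosen


def rank_by_reference(embeddings, indices, reference):
    """Assumed stand-in for the ranking step: most similar to the reference first."""
    return sorted(indices, key=lambda i: cosine(embeddings[i], reference), reverse=True)


# Toy candidate pool: each row is a placeholder embedding of one prompt.
pool = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
reference = [0.0, 1.0]  # placeholder embedding used only for ordering

selected = sample_representative(pool, 3)
ordered = rank_by_reference(pool, selected, reference)
print(selected, ordered)
```

In this toy run the near-duplicate at index 1 is skipped by the sampling step, and the remaining prompts are then ordered by similarity to the reference vector. A real pipeline would substitute embeddings from a sentence encoder and whatever criteria the paper itself defines.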

Genre:
- Research Report (0.82)

Industry:
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Consumer Health (1.00)
- Government (1.00)
- Law Enforcement & Public Safety > Terrorism (0.85)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
