Intrinsic Model Weaknesses: How Priming Attacks Unveil Vulnerabilities in Large Language Models

Huang, Yuyi, Zhan, Runzhe, Wong, Derek F., Chao, Lidia S., Tao, Ailin

Feb-23-2025, arXiv.org Artificial Intelligence

Large language models (LLMs) have significantly influenced various industries but suffer from a critical flaw: their susceptibility to generating harmful content, which poses severe societal risks. We developed and tested novel attack strategies on popular LLMs to expose this vulnerability. These strategies, inspired by psychological phenomena such as the "Priming Effect", "Safe Attention Shift", and "Cognitive Dissonance", effectively bypass the models' safeguards. Our experiments achieved an attack success rate (ASR) of 100% on various open-source models, including Meta's Llama-3.2, Google's Gemma-2, Mistral's Mistral-NeMo, TII's Falcon-mamba, Apple's DCLM, Microsoft's Phi3, and Alibaba's Qwen2.5, among others. Similarly, for closed-source models such as OpenAI's GPT-4o, Google's Gemini-1.5, and Anthropic's Claude-3.5, we observed an ASR of at least 95% on the AdvBench dataset, a result that represents the current state of the art. This study underscores the urgent need to reassess the use of generative models in critical applications to mitigate potential adverse societal impacts.
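The abstract reports results as an attack success rate (ASR) without defining it here. A common reading, assumed in the sketch below, is the percentage of benchmark prompts for which an attack is judged successful. How success is judged (for example by keyword matching or by a human or model evaluator) is not stated in this listing, so the function takes the per-prompt judgments as given. The function name `attack_success_rate` and the sample data are hypothetical and used only for illustration.

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return the percentage of attempts judged successful.

    Assumption: ASR = 100 * (successful attempts) / (total attempts).
    The paper's exact judging procedure is not given in this listing.
    """
    if not judgments:
        raise ValueError("judgments must not be empty")
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(judgments) / len(judgments)


# Hypothetical per-prompt outcomes: 19 of 20 attempts judged successful.
judgments = [True] * 19 + [False]
print(f"ASR = {attack_success_rate(judgments):.1f}%")  # prints "ASR = 95.0%"
```

Under this reading, an ASR of 100% means every evaluated prompt was judged a successful attack, and "at least 95%" means at most 1 in 20 prompts was refused or otherwise judged unsuccessful.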

language model, victim, vulnerability

Country:
- North America
  - United States > Florida
    - Miami-Dade County > Miami (0.04)
  - Mexico > Mexico City
    - Mexico City (0.04)
- Europe > Austria
  - Vienna (0.14)
- Asia
  - Macao (0.14)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East
    - Jordan (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.14)
  - China > Guangdong Province
    - Guangzhou (0.04)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.67)

Industry:
- Law Enforcement & Public Safety (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Military (0.88)
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Consumer Health (0.93)
  - Therapeutic Area
    - Psychiatry/Psychology (1.00)
    - Neurology (1.00)
    - Immunology (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
