SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models

Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) have rapidly transitioned from academic research to core components of real-world applications, especially since the emergence of high-profile foundation models such as OpenAI's GPT series [17, 140], Google Gemini [9], Meta Llama [175, 176], Anthropic Claude [12], Alibaba Qwen [11, 210, 209], and Doubao [172]. Today, LLMs are deployed across an unprecedented range of sectors, from web search and code assistants to legal, educational, and healthcare domains, reaching hundreds of millions of end users globally. This rapid adoption has ushered in a new era of AI-powered services, but it also brings serious safety and security risks, ranging from misinformation and privacy leaks to adversarial attacks that exploit model vulnerabilities. In particular, a growing body of work shows that carefully crafted jailbreak prompts can bypass alignment constraints and induce models to produce sensitive, illegal, or harmful content. Alarmingly, recent studies report that such attacks achieve success rates exceeding 90% even on flagship models such as GPT-4, Claude 3, and DeepSeek-R1 [124, 42, 154, 118]. Because the outputs of these attacks can be exploited for malicious purposes, this threat demands close attention and effective mitigation.
