Security Concerns for Large Language Models: A Survey

Li, Miles Q., Fung, Benjamin C. M.

arXiv.org Artificial Intelligence 

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing (NLP), including text generation, translation, summarization, and code synthesis, and have consequently revolutionized a wide range of AI applications [10, 56, 45]. Models such as OpenAI's ChatGPT series, Google's Gemini, and Anthropic's Claude have been widely deployed in commercial systems, including search engines, customer support, software development tools, and personal assistants [45, 55, 3]. However, as their capabilities grow, so do their attack surfaces and the potential for misuse [51, 77, 50]. While the scale and specific nature of these vulnerabilities are new, the fundamental challenge of ensuring that powerful AI systems operate safely and align with human intent is a longstanding concern in the AI community. Foundational work identifying concrete problems in AI safety long before the current LLM era laid the groundwork for understanding issues such as reward hacking and negative side effects, which remain highly relevant today [1]. This susceptibility to attack and misuse arises because the models are trained on vast yet imperfectly curated datasets containing potentially harmful content, and because they interact with users through open-ended prompts that can be manipulated [48, 17, 16]. Researchers and practitioners are increasingly concerned that these systems can be manipulated, misused, or even behave in misaligned and potentially deceptive ways [25, 42, 6]. Consequently, the security and alignment of LLMs have become critical areas of study, requiring an understanding of emergent threats and robust, multi-faceted defenses [17, 70, 43].