System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection

Li, Zongze, Guo, Jiawei, Cai, Haipeng

arXiv.org Artificial Intelligence 

Large language models (LLMs) have gained widespread adoption across diverse domains and applications. However, as LLMs become more integrated into various systems, concerns around their security are growing. Existing relevant studies mainly focus on threats arising from user prompts (e.g., prompt injection attack) and model output (e.g. We introduce system prompt poisoning, a new attack vector against LLMs that, unlike traditional user prompt injection, poisons system prompts and persistently impacts all subsequent user interactions and model responses. We propose three practical attack strategies: brute-force poisoning, adaptive in-context poisoning, and adaptive chain-of-thought (CoT) poisoning, and introduce Auto-SPP, a framework that automates the poisoning of system prompts with these strategies. Our comprehensive evaluation across four reasoning and non-reasoning LLMs, four distinct attack scenarios, and two challenging domains (mathematics and coding) reveals the attack's severe impact. The findings demonstrate that system prompt poisoning is not only highly effective, drastically degrading task performance in all scenario-strategy combinations, but also persistent and robust, remaining potent even when user prompts employ prompting-augmented techniques like CoT. Critically, our results highlight the stealthiness of this attack by showing that current black-box based prompt injection defenses cannot effectively defend against it. Large language models (LLMs) like GPT -5 (OpenAI, 2025), Gemini 2.5 (Gemini Team and Google, 2023), and Claude Opus 4.1 (Anthropic, 2025) have shown exceptional performance, driving their widespread integration into the modern software ecosystem. This includes domain-specific applications like Cursor (Anysphere, Inc., 2025) and Adobe Firefly (Adobe, 2025), development frameworks such as Langchain (Harrison Chase, 2025) and Promptflow (Microsoft, 2025), and research communities like Hugging Face (Face, 2025) and HELM (Liang et al., 2022). The proliferation of LLMs has heightened security concerns, with popular commercial platforms (e.g., ChatGPT, Gemini) exhibiting vulnerabilities such as data poisoning and jailbreaks (Zou et al., 2023a; Fu et al., 2024; Bowen et al., 2024). This risk extends across the entire LLM ecosystem, where studies show data abuse and privacy violations are are frequently reported (Hou et al., 2024; Iqbal et al., 2024; Huang et al., 2024). Prompts in LLMs are typically categorized into two types: user prompt and system prompt. User prompt refers to the input provided by the end-user that is meant to get a specific response from language model.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found