llm-integrated application
DataSentinel: A Game-Theoretic Detection of Prompt Injection Attacks
Liu, Yupei, Jia, Yuqi, Jia, Jinyuan, Song, Dawn, Gong, Neil Zhenqiang
LLM-integrated applications and agents-such as Bing Copilot [1], Google search with AI overviews [2], and Amazon's review highlights [3]-are emerging applications built upon large language models (LLMs). The growing popularity of LLM-integrated applications has led to the emergence of app stores, such as OpenAI's GPT Store and Poe [4], where developers can publish their LLMintegrated applications and users can access them, much like the Google Play and App Store for mobile apps. In general, an LLM-integrated application intends to perform a task (referred to as target task), such as webpage summarization in AI-assisted search. Towards this goal, an LLM-integrated application takes a prompt, which is the concatenation of an instruction (referred to as target instruction) and data (referred to as target data), as an input to query the backend LLM, whose response would solve the target task. The target instruction is often designed by an application developer to direct the backend LLM to perform the target task, while the data is the information to be processed by the backend LLM and is usually from an external source, e.g., the Internet. For instance, when the target task is webpage summarization in AI-assisted search, the target instruction can be "Please summarize the following web pages: [Text from relevant web pages].",
System Prompt Poisoning: Persistent Attacks on Large Language Models Beyond User Injection
Li, Zongze, Guo, Jiawei, Cai, Haipeng
Large language models (LLMs) have gained widespread adoption across diverse domains and applications. However, as LLMs become more integrated into various systems, concerns around their security are growing. Existing relevant studies mainly focus on threats arising from user prompts (e.g., prompt injection attack) and model output (e.g. We introduce system prompt poisoning, a new attack vector against LLMs that, unlike traditional user prompt injection, poisons system prompts and persistently impacts all subsequent user interactions and model responses. We propose three practical attack strategies: brute-force poisoning, adaptive in-context poisoning, and adaptive chain-of-thought (CoT) poisoning, and introduce Auto-SPP, a framework that automates the poisoning of system prompts with these strategies. Our comprehensive evaluation across four reasoning and non-reasoning LLMs, four distinct attack scenarios, and two challenging domains (mathematics and coding) reveals the attack's severe impact. The findings demonstrate that system prompt poisoning is not only highly effective, drastically degrading task performance in all scenario-strategy combinations, but also persistent and robust, remaining potent even when user prompts employ prompting-augmented techniques like CoT. Critically, our results highlight the stealthiness of this attack by showing that current black-box based prompt injection defenses cannot effectively defend against it. Large language models (LLMs) like GPT -5 (OpenAI, 2025), Gemini 2.5 (Gemini Team and Google, 2023), and Claude Opus 4.1 (Anthropic, 2025) have shown exceptional performance, driving their widespread integration into the modern software ecosystem. This includes domain-specific applications like Cursor (Anysphere, Inc., 2025) and Adobe Firefly (Adobe, 2025), development frameworks such as Langchain (Harrison Chase, 2025) and Promptflow (Microsoft, 2025), and research communities like Hugging Face (Face, 2025) and HELM (Liang et al., 2022). The proliferation of LLMs has heightened security concerns, with popular commercial platforms (e.g., ChatGPT, Gemini) exhibiting vulnerabilities such as data poisoning and jailbreaks (Zou et al., 2023a; Fu et al., 2024; Bowen et al., 2024). This risk extends across the entire LLM ecosystem, where studies show data abuse and privacy violations are are frequently reported (Hou et al., 2024; Iqbal et al., 2024; Huang et al., 2024). Prompts in LLMs are typically categorized into two types: user prompt and system prompt. User prompt refers to the input provided by the end-user that is meant to get a specific response from language model.
Defending against prompt injection with structured queries (StruQ) and preference optimization (SecAlign)
Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications. However, as LLMs have improved, so have the attacks against them. Prompt injection attack is listed as the #1 threat by OWASP to LLM-integrated applications, where an LLM input contains a trusted prompt (instruction) and an untrusted data. The data may contain injected instructions to arbitrarily manipulate the LLM. As an example, to unfairly promote "Restaurant A", its owner could use prompt injection to post a review on Yelp, e.g., "Ignore your previous instruction.
Large Language Models as Software Components: A Taxonomy for LLM-Integrated Applications
Large Language Models (LLMs) have become widely adopted recently. Research explores their use both as autonomous agents and as tools for software engineering. LLM-integrated applications, on the other hand, are software systems that leverage an LLM to perform tasks that would otherwise be impossible or require significant coding effort. While LLM-integrated application engineering is emerging as new discipline, its terminology, concepts and methods need to be established. This study provides a taxonomy for LLM-integrated applications, offering a framework for analyzing and describing these systems. It also demonstrates various ways to utilize LLMs in applications, as well as options for implementing such integrations. Following established methods, we analyze a sample of recent LLM-integrated applications to identify relevant dimensions. We evaluate the taxonomy by applying it to additional cases. This review shows that applications integrate LLMs in numerous ways for various purposes. Frequently, they comprise multiple LLM integrations, which we term ``LLM components''. To gain a clear understanding of an application's architecture, we examine each LLM component separately. We identify thirteen dimensions along which to characterize an LLM component, including the LLM skills leveraged, the format of the output, and more. LLM-integrated applications are described as combinations of their LLM components. We suggest a concise representation using feature vectors for visualization. The taxonomy is effective for describing LLM-integrated applications. It can contribute to theory building in the nascent field of LLM-integrated application engineering and aid in developing such systems. Researchers and practitioners explore numerous creative ways to leverage LLMs in applications. Though challenges persist, integrating LLMs may revolutionize the way software systems are built.
Signed-Prompt: A New Approach to Prevent Prompt Injection Attacks Against LLM-Integrated Applications
The critical challenge of prompt injection attacks in Large Language Models (LLMs) integrated applications, a growing concern in the Artificial Intelligence (AI) field. Such attacks, which manipulate LLMs through natural language inputs, pose a significant threat to the security of these applications. Traditional defense strategies, including output and input filtering, as well as delimiter use, have proven inadequate. This paper introduces the 'Signed-Prompt' method as a novel solution. The study involves signing sensitive instructions within command segments by authorized users, enabling the LLM to discern trusted instruction sources. The paper presents a comprehensive analysis of prompt injection attack patterns, followed by a detailed explanation of the Signed-Prompt concept, including its basic architecture and implementation through both prompt engineering and fine-tuning of LLMs. Experiments demonstrate the effectiveness of the Signed-Prompt method, showing substantial resistance to various types of prompt injection attacks, thus validating its potential as a robust defense strategy in AI security.
Jatmo: Prompt Injection Defense by Task-Specific Finetuning
Piet, Julien, Alrashed, Maha, Sitawarin, Chawin, Chen, Sizhe, Wei, Zeming, Sun, Elizabeth, Alomair, Basel, Wagner, David
Large Language Models (LLMs) are attracting significant research attention due to their instruction-following abilities, allowing users and developers to leverage LLMs for a variety of tasks. However, LLMs are vulnerable to prompt-injection attacks: a class of attacks that hijack the model's instruction-following abilities, changing responses to prompts to undesired, possibly malicious ones. In this work, we introduce Jatmo, a method for generating task-specific models resilient to prompt-injection attacks. Jatmo leverages the fact that LLMs can only follow instructions once they have undergone instruction tuning. It harnesses a teacher instruction-tuned model to generate a task-specific dataset, which is then used to fine-tune a base model (i.e., a non-instruction-tuned model). Jatmo only needs a task prompt and a dataset of inputs for the task: it uses the teacher model to generate outputs. For situations with no pre-existing datasets, Jatmo can use a single example, or in some cases none at all, to produce a fully synthetic dataset. Our experiments on seven tasks show that Jatmo models provide similar quality of outputs on their specific task as standard LLMs, while being resilient to prompt injections. The best attacks succeeded in less than 0.5% of cases against our models, versus 87% success rate against GPT-3.5-Turbo. We release Jatmo at https://github.com/wagner-group/prompt-injection-defense.
Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models
Yi, Jingwei, Xie, Yueqi, Zhu, Bin, Hines, Keegan, Kiciman, Emre, Sun, Guangzhong, Xie, Xing, Wu, Fangzhao
Recent remarkable advancements in large language models (LLMs) have led to their widespread adoption in various applications. A key feature of these applications is the combination of LLMs with external content, where user instructions and third-party content are combined to create prompts for LLM processing. These applications, however, are vulnerable to indirect prompt injection attacks, where malicious instructions embedded within external content compromise LLM's output, causing their responses to deviate from user expectations. Despite the discovery of this security issue, no comprehensive analysis of indirect prompt injection attacks on different LLMs is available due to the lack of a benchmark. Furthermore, no effective defense has been proposed. In this work, we introduce the first benchmark, BIPIA, to measure the robustness of various LLMs and defenses against indirect prompt injection attacks. Our experiments reveal that LLMs with greater capabilities exhibit more vulnerable to indirect prompt injection attacks for text tasks, resulting in a higher ASR. We hypothesize that indirect prompt injection attacks are mainly due to the LLMs' inability to distinguish between instructions and external content. Based on this conjecture, we propose four black-box methods based on prompt learning and a white-box defense methods based on fine-tuning with adversarial training to enable LLMs to distinguish between instructions and external content and ignore instructions in the external content. Our experimental results show that our black-box defense methods can effectively reduce ASR but cannot completely thwart indirect prompt injection attacks, while our white-box defense method can reduce ASR to nearly zero with little adverse impact on the LLM's performance on general tasks. We hope that our benchmark and defenses can inspire future work in this important area.
Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Jiang, Fengqing, Xu, Zhangchen, Niu, Luyao, Wang, Boxin, Jia, Jinyuan, Li, Bo, Poovendran, Radha
Large language models (LLMs) are increasingly deployed as the service backend for LLM-integrated applications such as code completion and AI-powered search. LLM-integrated applications serve as middleware to refine users' queries with domain-specific knowledge to better inform LLMs and enhance the responses. Despite numerous opportunities and benefits, LLM-integrated applications also introduce new attack surfaces. Understanding, minimizing, and eliminating these emerging attack surfaces is a new area of research. In this work, we consider a setup where the user and LLM interact via an LLM-integrated application in the middle. We focus on the communication rounds that begin with user's queries and end with LLM-integrated application returning responses to the queries, powered by LLMs at the service backend. For this query-response protocol, we identify potential vulnerabilities that can originate from the malicious application developer or from an outsider threat initiator that is able to control the database access, manipulate and poison data that are high-risk for the user. Successful exploits of the identified vulnerabilities result in the users receiving responses tailored to the intent of a threat initiator. We assess such threats against LLM-integrated applications empowered by OpenAI GPT-3.5 and GPT-4. Our empirical results show that the threats can effectively bypass the restrictions and moderation policies of OpenAI, resulting in users receiving responses that contain bias, toxic content, privacy risk, and disinformation. To mitigate those threats, we identify and define four key properties, namely integrity, source identification, attack detectability, and utility preservation, that need to be satisfied by a safe LLM-integrated application. Based on these properties, we develop a lightweight, threat-agnostic defense that mitigates both insider and outsider threats.
Prompt Injection Attacks and Defenses in LLM-Integrated Applications
Liu, Yupei, Jia, Yuqi, Geng, Runpeng, Jia, Jinyuan, Gong, Neil Zhenqiang
Large Language Models (LLMs) are increasingly deployed as the backend for a variety of real-world applications called LLM-Integrated Applications. Multiple recent works showed that LLM-Integrated Applications are vulnerable to prompt injection attacks, in which an attacker injects malicious instruction/data into the input of those applications such that they produce results as the attacker desires. However, existing works are limited to case studies. As a result, the literature lacks a systematic understanding of prompt injection attacks and their defenses. We aim to bridge the gap in this work. In particular, we propose a general framework to formalize prompt injection attacks. Existing attacks, which are discussed in research papers and blog posts, are special cases in our framework. Our framework enables us to design a new attack by combining existing attacks. Moreover, we also propose a framework to systematize defenses against prompt injection attacks. Using our frameworks, we conduct a systematic evaluation on prompt injection attacks and their defenses with 10 LLMs and 7 tasks. We hope our frameworks can inspire future research in this field. Our code is available at https://github.com/liu00222/Open-Prompt-Injection.
Prompt Injection attack against LLM-integrated Applications
Liu, Yi, Deng, Gelei, Li, Yuekang, Wang, Kailong, Zhang, Tianwei, Liu, Yepang, Wang, Haoyu, Zheng, Yan, Liu, Yang
Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.