malicious document
Eyes-on-Me: Scalable RAG Poisoning through Transferable Attention-Steering Attractors
Chen, Yen-Shan, Huang, Sian-Yao, Yang, Cheng-Lin, Chen, Yun-Nung
Existing data poisoning attacks on retrieval-augmented generation (RAG) systems scale poorly because they require costly optimization of poisoned documents for each target phrase. We introduce Eyes-on-Me, a modular attack that decomposes an adversarial document into reusable Attention Attractors and Focus Regions. Attractors are optimized to direct attention to the Focus Region. Attackers can then insert semantic baits for the retriever or malicious instructions for the generator, adapting to new targets at near zero cost. This is achieved by steering a small subset of attention heads that we empirically identify as strongly correlated with attack success. Across 18 end-to-end RAG settings (3 datasets $\times$ 2 retrievers $\times$ 3 generators), Eyes-on-Me raises average attack success rates from 21.9 to 57.8 (+35.9 points, 2.6$\times$ over prior work). A single optimized attractor transfers to unseen black box retrievers and generators without retraining. Our findings establish a scalable paradigm for RAG data poisoning and show that modular, reusable components pose a practical threat to modern AI systems. They also reveal a strong link between attention concentration and model outputs, informing interpretability research.
- North America > United States (0.14)
- Asia > Taiwan (0.04)
- South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
- Asia > India (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
Shen, Zeyu, Imana, Basileal, Wu, Tong, Xiang, Chong, Mittal, Prateek, Korolova, Aleksandra
Retrieval-Augmented Generation (RAG) enhances Large Language Models by grounding their outputs in external documents. These systems, however, remain vulnerable to attacks on the retrieval corpus, such as prompt injection. RAG-based search systems (e.g., Google's Search AI Overview) present an interesting setting for studying and protecting against such threats, as defense algorithms can benefit from built-in reliability signals -- like document ranking -- and represent a non-LLM challenge for the adversary due to decades of work to thwart SEO. Motivated by, but not limited to, this scenario, this work introduces ReliabilityRAG, a framework for adversarial robustness that explicitly leverages reliability information of retrieved documents. Our first contribution adopts a graph-theoretic perspective to identify a "consistent majority" among retrieved documents to filter out malicious ones. We introduce a novel algorithm based on finding a Maximum Independent Set (MIS) on a document graph where edges encode contradiction. Our MIS variant explicitly prioritizes higher-reliability documents and provides provable robustness guarantees against bounded adversarial corruption under natural assumptions. Recognizing the computational cost of exact MIS for large retrieval sets, our second contribution is a scalable weighted sample and aggregate framework. It explicitly utilizes reliability information, preserving some robustness guarantees while efficiently handling many documents. We present empirical results showing ReliabilityRAG provides superior robustness against adversarial attacks compared to prior methods, maintains high benign accuracy, and excels in long-form generation tasks where prior robustness-focused methods struggled. Our work is a significant step towards more effective, provably robust defenses against retrieved corpus corruption in RAG.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada (0.04)
- Europe > Monaco (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
TrustRAG: Enhancing Robustness and Trustworthiness in RAG
Zhou, Huichi, Lee, Kin-Hei, Zhan, Zhonghao, Chen, Yue, Li, Zhenhao, Wang, Zhaoyang, Haddadi, Hamed, Yilmaz, Emine
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge sources, enabling more accurate and contextually relevant responses tailored to user queries. However, these systems remain vulnerable to corpus poisoning attacks that can significantly degrade LLM performance through the injection of malicious content. To address these challenges, we propose TrustRAG, a robust framework that systematically filters compromised and irrelevant contents before they are retrieved for generation. Our approach implements a two-stage defense mechanism: At the first stage, it employs K-means clustering to identify potential attack patterns in retrieved documents using cosine similarity and ROUGE metrics as guidance, effectively isolating suspicious content. Secondly, it performs a self-assessment which detects malicious documents and resolves discrepancies between the model's internal knowledge and external information. TrustRAG functions as a plug-and-play, training-free module that integrates seamlessly with any language model, whether open or closed-source. In addition, TrustRAG maintains high contextual relevance while strengthening defenses against corpus poisoning attacks. Through extensive experimental validation, we demonstrate that TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance compared to existing approaches across multiple model architectures and datasets. We have made TrustRAG available as open-source software at \url{https://github.com/HuichiZhou/TrustRAG}.
- Europe > United Kingdom > England > Greater London > London (0.04)
- North America > United States > North Carolina (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Asia > China (0.04)
ConfusedPilot: Confused Deputy Risks in RAG-based LLMs
RoyChowdhury, Ayush, Luo, Mulong, Sahu, Prateek, Banerjee, Sarbartha, Tiwari, Mohit
Retrieval augmented generation (RAG) is a process where a large language model (LLM) retrieves useful information from a database and then generates the responses. It is becoming popular in enterprise settings for daily business operations. For example, Copilot for Microsoft 365 has accumulated millions of businesses. However, the security implications of adopting such RAG-based systems are unclear. In this paper, we introduce ConfusedPilot, a class of security vulnerabilities of RAG systems that confuse Copilot and cause integrity and confidentiality violations in its responses. First, we investigate a vulnerability that embeds malicious text in the modified prompt in RAG, corrupting the responses generated by the LLM. Second, we demonstrate a vulnerability that leaks secret data, which leverages the caching mechanism during retrieval. Third, we investigate how both vulnerabilities can be exploited to propagate misinformation within the enterprise and ultimately impact its operations, such as sales and manufacturing. We also discuss the root cause of these attacks by investigating the architecture of a RAG-based system. This study highlights the security vulnerabilities in today's RAG-based systems and proposes design guidelines to secure future RAG-based systems.
- North America > Canada (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (2 more...)
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks
De Stefano, Gianluca, Schönherr, Lea, Pellegrino, Giancarlo
Retrieval Augmented Generation (RAG) is a technique commonly used to equip models with out of distribution knowledge. This process involves collecting, indexing, retrieving, and providing information to an LLM for generating responses. Despite its growing popularity due to its flexibility and low cost, the security implications of RAG have not been extensively studied. The data for such systems are often collected from public sources, providing an attacker a gateway for indirect prompt injections to manipulate the responses of the model. In this paper, we investigate the security of RAG systems against end-to-end indirect prompt manipulations. First, we review existing RAG framework pipelines, deriving a prototypical architecture and identifying critical parameters. We then examine prior works searching for techniques that attackers can use to perform indirect prompt manipulations. Finally, we implemented Rag 'n Roll, a framework to determine the effectiveness of attacks against end-to-end RAG applications. Our results show that existing attacks are mostly optimized to boost the ranking of malicious documents during the retrieval phase. However, a higher rank does not immediately translate into a reliable attack. Most attacks, against various configurations, settle around a 40% success rate, which could rise to 60% when considering ambiguous answers as successful attacks (those that include the expected benign one as well). Additionally, when using unoptimized documents, attackers deploying two of them (or more) for a target query can achieve similar results as those using optimized ones. Finally, exploration of the configuration space of a RAG showed limited impact in thwarting the attacks, where the most successful combination severely undermines functionality.
- Europe > Germany (0.14)
- North America > United States (0.14)
- Europe > Belgium (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (0.93)
Cross-Task Defense: Instruction-Tuning LLMs for Content Safety
Fu, Yu, Xiao, Wen, Chen, Jia, Li, Jiachen, Papalexakis, Evangelos, Chien, Aichi, Dong, Yue
Recent studies reveal that Large Language Models (LLMs) face challenges in balancing safety with utility, particularly when processing long texts for NLP tasks like summarization and translation. Despite defenses against malicious short questions, the ability of LLMs to safely handle dangerous long content, such as manuals teaching illicit activities, remains unclear. Our work aims to develop robust defenses for LLMs in processing malicious documents alongside benign NLP task queries. We introduce a defense dataset comprised of safety-related examples and propose single-task and mixed-task losses for instruction tuning. Our empirical results demonstrate that LLMs can significantly enhance their capacity to safely manage dangerous content with appropriate instruction tuning. Additionally, strengthening the defenses of tasks most susceptible to misuse is effective in protecting LLMs against processing harmful information. We also observe that trade-offs between utility and safety exist in defense strategies, where Llama2, utilizing our proposed approach, displays a significantly better balance compared to Llama1.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > California > Riverside County > Riverside (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Google Leverages Machine Learning to Improve Document Detection Capabilities
With the rise in technology and enhanced connectivity, we are unintentionally moving toward a more insecure world of malicious activities. Businesses today, while deploying technology, fear the loss they would face if security gets compromised. As most of them operate through e-mails, it turns into a major source for malware attacks. Moreover, lots of emails are sent with malicious intent, putting a heavy burden on Gmail to protect users. As it turns out, a lot of malicious attachments come from documents, but through innovation brought in by Google, Gmail is getting better at detecting them.
Gmail Is Catching More Malicious Attachments With Deep Learning
Distributing malware by attaching tainted documents to emails is one of the oldest tricks in the book. It's not just a theoretical risk--real attackers use malicious documents to infect targets all the time. So on top of its anti-spam and anti-phishing efforts, Gmail expanded its malware detection capabilities at the end of last year to include more tailored document monitoring. At the RSA security conference in San Francisco on Tuesday, Google's security and anti-abuse research lead Elie Bursztein will present findings on how the new deep-learning scanner for documents is faring against the 300 billion attachments it has to process each week. It's challenging to tell the difference between legitimate documents in all their infinite variations and those that have specifically been manipulated to conceal something dangerous.
Google Confirms New AI Tool Scans 300 Billion Gmail Attachments Every Week
Gmail has been changing the way we think about email since 2004. In that time, it has gained an eye-popping 1.5 billion users, according to Google. I'm one of them, and the chances are high that you are as well. A lot has changed in those 15 years. A lot has stayed the same. One of the static components in the world of email is malware, specifically malware in a document attached to your email.