Controlled Generation of Natural Adversarial Documents for Stealthy Retrieval Poisoning
Zhang, Collin, Zhang, Tingwei, Shmatikov, Vitaly
arXiv.org Artificial Intelligence
Recent work showed that retrieval based on embedding similarity (e.g., for retrieval-augmented generation) is vulnerable to poisoning: an adversary can craft malicious documents that are retrieved in response to broad classes of queries. We demonstrate that previous HotFlip-based techniques produce documents that are very easy to detect using perplexity filtering. Even if generation is constrained to produce low-perplexity text, the resulting documents are recognized as unnatural by LLMs and can be automatically filtered from the retrieval corpus. We design, implement, and evaluate a new controlled generation technique that combines an adversarial objective (embedding similarity) with a "naturalness" objective based on soft scores computed using an open-source surrogate LLM. The resulting adversarial documents (1) cannot be automatically detected using perplexity filtering and/or other LLMs, except at the cost of significant false positives in the retrieval corpus, yet (2) achieve poisoning efficacy similar to that of easily detectable documents generated using HotFlip, and (3) are significantly more effective than documents produced by prior methods for energy-guided generation, such as COLD.

Many modern retrieval systems use embeddings, i.e., dense vector representations, of documents and queries to enable retrieval based on semantic similarity. Chaudhari et al. (2024) and Zhong et al. (2023) recently demonstrated that an adversary can use HotFlip (Ebrahimi et al., 2018) to generate documents whose embeddings have high similarity to, and will thus be retrieved in response to, broad classes of queries. We first demonstrate that adversarial documents produced by HotFlip have much higher perplexity than normal text and can be filtered out with negligible collateral damage (i.e., false positives).
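The two ideas in the abstract, perplexity filtering as a defense and a generation objective that mixes embedding similarity with a naturalness score, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the threshold, the `weight` parameter, and the weighted-sum form of `combined_score` are all assumptions made for illustration; the abstract does not specify how the two objectives are combined. Token log-probabilities and embeddings are taken as plain inputs here rather than computed by a real language model or encoder.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity = exp of the mean negative log-likelihood (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def passes_perplexity_filter(token_logprobs: Sequence[float],
                             threshold: float) -> bool:
    """Keep a document only if its perplexity is at or below the threshold.

    The threshold is a hypothetical tuning knob: a lower value catches more
    adversarial text but also discards more legitimate documents.
    """
    return perplexity(token_logprobs) <= threshold


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embedding dimensions differ")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_u * norm_v)


def combined_score(doc_emb: Sequence[float],
                   query_emb: Sequence[float],
                   naturalness: float,
                   weight: float = 1.0) -> float:
    """Assumed weighted sum of the adversarial and naturalness objectives.

    `naturalness` stands in for a soft score from a surrogate LLM; how the
    paper actually combines the two terms is not stated in the abstract.
    """
    return cosine_similarity(doc_emb, query_emb) + weight * naturalness
```

For instance, a text whose tokens each have probability 0.5 under the scoring model has perplexity of about 2 and would pass a filter with threshold 3, whereas a text whose tokens each have probability 0.01 has perplexity of about 100 and would be rejected.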
Oct-2-2024
- Genre:
- Research Report (0.64)
- Industry:
- Leisure & Entertainment
- Games > Computer Games (0.74)
- Sports > Olympic Games (1.00)
- Technology: