adversarial text
TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Jin, Hyundong, Sung, Sicheol, Park, Shinwoo, Baik, SeungYeop, Han, Yo-Sub
The reasoning, writing, text-editing, and retrieval capabilities of proprietary large language models (LLMs) have advanced rapidly, providing users with an ever-expanding set of functionalities. However, this growing utility has also led to a serious societal concern: over-reliance on LLMs. In particular, users increasingly delegate tasks such as homework, assignments, or the processing of sensitive documents to LLMs without meaningful engagement. This form of over-reliance and misuse is emerging as a significant social issue. To mitigate these issues, we propose a method that injects imperceptible phantom tokens into documents, causing LLMs to generate outputs that appear plausible to users but are in fact incorrect. Based on this technique, we introduce TRAPDOC, a framework designed to deceive over-reliant LLM users. Through empirical evaluation, we demonstrate the effectiveness of our framework on proprietary LLMs, comparing its impact against several baselines. TRAPDOC serves as a strong foundation for promoting more responsible and thoughtful engagement with language models. Our code is available at https://github.com/jindong22/TrapDoc.
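The phantom-token idea can be illustrated with a minimal, hypothetical sketch (not the authors' implementation): zero-width Unicode characters render invisibly to a human reader but are real code points to any tokenizer, so hidden content can ride along with visible text.

```python
# Minimal sketch of imperceptible token injection (assumed scheme, not
# TRAPDOC's actual method): encode a payload's bits as zero-width
# characters interleaved after each visible character.

ZWSP = "\u200b"  # zero-width space
ZWNJ = "\u200c"  # zero-width non-joiner

def inject_phantom(text: str, payload: str) -> str:
    """Hide `payload` as zero-width characters inside `text`."""
    bits = "".join(f"{ord(c):08b}" for c in payload)
    bit_iter = iter(bits)
    out = []
    for ch in text:
        out.append(ch)
        b = next(bit_iter, None)
        if b is not None:
            out.append(ZWSP if b == "0" else ZWNJ)
    return "".join(out)

def strip_phantom(text: str) -> str:
    """Remove the hidden characters, recovering the visible text."""
    return text.replace(ZWSP, "").replace(ZWNJ, "")

visible = "This looks like a normal sentence to a human reader."
stego = inject_phantom(visible, "hi")
```

A defense-minded counterpart is equally simple: normalize documents by stripping zero-width code points before they reach the model.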
UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation
Geng, Runpeng, Wang, Yanting, Chen, Ying, Jia, Jinyuan
Retrieval-augmented generation (RAG) systems are widely deployed in real-world applications in diverse domains such as finance, healthcare, and cybersecurity. However, many studies have shown that they are vulnerable to knowledge corruption attacks, where an attacker can inject adversarial texts into the knowledge database of a RAG system to induce the LLM to generate attacker-desired outputs. Existing studies mainly focus on attacking specific queries or queries with similar topics (or keywords). In this work, we propose UniC-RAG, a universal knowledge corruption attack against RAG systems. Unlike prior work, UniC-RAG jointly optimizes a small number of adversarial texts that can simultaneously attack a large number of user queries with diverse topics and domains, enabling an attacker to achieve various malicious objectives, such as directing users to malicious websites, triggering harmful command execution, or launching denial-of-service attacks. We formulate UniC-RAG as an optimization problem and further design an effective solution to solve it, including a balanced similarity-based clustering method to enhance the attack's effectiveness. Our extensive evaluations demonstrate that UniC-RAG is highly effective and significantly outperforms baselines. For instance, UniC-RAG could achieve over 90% attack success rate by injecting 100 adversarial texts into a knowledge database with millions of texts to simultaneously attack a large set of user queries (e.g., 2,000). Additionally, we evaluate existing defenses and show that they are insufficient to defend against UniC-RAG, highlighting the need for new defense mechanisms in RAG systems.
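The clustering step described above can be sketched in numpy. The balancing rule used here (greedy, capacity-limited assignment to nearest centers) is an assumption for illustration, not necessarily the paper's exact algorithm; the idea is to group query embeddings into roughly equal-sized clusters so one adversarial text can be optimized per cluster.

```python
# Hypothetical sketch of balanced similarity-based clustering: assign
# unit-norm query embeddings to k clusters, capping each cluster's size
# so no single adversarial text must cover too many queries.
import numpy as np

def balanced_cluster(embeddings: np.ndarray, k: int) -> np.ndarray:
    n = len(embeddings)
    cap = -(-n // k)                        # ceil(n / k): max queries per cluster
    rng = np.random.default_rng(0)
    centers = embeddings[rng.choice(n, k, replace=False)]
    labels = np.full(n, -1)
    counts = np.zeros(k, dtype=int)
    sims = embeddings @ centers.T           # cosine similarity for unit vectors
    # assign the most-confident points first, respecting cluster capacity
    for i in np.argsort(-sims.max(axis=1)):
        for c in np.argsort(-sims[i]):
            if counts[c] < cap:
                labels[i] = c
                counts[c] += 1
                break
    return labels

queries = np.random.default_rng(1).normal(size=(20, 8))
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
labels = balanced_cluster(queries, k=4)
```

The per-cluster optimization of the adversarial text itself is the expensive part and is omitted here.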
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Shu, Huizhen, Li, Xuying, Wang, Qirui, Kosuga, Yuji, Tian, Mengqiu, Li, Zhuo
With the rapid proliferation of Natural Language Processing (NLP), especially Large Language Models (LLMs), generating adversarial examples to jailbreak LLMs remains a key challenge for understanding model vulnerabilities and improving robustness. In this context, we propose a new black-box attack method that leverages the interpretability of large models. We introduce the Sparse Feature Perturbation Framework (SFPF), a novel approach for adversarial text generation that utilizes sparse autoencoders to identify and manipulate critical features in text. After using the SAE model to reconstruct hidden layer representations, we perform feature clustering on the successfully attacked texts to identify features with higher activations. These highly activated features are then perturbed to generate new adversarial texts. This selective perturbation preserves the malicious intent while amplifying safety signals, thereby increasing their potential to evade existing defenses. Our method enables a new red-teaming strategy that balances adversarial effectiveness with safety alignment. Experimental results demonstrate that adversarial texts generated by SFPF can bypass state-of-the-art defense mechanisms, revealing persistent vulnerabilities in current NLP systems. However, the method's effectiveness varies across prompts and layers, and its generalizability to other architectures and larger models remains to be validated.
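The feature-selection step can be sketched with a toy sparse autoencoder in numpy. The random weights, tied decoder, and amplification factor below are all assumptions for illustration, not the paper's setup; the point is the flow: encode hidden states into sparse features, rank features by activation over previously successful adversarial texts, and perturb only the top-ranked ones.

```python
# Toy sketch of the SFPF flow (assumed interfaces, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64
W_enc = rng.normal(0, 0.3, (d_model, d_sae))   # stand-in SAE encoder weights
W_dec = W_enc.T                                 # tied decoder (an assumption)

def sae_encode(h):
    return np.maximum(h @ W_enc, 0.0)           # ReLU yields a sparse feature vector

def sae_decode(f):
    return f @ W_dec

hidden = rng.normal(size=(10, d_model))         # hidden states of "successful" texts
feats = sae_encode(hidden)
top = np.argsort(-feats.mean(axis=0))[:5]       # most-activated features overall

def perturb(h, scale=0.5):
    f = sae_encode(h)
    f[top] *= (1.0 + scale)                     # amplify the critical features
    return sae_decode(f)                        # perturbed hidden representation

new_h = perturb(hidden[0])
```

In the real framework the perturbed representation would then be mapped back to text; that generation step is omitted here.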
Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation
Edemacu, Kennedy, Shashidhar, Vinay M., Tuape, Micheal, Abudu, Dan, Jang, Beakcheol, Kim, Jong Wook
Retrieval-Augmented Generation (RAG) has emerged as a powerful approach to boost the capabilities of large language models (LLMs) by incorporating external, up-to-date knowledge sources. However, this introduces a potential vulnerability to knowledge poisoning attacks, where attackers can compromise the knowledge source to mislead the generation model. One such attack is PoisonedRAG, in which the injected adversarial texts steer the model to generate an attacker-chosen response for a target question. In this work, we propose novel defense methods, FilterRAG and ML-FilterRAG, to mitigate the PoisonedRAG attack. First, we identify a new property that differentiates adversarial from clean texts in the knowledge data source. Next, we employ this property to filter out adversarial texts in the design of our proposed approaches. Evaluation of these methods using benchmark datasets demonstrates their effectiveness, with performances close to those of the original RAG systems. A key challenge associated with large language models (LLMs) [1]-[3] is their tendency to become outdated and struggle to integrate the most recent knowledge [4], [5]. This fundamental shortcoming is addressed by the recent emergence of retrieval-augmented generation (RAG) [6]-[9].
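One plausible instantiation of such a filtering property (an assumption on our part, not necessarily the property the paper proposes) is to treat anomalously high retrieval scores as a red flag, since injected texts are optimized to rank at the top of retrieval results:

```python
# Hypothetical retrieval-time filter: drop passages whose query-similarity
# score is an extreme outlier relative to the rest of the retrieved set.
# The z-score threshold is an assumed parameter for illustration.
import numpy as np

def filter_retrieved(scores, texts, z_max=1.5):
    scores = np.asarray(scores, dtype=float)
    mu, sigma = scores.mean(), scores.std() + 1e-9
    keep = (scores - mu) / sigma <= z_max       # flag only high-side outliers
    return [t for t, k in zip(texts, keep) if k]

texts = ["clean A", "clean B", "clean C", "clean D", "poisoned"]
scores = [0.52, 0.49, 0.55, 0.50, 0.97]         # optimized text scores far above the rest
clean = filter_retrieved(scores, texts)
```

A real defense would combine several such signals; a single score threshold is easy for an adaptive attacker to evade.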
Exploring Gradient-Guided Masked Language Model to Detect Textual Adversarial Attacks
Zhang, Xiaomei, Zhang, Zhaoxi, Zhang, Yanjun, Zheng, Xufei, Zhang, Leo Yu, Hu, Shengshan, Pan, Shirui
Textual adversarial examples pose serious threats to the reliability of natural language processing systems. Recent studies suggest that adversarial examples tend to deviate from the underlying manifold of normal texts, whereas pre-trained masked language models can approximate the manifold of normal data. These findings inspire the exploration of masked language models for detecting textual adversarial attacks. We first introduce Masked Language Model-based Detection (MLMD), leveraging the mask and unmask operations of the masked language modeling (MLM) objective to induce the difference in manifold changes between normal and adversarial texts. Although MLMD achieves competitive detection performance, its exhaustive one-by-one masking strategy introduces significant computational overhead. Our posterior analysis reveals that a significant number of non-keywords in the input are not important for detection but consume resources. Building on this, we introduce Gradient-guided MLMD (GradMLMD), which leverages gradient information to identify and skip non-keywords during detection, significantly reducing resource consumption without compromising detection performance. Extensive experiments show that GradMLMD maintains comparable or better performance than MLMD and outperforms existing detectors. Among defenses based on the off-manifold conjecture, GradMLMD presents a novel method for capturing manifold changes and provides a practical solution for real-world application challenges.
Index Terms: NLP, adversarial attack, adversarial defense, masked language model.
Although advanced deep neural networks have the potential to revolutionize the performance of myriad natural language processing (NLP) tasks [1-3], they are highly vulnerable to adversarial attacks [4-7]. Through carefully manipulated inputs, attackers can drive models to produce erroneous outputs to their advantage.
Many researchers have focused on introducing adversarial perturbations into the input by altering entire sentences. However, predominant efforts have been made to develop attacks at the word level and character level [8-14]. Correspondence to Dr. L. Zhang and Prof. X. Zheng. Xiaomei Zhang, Leo Yu Zhang, and Shirui Pan are with the School of Information and Communication Technology, Griffith University, Queensland, Australia (e-mail: xiaomei.zhang@griffithuni.edu.au). Zhaoxi Zhang and Yanjun Zhang are with the School of Computer Science, University of Technology Sydney, Sydney, New South Wales, Australia (e-mail: Zhaoxi.Zhang-1@student.uts.edu.au). Xufei Zheng is with the College of Computer and Information Science, Southwest University, Chongqing, China (e-mail: zxufei@swu.edu.cn).
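The gradient-guided skipping idea can be sketched with a toy mean-pooled linear classifier, where gradient-times-input saliency selects keyword positions and the expensive mask-and-check runs only there. The classifier, embeddings, and the deletion-style "mask" below are stand-ins, not GradMLMD's actual components.

```python
# Toy sketch of gradient-guided keyword selection for detection: compute a
# saliency score per word, then probe only the top-scoring positions.
import numpy as np

rng = np.random.default_rng(0)
vocab = {"the": 0, "movie": 1, "was": 2, "terrible": 3, "acting": 4}
E = rng.normal(size=(len(vocab), 8))    # toy word embeddings
w = rng.normal(size=8)                  # toy linear classifier weights

def predict(ids):
    x = E[ids].mean(axis=0)             # mean-pooled sentence vector
    return 1 / (1 + np.exp(-x @ w))     # sigmoid probability

def saliency(ids):
    # gradient-times-input: for a mean-pooled linear model, the gradient of
    # the logit wrt each word embedding is w/n, so |e_i . w| / n per word
    return np.abs(E[ids] @ w) / len(ids)

ids = [vocab[t] for t in "the movie was terrible".split()]
keywords = np.argsort(-saliency(ids))[: len(ids) // 2]   # skip low-gradient words

# run the expensive mask-and-check only at keyword positions
shifts = []
for i in keywords:
    masked = [tok for pos, tok in enumerate(ids) if pos != i]  # crude "mask"
    shifts.append(abs(predict(ids) - predict(masked)))
detection_score = max(shifts)           # large shift suggests an off-manifold input
```

A real implementation would use a masked language model to refill the masked position rather than deleting it, and a trained victim model in place of the random linear classifier.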
TSCheater: Generating High-Quality Tibetan Adversarial Texts via Visual Similarity
Cao, Xi, Gesang, Quzong, Sun, Yuan, Qun, Nuo, Nyima, Tashi
Language models based on deep neural networks are vulnerable to textual adversarial attacks. While rich-resource languages like English receive focused attention, Tibetan, a cross-border language, is gradually being studied due to its abundant ancient literature and critical language strategy. Currently, there are several Tibetan adversarial text generation methods, but they do not fully consider the textual features of Tibetan script and overestimate the quality of generated adversarial texts. To address this issue, we propose a novel Tibetan adversarial text generation method called TSCheater, which considers the characteristics of Tibetan encoding and the fact that visually similar syllables have similar semantics. This method can also be transferred to other abugidas, such as Devanagari script. We utilize a self-constructed Tibetan syllable visual similarity database called TSVSDB to generate substitution candidates and adopt a greedy algorithm-based scoring mechanism to determine substitution order. We then evaluate the method on eight victim language models. Experimentally, TSCheater outperforms existing methods in attack effectiveness, perturbation magnitude, semantic similarity, visual similarity, and human acceptance. Finally, we construct the first Tibetan adversarial robustness evaluation benchmark, called AdvTS, which is generated by existing methods and proofread by humans.
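The greedy substitution loop is language-agnostic and can be sketched as follows. Toy Latin lookalikes stand in for the Tibetan visual-similarity database TSVSDB, and the victim model's score is a stub; both are assumptions for illustration only.

```python
# Sketch of greedy visual-similarity substitution: rank positions by their
# (stubbed) impact on the victim model, then replace the highest-impact
# units with visually similar candidates until the perturbation budget runs out.
LOOKALIKE = {"o": "0", "l": "1", "e": "3", "a": "@"}   # hypothetical lookalike table

def impact(text, i):
    # stub for the victim model's score drop when unit i is perturbed;
    # here we simply assume earlier units matter more (toy heuristic)
    return len(text) - i

def greedy_attack(text, budget=2):
    order = sorted(range(len(text)), key=lambda i: -impact(text, i))
    chars, used = list(text), 0
    for i in order:
        if used >= budget:
            break
        sub = LOOKALIKE.get(chars[i])
        if sub:
            chars[i] = sub
            used += 1
    return "".join(chars)

adv = greedy_attack("hello")   # substitutes the two highest-impact replaceable units
```

In the real method, `impact` would query the victim model and `LOOKALIKE` would map Tibetan syllables to visually similar syllables with similar semantics.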
Human-in-the-Loop Generation of Adversarial Texts: A Case Study on Tibetan Script
Cao, Xi, Sun, Yuan, Li, Jiajun, Gesang, Quzong, Qun, Nuo, Nyima, Tashi
DNN-based language models perform excellently on various tasks, but even SOTA LLMs are susceptible to textual adversarial attacks. Adversarial texts play crucial roles in multiple subfields of NLP. However, current research has the following issues. (1) Most textual adversarial attack methods target rich-resourced languages. How do we generate adversarial texts for less-studied languages? (2) Most textual adversarial attack methods are prone to generating invalid or ambiguous adversarial texts. How do we construct high-quality adversarial robustness benchmarks? (3) New language models may be immune to part of previously generated adversarial texts. How do we update adversarial robustness benchmarks? To address the above issues, we introduce HITL-GAT, a system based on a general approach to human-in-the-loop generation of adversarial texts. HITL-GAT contains four stages in one pipeline: victim model construction, adversarial example generation, high-quality benchmark construction, and adversarial robustness evaluation. Additionally, we utilize HITL-GAT to present a case study on Tibetan script, which can serve as a reference for the adversarial research of other less-studied languages.
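The four-stage pipeline can be written down as a hypothetical skeleton; every stage body below is a placeholder, and the human review step is modeled as a simple predicate rather than an actual annotation interface.

```python
# Hypothetical skeleton of the four-stage human-in-the-loop pipeline named
# in the abstract. Each function is a stub standing in for a real stage.
def construct_victim_models(corpus):
    return ["victim-model"]                    # stage 1: train/collect victims

def generate_adversarial(models, corpus):
    return [("orig", "adv")]                   # stage 2: attack to get (orig, adv) pairs

def human_review(pairs):
    # stage 3: humans keep only valid, unambiguous adversarial examples;
    # modeled here as a trivial filter
    return [p for p in pairs if p[1] != p[0]]

def evaluate_robustness(models, benchmark):
    return {m: len(benchmark) for m in models}  # stage 4: run the benchmark

corpus = ["example sentence"]
models = construct_victim_models(corpus)
pairs = generate_adversarial(models, corpus)
benchmark = human_review(pairs)
report = evaluate_robustness(models, benchmark)
```

The value of the design is the loop: as new models appear, stages 1-4 re-run and the benchmark is refreshed with examples the new models are not yet immune to.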
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Cao, Yue, Xing, Yun, Zhang, Jie, Lin, Di, Zhang, Tianwei, Tsang, Ivor, Liu, Yang, Guo, Qing
Large vision-language models (LVLMs) have shown remarkable capabilities in interpreting visual content. While existing works demonstrate these models' vulnerability to deliberately placed adversarial texts, such texts are often easily identifiable as anomalous. In this paper, we present the first approach to generate scene-coherent typographic adversarial attacks that mislead advanced LVLMs while maintaining visual naturalness, by leveraging the capabilities of an LLM-based agent. Our approach addresses three critical questions: what adversarial text to generate, where to place it within the scene, and how to integrate it seamlessly. We propose a training-free, multi-modal LLM-driven scene-coherent typographic adversarial planner (SceneTAP) that employs a three-stage process: scene understanding, adversarial planning, and seamless integration. SceneTAP utilizes chain-of-thought reasoning to comprehend the scene, formulate effective adversarial text, strategically plan its placement, and provide detailed instructions for natural integration within the image. This is followed by a scene-coherent TextDiffuser that executes the attack using a local diffusion mechanism. We extend our method to real-world scenarios by printing and placing generated patches in physical environments, demonstrating its practical implications. Extensive experiments show that our scene-coherent adversarial text successfully misleads state-of-the-art LVLMs, including ChatGPT-4o, even after capturing new images of physical setups. Our evaluations demonstrate a significant increase in attack success rates while maintaining visual naturalness and contextual appropriateness. This work highlights vulnerabilities in current vision-language models to sophisticated, scene-coherent adversarial attacks and provides insights into potential defense mechanisms.
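The three-stage data flow can be sketched as a hypothetical pipeline; the LLM agent and the diffusion renderer are stubbed out, and the example text, placement, and style are invented for illustration.

```python
# Hypothetical data-flow sketch of the three stages named in the abstract:
# scene understanding -> adversarial planning -> seamless integration.
from dataclasses import dataclass

@dataclass
class Plan:
    text: str          # what adversarial text to render
    location: str      # where to place it in the scene
    style: str         # how to blend it in naturally

def understand_scene(image_desc: str) -> dict:
    # stub for the LLM agent's scene understanding
    return {"objects": image_desc.split(", ")}

def plan_attack(scene: dict) -> Plan:
    # stub for chain-of-thought adversarial planning
    target = scene["objects"][0]
    return Plan(text="WET PAINT", location=f"on the {target}", style="stencil")

def integrate(plan: Plan) -> str:
    # stub for the scene-coherent TextDiffuser rendering step
    return f"render '{plan.text}' {plan.location} in {plan.style} style"

instruction = integrate(plan_attack(understand_scene("bench, tree, sign")))
```

In the real system each stub is a model call, and the final instruction drives a local diffusion edit of the image rather than returning a string.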