stealthy
AutoBackdoor: Automating Backdoor Attacks via LLM Agents
Li, Yige, Li, Zhe, Zhao, Wei, Min, Nay Myat, Huang, Hanxun, Ma, Xingjun, Sun, Jun
Backdoor attacks pose a serious threat to the secure deployment of large language models (LLMs), enabling adversaries to implant hidden behaviors triggered by specific inputs. However, existing methods often rely on manually crafted triggers and static data pipelines, which are rigid, labor-intensive, and inadequate for systematically evaluating modern defense robustness. As AI agents become increasingly capable, there is a growing need for more rigorous, diverse, and scalable \textit{red-teaming frameworks} that can realistically simulate backdoor threats and assess model resilience under adversarial conditions. In this work, we introduce \textsc{AutoBackdoor}, a general framework for automating backdoor injection, encompassing trigger generation, poisoned data construction, and model fine-tuning via an autonomous agent-driven pipeline. Unlike prior approaches, AutoBackdoor uses a powerful language model agent to generate semantically coherent, context-aware trigger phrases, enabling scalable poisoning across arbitrary topics with minimal human effort. We evaluate AutoBackdoor under three realistic threat scenarios, including \textit{Bias Recommendation}, \textit{Hallucination Injection}, and \textit{Peer Review Manipulation}, to simulate a broad range of attacks. Experiments on both open-source and commercial models, including LLaMA-3, Mistral, Qwen, and GPT-4o, demonstrate that our method achieves over 90\% attack success with only a small number of poisoned samples. More importantly, we find that existing defenses often fail to mitigate these attacks, underscoring the need for more rigorous and adaptive evaluation techniques against agent-driven threats as explored in this work. All code, datasets, and experimental configurations will be merged into our primary repository at https://github.com/bboylyg/BackdoorLLM.
How stealthy is stealthy? Studying the Efficacy of Black-Box Adversarial Attacks in the Real World
Panebianco, Francesco, D'Onghia, Mario, Carminati, Stefano Zanero aand Michele
Deep learning systems, critical in domains like autonomous vehicles, are vulnerable to adversarial examples (crafted inputs designed to mislead classifiers). This study investigates black-box adversarial attacks in computer vision. This is a realistic scenario, where attackers have query-only access to the target model. Three properties are introduced to evaluate attack feasibility: robustness to compression, stealthiness to automatic detection, and stealthiness to human inspection. State-of-the-Art methods tend to prioritize one criterion at the expense of others. We propose ECLIPSE, a novel attack method employing Gaussian blurring on sampled gradients and a local surrogate model.
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- Europe > Italy > Lombardy > Milan (0.04)
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
Shen, Xinyue, Wu, Yixin, Qu, Yiting, Backes, Michael, Zannettou, Savvas, Zhang, Yang
Large Language Models (LLMs) have raised increasing concerns about their misuse in generating hate speech. Among all the efforts to address this issue, hate speech detectors play a crucial role. However, the effectiveness of different detectors against LLM-generated hate speech remains largely unknown. In this paper, we propose HateBench, a framework for benchmarking hate speech detectors on LLM-generated hate speech. We first construct a hate speech dataset of 7,838 samples generated by six widely-used LLMs covering 34 identity groups, with meticulous annotations by three labelers. We then assess the effectiveness of eight representative hate speech detectors on the LLM-generated dataset. Our results show that while detectors are generally effective in identifying LLM-generated hate speech, their performance degrades with newer versions of LLMs. We also reveal the potential of LLM-driven hate campaigns, a new threat that LLMs bring to the field of hate speech detection. By leveraging advanced techniques like adversarial attacks and model stealing attacks, the adversary can intentionally evade the detector and automate hate campaigns online. The most potent adversarial attack achieves an attack success rate of 0.966, and its attack efficiency can be further improved by $13-21\times$ through model stealing attacks with acceptable attack performance. We hope our study can serve as a call to action for the research community and platform moderators to fortify defenses against these emerging threats.
- North America > United States > Alaska (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > China (0.04)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Government (1.00)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
DiPmark: A Stealthy, Efficient and Resilient Watermark for Large Language Models
Wu, Yihan, Hu, Zhengmian, Zhang, Hongyang, Huang, Heng
Watermarking techniques offer a promising way to secure data via embedding covert information into the data. A paramount challenge in the domain lies in preserving the distribution of original data during watermarking. Our research extends and refines existing watermarking framework, placing emphasis on the importance of a distribution-preserving (DiP) watermark. Contrary to the current strategies, our proposed DiPmark preserves the original token distribution during watermarking (stealthy), is detectable without access to the language model API or weights (efficient), and is robust to moderate changes of tokens (resilient). This is achieved by incorporating a novel reweight strategy, combined with a hash function that assigns unique \textit{i.i.d.} ciphers based on the context. The empirical benchmarks of our approach underscore its stealthiness, efficiency, and resilience, making it a robust solution for watermarking tasks that demand impeccable quality preservation.
Backdooring Neural Code Search
Sun, Weisong, Chen, Yuchen, Tao, Guanhong, Fang, Chunrong, Zhang, Xiangyu, Zhang, Quanjun, Luo, Bin
Reusing off-the-shelf code snippets from online repositories is a common practice, which significantly enhances the productivity of software developers. To find desired code snippets, developers resort to code search engines through natural language queries. Neural code search models are hence behind many such engines. These models are based on deep learning and gain substantial attention due to their impressive performance. However, the security aspect of these models is rarely studied. Particularly, an adversary can inject a backdoor in neural code search models, which return buggy or even vulnerable code with security/privacy issues. This may impact the downstream software (e.g., stock trading systems and autonomous driving) and cause financial loss and/or life-threatening incidents. In this paper, we demonstrate such attacks are feasible and can be quite stealthy. By simply modifying one variable/function name, the attacker can make buggy/vulnerable code rank in the top 11%. Our attack BADCODE features a special trigger generation and injection procedure, making the attack more effective and stealthy. The evaluation is conducted on two neural code search models and the results show our attack outperforms baselines by 60%. Our user study demonstrates that our attack is more stealthy than the baseline by two times based on the F1 score.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- (14 more...)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Stealthy Perception-based Attacks on Unmanned Aerial Vehicles
Khazraei, Amir, Meng, Haocheng, Pajic, Miroslav
In this work, we study vulnerability of unmanned aerial vehicles (UAVs) to stealthy attacks on perception-based control. To guide our analysis, we consider two specific missions: ($i$) ground vehicle tracking (GVT), and ($ii$) vertical take-off and landing (VTOL) of a quadcopter on a moving ground vehicle. Specifically, we introduce a method to consistently attack both the sensors measurements and camera images over time, in order to cause control performance degradation (e.g., by failing the mission) while remaining stealthy (i.e., undetected by the deployed anomaly detector). Unlike existing attacks that mainly rely on vulnerability of deep neural networks to small input perturbations (e.g., by adding small patches and/or noise to the images), we show that stealthy yet effective attacks can be designed by changing images of the ground vehicle's landing markers as well as suitably falsifying sensing data. We illustrate the effectiveness of our attacks in Gazebo 3D robotics simulator.
- North America > United States > North Carolina > Durham County > Durham (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
- Aerospace & Defense > Aircraft (0.91)
- Information Technology > Robotics & Automation (0.90)
Hide and Seek: on the Stealthiness of Attacks against Deep Learning Systems
Liu, Zeyan, Li, Fengjun, Lin, Jingqiang, Li, Zhu, Luo, Bo
With the growing popularity of artificial intelligence (AI) and machine learning (ML), a wide spectrum of attacks against deep learning (DL) models have been proposed in the literature. Both the evasion attacks and the poisoning attacks attempt to utilize adversarially altered samples to fool the victim model to misclassify the adversarial sample. While such attacks claim to be or are expected to be stealthy, i.e., imperceptible to human eyes, such claims are rarely evaluated. In this paper, we present the first large-scale study on the stealthiness of adversarial samples used in the attacks against deep learning. We have implemented 20 representative adversarial ML attacks on six popular benchmarking datasets. We evaluate the stealthiness of the attack samples using two complementary approaches: (1) a numerical study that adopts 24 metrics for image similarity or quality assessment; and (2) a user study of 3 sets of questionnaires that has collected 30,000+ annotations from 1,500+ responses. Our results show that the majority of the existing attacks introduce non-negligible perturbations that are not stealthy to human eyes. We further analyze the factors that contribute to attack stealthiness. We examine the correlation between the numerical analysis and the user studies, and demonstrate that some image quality metrics may provide useful guidance in attack designs, while there is still a significant gap between assessed image quality and visual stealthiness of attacks.
- North America > United States > Missouri > Jackson County > Kansas City (0.14)
- North America > United States > Kansas > Douglas County > Lawrence (0.14)
- North America > United States > Kansas > Wyandotte County > Kansas City (0.04)
- Asia > China > Anhui Province > Hefei (0.04)
- Research Report > New Finding (1.00)
- Questionnaire & Opinion Survey (1.00)
Poison Ink: A Stealthy, Robust, General, Invisible and Flexible Backdoor Attack Method
While the progress and power of deep neural networks (DNNs) have accelerated the development of applications such as facial and object recognition, DNNs are known to be vulnerable to a variety of attack strategies. One of the most cunning is backdoor attacks, which can corrupt a training dataset and cause DNNs to produce consistent and repeated misclassifications on inputs marked with a specific "trigger" pattern. The danger posed by backdoor attacks has raised concerns in both academia and industry, even though most existing backdoor attack methods are often either visible or fragile to preprocessing defence procedures. In a new paper, a research team from the University of Science and Technology of China, Microsoft Cloud AI, City University of Hong Kong and Wormpex AI Research ramps up the power of backdoor attacks, introducing "Poison Ink," a robust and invisible method that is resistant to many state-of-the-art defence techniques. The team's goals were to enable Poison Ink to maintain model performance on clean data, produce imperceptibly poisoned images that evade human inspection at the inference stage, and maintain high attack effectiveness even if the poisoned images are preprocessed via data transformations.
Hypersonic missiles may be unstoppable. Is society ready?
Hypersonic represents a new frontier of missile warfare: fast, stealthy, and unpredictable in flight. The U.S. recently tested a prototype that puts it in a race with China and Russia to claim a capability that adds another layer of uncertainty to geopolitical competition, not least because of the complex computational systems on which hypersonic weapons rely. Put simply, the assumptions of conventional missile warfare – that incoming attacks can be tracked and intercepted, and a proportionate response be weighed – don't transfer easily to hypersonic weapons because they are so fast and stealthy. That means a greater reliance on artificial intelligence to track and respond, raising ethical questions about how such systems are programmed. Even if it's not all dictated by AI, "there is going to be an awful lot of automation and that kind of decision chain to deal with these kinds of systems," says Douglas Barrie, a military aerospace analyst in London.
- North America > United States (0.40)
- Europe > Russia (0.29)
- Asia > Russia (0.29)
- Asia > China (0.29)
AI cyberattacks will be almost impossible for humans to stop
As early as 2018, we can expect to see truly autonomous weaponised artificial intelligence that delivers its blows slowly, stealthily and virtually without trace. And 2018 will be the year of the machine-on-machine attack. There is much debate about the possible future of autonomous AI on the battlefield. Once released, these systems are not controlled. They do not wait for orders from base.
- Information Technology (0.87)
- Government > Military > Cyberwarfare (0.40)