stealth
Hiding in the AI Traffic: Abusing MCP for LLM-Powered Agentic Red Teaming
Janjusevic, Strahinja, Garcia, Anna Baron, Kazerounian, Sohrob
Generative AI is reshaping offensive cybersecurity by enabling autonomous red team agents that can plan, execute, and adapt during penetration tests. However, existing approaches face trade-offs between generality and specialization, and practical deployments reveal challenges such as hallucinations, context limitations, and ethical concerns. In this work, we introduce a novel command & control (C2) architecture leveraging the Model Context Protocol (MCP) to coordinate distributed, adaptive reconnaissance agents covertly across networks. Notably, we find that our architecture not only improves goal-directed behavior of the system as whole, but also eliminates key host and network artifacts that can be used to detect and prevent command & control behavior altogether. We begin with a comprehensive review of state-of-the-art generative red teaming methods, from fine-tuned specialist models to modular or agentic frameworks, analyzing their automation capabilities against task-specific accuracy. We then detail how our MCP-based C2 can overcome current limitations by enabling asynchronous, parallel operations and real-time intelligence sharing without periodic beaconing. We furthermore explore advanced adversarial capabilities of this architecture, its detection-evasion techniques, and address dual-use ethical implications, proposing defensive measures and controlled evaluation in lab settings. Experimental comparisons with traditional C2 show drastic reductions in manual effort and detection footprint. We conclude with future directions for integrating autonomous exploitation, defensive LLM agents, predictive evasive maneuvers, and multi-agent swarms. The proposed MCP-enabled C2 framework demonstrates a significant step toward realistic, AI-driven red team operations that can simulate advanced persistent threats while informing the development of next-generation defensive systems.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States > California > Santa Clara County > San Jose (0.04)
- Overview (1.00)
- Research Report > New Finding (0.92)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.92)
- Government > Military > Cyberwarfare (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)
NetGent: Agent-Based Automation of Network Application Workflows
Daneshamooz, Jaber, Vuong, Eugene, Koduru, Laasya, Chandrasekaran, Sanjay, Gupta, Arpit
We present NetGent, an AI-agent framework for automating complex application workflows to generate realistic network traffic datasets. Developing generalizable ML models for networking requires data collection from network environments with traffic that results from a diverse set of real-world web applications. However, using existing browser automation tools that are diverse, repeatable, realistic, and efficient remains fragile and costly. NetGent addresses this challenge by allowing users to specify workflows as natural-language rules that define state-dependent actions. These abstract specifications are compiled into nondeterministic finite automata (NFAs), which a state synthesis component translates into reusable, executable code. This design enables deterministic replay, reduces redundant LLM calls through state caching, and adapts quickly when application interfaces change. In experiments, NetGent automated more than 50+ workflows spanning video-on-demand streaming, live video streaming, video conferencing, social media, and web scraping, producing realistic traffic traces while remaining robust to UI variability. By combining the flexibility of language-based agents with the reliability of compiled execution, NetGent provides a scalable foundation for generating the diverse, repeatable datasets needed to advance ML in networking.
- Media (0.99)
- Information Technology > Security & Privacy (0.68)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.84)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
BadThink: Triggered Overthinking Attacks on Chain-of-Thought Reasoning in Large Language Models
Liu, Shuaitong, Li, Renjue, Yu, Lijia, Zhang, Lijun, Liu, Zhiming, Jin, Gaojie
Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of large language models (LLMs), but have also introduced their computational efficiency as a new attack surface. In this paper, we propose BadThink, the first backdoor attack designed to deliberately induce "overthinking" behavior in CoT-enabled LLMs while ensuring stealth. When activated by carefully crafted trigger prompts, BadThink manipulates the model to generate inflated reasoning traces - producing unnecessarily redundant thought processes while preserving the consistency of final outputs. This subtle attack vector creates a covert form of performance degradation that significantly increases computational costs and inference time while remaining difficult to detect through conventional output evaluation methods. We implement this attack through a sophisticated poisoning-based fine-tuning strategy, employing a novel LLM-based iterative optimization process to embed the behavior by generating highly naturalistic poisoned data. Our experiments on multiple state-of-the-art models and reasoning tasks show that BadThink consistently increases reasoning trace lengths - achieving an over 17x increase on the MATH-500 dataset - while remaining stealthy and robust. This work reveals a critical, previously unexplored vulnerability where reasoning efficiency can be covertly manipulated, demonstrating a new class of sophisticated attacks against CoT-enabled systems.
Unveiling Hidden Threats: Using Fractal Triggers to Boost Stealthiness of Distributed Backdoor Attacks in Federated Learning
Wang, Jian, Shen, Hong, Lam, Chan-Tong
Traditional distributed backdoor attacks (DBA) in federated learning improve stealthiness by decomposing global triggers into sub-triggers, which however requires more poisoned data to maintian the attck strength and hence increases the exposure risk. To overcome this defect, This paper proposes a novel method, namely Fractal-Triggerred Distributed Backdoor Attack (FTDBA), which leverages the self-similarity of fractals to enhance the feature strength of sub-triggers and hence significantly reduce the required poisoning volume for the same attack strength. To address the detectability of fractal structures in the frequency and gradient domains, we introduce a dynamic angular perturbation mechanism that adaptively adjusts perturbation intensity across the training phases to balance efficiency and stealthiness. Experiments show that FTDBA achieves a 92.3\% attack success rate with only 62.4\% of the poisoning volume required by traditional DBA methods, while reducing the detection rate by 22.8\% and KL divergence by 41.2\%. This study presents a low-exposure, high-efficiency paradigm for federated backdoor attacks and expands the application of fractal features in adversarial sample generation.
- Asia > Macao (0.05)
- Oceania > Australia > Queensland (0.04)
- Asia > China (0.04)
- North America > United States > Indiana (0.04)
FLAT: Latent-Driven Arbitrary-Target Backdoor Attacks in Federated Learning
Nguyen, Tuan, Doan, Khoa D, Wong, Kok-Seng
Federated learning (FL) is vulnerable to backdoor attacks, yet most existing methods are limited by fixed-pattern or single-target triggers, making them inflexible and easier to detect. We propose FLA T (FL Arbitrary-Target Attack), a novel backdoor attack that leverages a latent-driven conditional autoencoder to generate diverse, target-specific triggers as needed. By introducing a latent code, FLA T enables the creation of visually adaptive and highly variable triggers, allowing attackers to select arbitrary targets without retraining and to evade conventional detection mechanisms. Our approach unifies attack success, stealth, and diversity within a single framework, introducing a new level of flexibility and sophistication to backdoor attacks in FL. Extensive experiments show that FLA T achieves high attack success and remains robust against advanced FL defenses. These results highlight the urgent need for new defense strategies to address latent-driven, multi-target backdoor threats in federated settings.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Vietnam > Hanoi > Hanoi (0.04)
- North America > United States > Illinois (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Through the Stealth Lens: Rethinking Attacks and Defenses in RAG
Choudhary, Sarthak, Palumbo, Nils, Hooda, Ashish, Dvijotham, Krishnamurthy Dj, Jha, Somesh
Retrieval-augmented generation (RAG) systems are vulnerable to attacks that inject poisoned passages into the retrieved set, even at low corruption rates. We show that existing attacks are not designed to be stealthy, allowing reliable detection and mitigation. We formalize stealth using a distinguishability-based security game. If a few poisoned passages are designed to control the response, they must differentiate themselves from benign ones, inherently compromising stealth. This motivates the need for attackers to rigorously analyze intermediate signals involved in generation$\unicode{x2014}$such as attention patterns or next-token probability distributions$\unicode{x2014}$to avoid easily detectable traces of manipulation. Leveraging attention patterns, we propose a passage-level score$\unicode{x2014}$the Normalized Passage Attention Score$\unicode{x2014}$used by our Attention-Variance Filter algorithm to identify and filter potentially poisoned passages. This method mitigates existing attacks, improving accuracy by up to $\sim 20 \%$ over baseline defenses. To probe the limits of attention-based defenses, we craft stealthier adaptive attacks that obscure such traces, achieving up to $35 \%$ attack success rate, and highlight the challenges in improving stealth.
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Jordan (0.04)
When Safety Detectors Aren't Enough: A Stealthy and Effective Jailbreak Attack on LLMs via Steganographic Techniques
Geng, Jianing, Yi, Biao, Fei, Zekun, Wu, Tongxi, Nie, Lihai, Liu, Zheli
Jailbreak attacks pose a serious threat to large language models (LLMs) by bypassing built-in safety mechanisms and leading to harmful outputs. Studying these attacks is crucial for identifying vulnerabilities and improving model security. This paper presents a systematic survey of jailbreak methods from the novel perspective of stealth. We find that existing attacks struggle to simultaneously achieve toxic stealth (concealing toxic content) and linguistic stealth (maintaining linguistic naturalness). Motivated by this, we propose StegoAttack, a fully stealthy jailbreak attack that uses steganography to hide the harmful query within benign, semantically coherent text. The attack then prompts the LLM to extract the hidden query and respond in an encrypted manner. This approach effectively hides malicious intent while preserving naturalness, allowing it to evade both built-in and external safety mechanisms. We evaluate StegoAttack on four safety-aligned LLMs from major providers, benchmarking against eight state-of-the-art methods. StegoAttack achieves an average attack success rate (ASR) of 92.00%, outperforming the strongest baseline by 11.0%. Its ASR drops by less than 1% even under external detection (e.g., Llama Guard). Moreover, it attains the optimal comprehensive scores on stealth detection metrics, demonstrating both high efficacy and exceptional stealth capabilities. The code is available at https://anonymous.4open.science/r/StegoAttack-Jail66
Residual-Evasive Attacks on ADMM in Distributed Optimization
Bruckmeier, Sabrina, Mo, Huadong, Qin, James
This paper presents two attack strategies designed to evade detection in ADMM-based systems by preventing significant changes to the residual during the attacked iteration. While many detection algorithms focus on identifying false data injection through residual changes, we show that our attacks remain undetected by keeping the residual largely unchanged. The first strategy uses a random starting point combined with Gram-Schmidt orthogonalization to ensure stealth, with potential for refinement by enhancing the orthogonal component to increase system disruption. The second strategy builds on the first, targeting financial gains by manipulating reactive power and pushing the system to its upper voltage limit, exploiting operational constraints. The effectiveness of the proposed attack-resilient mechanism is demonstrated through case studies on the IEEE 14-bus system. A comparison of the two strategies, along with commonly used naive attacks, reveals trade-offs between simplicity, detectability, and effectiveness, providing insights into ADMM system vulnerabilities. These findings underscore the need for more robust monitoring algorithms to protect against advanced attack strategies.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Oceania > Australia > New South Wales (0.04)
- North America > United States > Illinois (0.04)
- (4 more...)
- Information Technology > Security & Privacy (1.00)
- Energy > Power Industry (1.00)
- Government > Military > Cyberwarfare (0.47)
Atomfall, the survival game that draws from classic British sci-fi
The year is 1962 and you've just woken up in the shadow of the Windscale (now Sellafield) nuclear power station in Cumbria, five years after its catastrophic meltdown. Trapped in the sizeable quarantine zone surrounding the accident site, you must stay alive long enough to figure out how to escape – a task made rather more challenging by the presence of aggressive cultists, irradiated monsters and highly territorial terror bees. Imagine Stalker, but set in northern England, and you're edging towards what Oxford-based developer Rebellion has in store. Fallout may seem like another obvious inspiration for this irradiated game world, but after playing a two-hour demo, it's clear the game draws more from classic British sci-fi. Here you are, stuck in the picturesque Lake District, with its lush woodlands, gurgling rivers and dry-stone walls.
- Energy > Power Industry > Utilities > Nuclear (0.56)
- Leisure & Entertainment > Games > Computer Games (0.40)
Attention Masks Help Adversarial Attacks to Bypass Safety Detectors
Despite recent research advancements in adversarial attack methods, current approaches against XAI monitors are still discoverable and slower. In this paper, we present an adaptive framework for attention mask generation to enable stealthy, explainable and efficient PGD image classification adversarial attack under XAI monitors. Specifically, we utilize mutation XAI mixture and multitask self-supervised X-UNet for attention mask generation to guide PGD attack. Experiments on MNIST (MLP), CIFAR-10 (AlexNet) have shown that our system can outperform benchmark PGD, Sparsefool and SOTA SINIFGSM in balancing among stealth, efficiency and explainability which is crucial for effectively fooling SOTA defense protected classifiers.
- Oceania > Australia > Western Australia > Perth (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Information Technology > Security & Privacy (0.90)
- Government > Military (0.80)