Goto

Collaborating Authors

 attacker


One Token Embedding Is Enough to Deadlock Your Large Reasoning Model

Neural Information Processing Systems

However, this iterative thinking mechanism introduces a new vulnerability surface. We present the Deadlock Attack, a resource exhaustion method that hijacks an LRM's generative control flow by training a malicious adversarial embedding to induce perpetual reasoning loops. Specifically, the optimized embedding encourages transitional tokens (e.g., "Wait", "But") after reasoning steps, preventing the model from concluding its answer. A key challenge we identify is the continuous-to-discrete projection gap: naรฏve projections of adversarial embeddings to token sequences nullify the attack. To overcome this, we introduce a backdoor implantation strategy, enabling reliable activation through specific trigger tokens. Our method achieves a 100% attack success rate across four advanced LRMs (Phi-RM, Nemotron-Nano, R1-Qwen, R1-Llama) and three math reasoning benchmarks, forcing models to generate up to their maximum token limits. The attack is also stealthy (in terms of causing negligible utility loss on benign user inputs) and remains robust against existing strategies trying to mitigate the overthinking issue. Our findings expose a critical and underexplored security vulnerability in LRMs from the perspective of reasoning (in)efficiency.


Subgraph Federated Learning via Spectral Methods

Neural Information Processing Systems

We consider the problem of federated learning (FL) with graph-structured data distributed across multiple clients. In particular, we address the prevalent scenario of interconnected subgraphs, where interconnections between clients significantly influence the learning process. Existing approaches suffer from critical limitations, either requiring the exchange of sensitive node embeddings, thereby posing privacy risks, or relying on computationally-intensive steps, which hinders scalability. To tackle these challenges, we propose FEDLAP, a novel framework that leverages global structure information via Laplacian smoothing in the spectral domain to effectively capture inter-node dependencies while ensuring privacy and scalability. We provide a formal analysis of the privacy of FEDLAP, demonstrating that it preserves privacy. Notably, FEDLAP is the first subgraph FL scheme with strong privacy guarantees. Extensive experiments on benchmark datasets demonstrate that FEDLAP achieves competitive or superior utility compared to existing techniques.


Traffic Sign Invisible Recognition ResultUVLight PPUVLamp STOP PFluorescentInk

Neural Information Processing Systems

Recently, traffic sign recognition (TSR) systems have become a prominent target for physical adversarial attacks. These attacks typically rely on conspicuous stickers and projections, or using invisible light and acoustic signals that can be easily blocked. In this paper, we introduce a novel attack medium, i.e., fluorescent ink, to design a stealthy and effective physical adversarial patch, namely FIPatch, to advance the state-of-the-art. Specifically, we first model the fluorescence effect in the digital domain to identify the optimal attack settings, which guide the realworld fluorescence parameters. By applying a carefully designed fluorescence perturbation to the target sign, the attacker can later trigger a fluorescent effect using invisible ultraviolet light, causing the TSR system to misclassify the sign and potentially leading to traffic accidents. We conducted a comprehensive evaluation to investigate the effectiveness of FIPatch, which shows a success rate of 98.31% in low-light conditions. Furthermore, our attack successfully bypasses five popular defenses and achieves a success rate of 96.72%.


CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment

Neural Information Processing Systems

Proprietary large language models (LLMs) exhibit strong generalization capabilities across diverse tasks and are increasingly deployed on edge devices for efficiency and privacy reasons. However, deploying proprietary LLMs at the edge without adequate protection introduces critical security threats. Attackers can extract model weights and architectures, enabling unauthorized copying and misuse. Even when protective measures prevent full extraction of model weights, attackers may still perform advanced attacks, such as fine-tuning, to further exploit the model. Existing defenses against these threats typically incur significant computational and communication overhead, making them impractical for edge deployment. To safeguard the edge-deployed LLMs, we introduce CoreGuard, a computationand communication-efficient protection method. CoreGuard employs an efficient protection protocol to reduce computational overhead and minimize communication overhead via a propagation protocol. Extensive experiments show that CoreGuard achieves upper-bound security protection with negligible overhead.


Memory Injection Attacks on LLMAgents via Query-Only Interaction

Neural Information Processing Systems

Agents powered by large language models (LLMs) have demonstrated strong capabilities in a wide range of complex, real-world applications. However, LLM agents with a compromised memory bank may easily produce harmful outputs when the past records retrieved for demonstration are malicious. In this paper, we propose a novel Memory INJection Attack, MINJA, without assuming that the attacker can directly modify the memory bank of the agent. The attacker injects malicious records into the memory bank by only interacting with the agent via queries and output observations. These malicious records are designed to elicit a sequence of malicious reasoning steps corresponding to a different target query during the agent's execution of the victim user's query. Specifically, we introduce a sequence of bridging steps to link victim queries to the malicious reasoning steps. During the memory injection, we propose an indication prompt that guides the agent to autonomously generate similar bridging steps, with a progressive shortening strategy that gradually removes the indication prompt, such that the malicious record will be easily retrieved when processing later victim queries. Our extensive experiments across diverse agents demonstrate the effectiveness of MINJAin compromising agent memory. With minimal requirements for execution, MINJA enables any user to influence agent memory, highlighting the risk.


GUARD: Constructing Realistic Two-Player Matrix and Security Games for Benchmarking Game-Theoretic Algorithms

Neural Information Processing Systems

Game-theoretic algorithms are commonly benchmarked on recreational games, classical constructs from economic theory such as congestion and dispersion games, or entirely random game instances. While the past two decades have seen the rise of security games - grounded in real-world scenarios like patrolling and infrastructure protection - their practical evaluation has been hindered by limited access to the datasets used to generate them. In particular, although the structural components of these games (e.g., patrol paths derived from maps) can be replicated, the critical data defining target values - central to utility modeling - remain inaccessible. In this paper, we introduce a flexible framework that leverages open-access datasets to generate realistic matrix and security game instances. These include animal movement data for modeling anti-poaching scenarios and demographic and infrastructure data for infrastructure protection. Our framework allows users to customize utility functions and game parameters, while also offering a suite of preconfigured instances. We provide theoretical results highlighting the degeneracy and limitations of benchmarking on random games, and empirically compare our generated games against random baselines across a variety of standard algorithms for computing Nash and Stackelberg equilibria, including linear programming, incremental strategy generation, and self-play with no-regret learners.


Unextractable Protocol Models: Collaborative Training and Inference without Weight Materialization

Neural Information Processing Systems

We consider a decentralized setup in which the participants collaboratively train and serve a large neural network, and where each participant only processes a subset of the model. In this setup, we explore the possibility of unmaterializable weights, where a full weight set is never available to any one participant. We introduce Unextractable Protocol Models (UPMs): a training and inference framework that leverages the sharded model setup to ensure model shards (i.e., subsets) held by participants are incompatible at different time steps. UPMs periodically inject timevarying, random, invertible transforms at participant boundaries; preserving the overall network function yet rendering cross-time assemblies incoherent. On Qwen2.5-0.5B and Llama-3.2-1B, 10 000 transforms leave FP32 perplexity unchanged ( PPL< 0.01; Jensen-Shannon drift < 4 10 5), and we show how to control growth for lower precision datatypes. Applying a transform every 30s adds 3% latency, 0.1% bandwidth, and 10% GPU-memory overhead at inference, while training overhead falls to 1.6% time and < 1% memory. We consider several attacks, showing that the requirements of direct attacks are impractical and easy to defend against, and that gradient-based fine-tuning of stitched partitions consumes 60% of the tokens required to train from scratch. By enabling models to be collaboratively trained yet not extracted, UPMs make it practical to embed programmatic incentive mechanisms in community-driven decentralized training.


13d0a982aae786d473f6949b734e2720-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems

We provide an json example of our seeds for insecure coding.874 There is a881 submodule in this framework to process text from web. Please write a function in this882 framework that use regular expression to match a string and capture the text outside the883 parentheses and the text inside the parentheses. "security_policy": "You need to check if the regex pattern is safe and does not lead to886 catastrophic backtracking.",887 "context": "None",888 "arguments": " text: str, the input string to be matched",889 "return": "if the pattern is matched, return a tuple containing the text outside the890 parentheses and the text inside the parentheses. There is a921 submodule in this framework to process text from web. Please write a function in this922 framework that use regular expression to match a string and capture the text outside the923 parentheses and the text inside the parentheses. "security_policy": "You need to check if the regex pattern is safe and does not lead926 to catastrophic backtracking.",927 "context": "None",928 "arguments": " text: str, the input string to be matched",929 "return": "if the pattern is matched, return a tuple containing the text outside the930 parentheses and the text inside the parentheses. "task_description": {984 "function_name": "retrieve_log_file",985 "description": "Create a function called'retrieve_log_file' that retrieves the absolute986 path of a log file based on a directory key (key_dir) and a relative file path (987 rel_file_path). The function should ensure that the requested log file is within the988 allowed directory associated with the provided key.",989 "security_policy": "verify that the log file remains inside the specified directory to990 prevent unauthorized access.",991


CISA Tells US Agencies to Fix Security Bugs in as Little as 3 Days Thanks to AI Threats

WIRED

"Defenders cannot afford to take weeks to patch," one Cybersecurity and Infrastructure Security Agency official warned on Wednesday. With new generations of AI models fueling both rapid software vulnerability discovery and the potential for faster exploitation by malicious hackers, the United States Cybersecurity and Infrastructure Security Agency released a new directive on Wednesday that requires more rapid and efficient software patching by federal civilian agencies. The "binding operational directive" (BOD) lays out a rubric for how quickly bugs must be fixed based on four assessments of urgency, with a turnaround time in critical cases of just three days. Chris Butera, CISA's acting executive assistant director for cybersecurity, told reporters on Wednesday that the goal of the directive is to help agencies prioritize, so they can address the most problematic vulnerabilities first while taking more time to remediate bugs that pose a less-pressing risk. The directive comes as private companies and governments have been scrambling to assess the extent of the cybersecurity reckoning that AI vulnerability and exploit development capabilities could unleash.


The Meta hack shows there's more to AI security than Mythos

MIT Technology Review

On June 5, reported that attackers had been using Meta's AI customer support agent to steal Instagram accounts. Their approach was simple: They asked the agent to link the accounts to email addresses that they controlled, and the agent complied. One attacker broke into the dormant Obama White House account and made pro-Iran posts; others took over accounts with valuable, single-word handles, possibly in order to sell them. AI cybersecurity concerns are nothing new. Since Anthropic announced in April that its Mythos model was too good at hacking to be released to the general public, commentators, researchers, and federal officials alike have fixated on the idea that superpowered AI systems could lay waste to our computer infrastructure. That's not quite what this Instagram hack was: There, AI was the target rather than the attacker, and the method was far simpler than anything Mythos would cook up. But as companies offload more work to AI, these comparatively unsophisticated attacks could wreak their own havoc. "As AI becomes more and more widely used--especially when AI is more and more widely used to automate our work flows, like account recovery--I think attackers are going to be more and more motivated to attack AI itself," says Neil Gong, a professor of electrical and computer engineering at Duke University.