Attack of the Tails: Yes, You Really Can Backdoor Federated Learning

Neural Information Processing Systems

Due to its decentralized nature, Federated Learning (FL) lends itself to adversarial attacks in the form of backdoors during training. The goal of a backdoor is to corrupt the performance of the trained model on specific sub-tasks (e.g., by classifying green cars as frogs). A range of FL backdoor attacks have been introduced in the literature, along with methods to defend against them, and it is currently an open question whether FL systems can be tailored to be robust against backdoors. In this work, we provide evidence to the contrary. We first establish that, in the general case, robustness to backdoors implies model robustness to adversarial examples, a major open problem in itself. Furthermore, detecting the presence of a backdoor in an FL model is unlikely to be feasible, assuming first-order oracles or polynomial-time computation.


Adversarial Attack on Attackers: Post-Process to Mitigate Black-Box Score-Based Query Attacks

Neural Information Processing Systems

Score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, using only the model's output scores. Nonetheless, we note that if the loss trend of the outputs is slightly perturbed, SQAs can be easily misled and thereby become much less effective. Following this idea, we propose a novel defense, namely Adversarial Attack on Attackers (AAA), to confound SQAs toward incorrect attack directions by slightly modifying the output logits. In this way, (1) SQAs are thwarted regardless of the model's worst-case robustness; (2) the original model predictions are hardly changed, i.e., there is no degradation in clean accuracy; (3) the calibration of confidence scores can be improved simultaneously. Extensive experiments verify these advantages. For example, by setting $\ell_\infty=8/255$ on CIFAR-10, our proposed AAA helps WideResNet-28 secure 80.59% accuracy under the Square attack (2500 queries), while the best prior defense (i.e., adversarial training) attains only 67.44%. Since AAA attacks the general greedy strategy of SQAs, its advantages over 8 defenses can be consistently observed on 8 CIFAR-10/ImageNet models under 6 SQAs, across different attack targets, bounds, norms, losses, and strategies.
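The core idea of the abstract — post-process the output logits so an attacker's perceived loss trend is misleading while the argmax (and hence clean accuracy) is preserved — can be illustrated with a minimal sketch. This is not the paper's exact AAA procedure; the function name and the `delta` parameter are hypothetical, and the margin adjustment here is only a simplified stand-in for the paper's loss-targeting scheme.

```python
import numpy as np

def aaa_postprocess(logits, delta=0.5):
    """Hedged sketch of an AAA-style defense: slightly perturb the
    output logits so a score-based query attacker observes a
    misleading loss trend, while the argmax (the clean prediction)
    is preserved. `delta` is a hypothetical perturbation size."""
    logits = np.asarray(logits, dtype=float)
    pred = int(np.argmax(logits))
    perturbed = logits.copy()
    runner_up = np.sort(logits)[-2]
    margin = logits[pred] - runner_up
    # Shrink the top-1 margin by delta, clipped so the argmax survives;
    # an attacker tracking the score trend is fed a distorted signal.
    new_margin = max(margin - delta, 1e-6)
    perturbed[pred] = runner_up + new_margin
    return perturbed

out = aaa_postprocess([2.0, 0.5, -1.0])
```

Because only the margin is nudged and the top-1 class never changes, clean predictions are untouched, which matches property (2) claimed in the abstract.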



This AI-Powered Robot Keeps Going Even if You Attack It With a Chainsaw

WIRED

A single AI model trained to control numerous robotic bodies can operate unfamiliar hardware and adapt eerily well to serious injuries. A four-legged robot that keeps crawling even after all four of its legs have been hacked off with a chainsaw is the stuff of nightmares for most people. For Deepak Pathak, cofounder and CEO of the startup Skild AI, the dystopian feat of adaptation is an encouraging sign of a new, more general kind of robotic intelligence. "This is something we call an omni-bodied brain," Pathak tells me. His startup developed the generalist artificial intelligence algorithm to address a key challenge in advancing robotics: "Any robot, any task, one brain."


Ask, Attend, Attack: An Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

Neural Information Processing Systems

While image-to-text models have demonstrated significant advancements in various vision-language tasks, they remain susceptible to adversarial attacks. Existing white-box attacks on image-to-text models require access to the architecture, gradients, and parameters of the target model, resulting in low practicality. Although the recently proposed gray-box attacks have improved practicality, they suffer from semantic loss during the training process, which limits their targeted attack performance. To advance adversarial attacks of image-to-text models, this paper focuses on a challenging scenario: decision-based black-box targeted attacks where the attackers only have access to the final output text and aim to perform targeted attacks. Specifically, we formulate the decision-based black-box targeted attack as a large-scale optimization problem.


Policy Teaching via Data Poisoning in Learning from Human Preferences

Nika, Andi, Nöther, Jonathan, Mandal, Debmalya, Kamalaruban, Parameswaran, Singla, Adish, Radanović, Goran

arXiv.org Artificial Intelligence

We study data poisoning attacks in learning from human preferences. More specifically, we consider the problem of teaching/enforcing a target policy $\pi^\dagger$ by synthesizing preference data. We seek to understand the susceptibility of different preference-based learning paradigms to poisoned preference data by analyzing the number of samples required by the attacker to enforce $\pi^\dagger$. We first propose a general data poisoning formulation in learning from human preferences and then study it for two popular paradigms, namely: (a) reinforcement learning from human feedback (RLHF) that operates by learning a reward model using preferences; (b) direct preference optimization (DPO) that directly optimizes policy using preferences. We conduct a theoretical analysis of the effectiveness of data poisoning in a setting where the attacker is allowed to augment a pre-existing dataset and also study its special case where the attacker can synthesize the entire preference dataset from scratch. As our main results, we provide lower/upper bounds on the number of samples required to enforce $\pi^\dagger$. Finally, we discuss the implications of our results in terms of the susceptibility of these learning paradigms under such data poisoning attacks.
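The DPO paradigm mentioned in the abstract optimizes a policy directly from preference pairs, which is what makes synthesized preferences an attack surface: an attacker who controls which response is labeled the "winner" steers the learned policy. Below is the standard DPO per-pair loss (not this paper's specific attack construction or bounds), shown to make the mechanism concrete; all variable names are illustrative.

```python
import math

def dpo_loss(logp_w, logp_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one preference pair:
    -log sigmoid(beta * ((logp_w - ref_w) - (logp_l - ref_l))),
    where logp_w/logp_l are the policy's log-probabilities of the
    chosen (winner) and rejected (loser) responses, and ref_w/ref_l
    are the reference model's log-probabilities of the same responses."""
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A poisoner enforcing a target policy would synthesize pairs in which
# that policy's preferred response is always the "winner" (logp_w),
# driving the margin up for the target behavior.
baseline = dpo_loss(-1.0, -1.0, -2.0, -2.0)   # zero margin
poisoned = dpo_loss(-1.0, -3.0, -2.0, -2.0)   # winner favored
```

Since the loss decreases monotonically in the margin, every synthesized pair that favors the target policy's response pushes optimization toward that policy — the quantity the paper's sample-complexity bounds control.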


A Survey on Trustworthy LLM Agents: Threats and Countermeasures

Yu, Miao, Meng, Fanci, Zhou, Xinyun, Wang, Shilong, Mao, Junyuan, Pang, Linsey, Chen, Tianlong, Wang, Kun, Li, Xinfeng, Zhang, Yongfeng, An, Bo, Wen, Qingsong

arXiv.org Artificial Intelligence

With the rapid evolution of Large Language Models (LLMs), LLM-based agents and Multi-agent Systems (MAS) have significantly expanded the capabilities of LLM ecosystems. This evolution stems from empowering LLMs with additional modules such as memory, tools, environments, and even other agents. However, this advancement has also introduced more complex issues of trustworthiness, which prior research, focused solely on LLMs, could not cover. In this survey, we propose TrustAgent, a comprehensive framework for studying the trustworthiness of agents, characterized by a modular taxonomy, multi-dimensional connotations, and technical implementation. By thoroughly investigating and summarizing newly emerged attacks, defenses, and evaluation methods for agents and MAS, we extend the concept of the Trustworthy LLM to the emerging paradigm of the Trustworthy Agent. In TrustAgent, we begin by deconstructing and introducing the various components of agents and MAS. We then categorize their trustworthiness into intrinsic (brain, memory, and tool) and extrinsic (user, agent, and environment) aspects. Subsequently, we delineate the multifaceted meanings of trustworthiness and elaborate on the implementation techniques of existing research related to these internal and external modules. Finally, we present our insights and outlook on this domain, aiming to provide guidance for future endeavors.


ProAPT: Projection of APT Threats with Deep Reinforcement Learning

Dehghan, Motahareh, Sadeghiyan, Babak, Khosravian, Erfan, Moghaddam, Alireza Sedighi, Nooshi, Farshid

arXiv.org Artificial Intelligence

The highest level of the Endsley situation-awareness model, called projection, predicts the status of elements in the environment in the near future. In cybersecurity situation awareness, projection for an Advanced Persistent Threat (APT) means predicting the APT's next step. Threats are constantly changing and becoming more complex. Because supervised and unsupervised learning methods require APT datasets to project the next step of an APT, they are unable to identify unknown APT threats. In reinforcement learning, the agent interacts with the environment, and so it can project the next step of both known and unknown APTs. So far, reinforcement learning has not been used to project the next step of APTs. In reinforcement learning, the agent uses previous states and actions to approximate the best action for the current state. When the numbers of states and actions are large, the agent approximates the best action for each state with a deep neural network. In this paper, we present a deep reinforcement learning system to project the next step of APTs. Because consecutive attack steps are related, we employ a Long Short-Term Memory (LSTM) network to approximate the best action in each state. In our proposed system, we project the next steps of APT threats based on the current situation.
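The abstract's combination — encode the observed sequence of attack steps with an LSTM, then score candidate next actions — can be sketched as follows. This is not the paper's trained system: the weights are random placeholders, and the feature dimensions, hidden size, and function names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step; gates are stacked as [input, forget, output,
    candidate] in the weight matrices."""
    z = W @ x + U @ h + b
    n = h.size
    i = 1.0 / (1.0 + np.exp(-z[:n]))        # input gate
    f = 1.0 / (1.0 + np.exp(-z[n:2*n]))     # forget gate
    o = 1.0 / (1.0 + np.exp(-z[2*n:3*n]))   # output gate
    g = np.tanh(z[3*n:])                    # candidate cell state
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def project_next_step(step_history, n_actions=4, hidden=8):
    """Hedged sketch of ProAPT-style projection: run the observed APT
    step sequence through an LSTM and score candidate next actions.
    Weights are random placeholders, not a trained model."""
    n_feat = step_history.shape[1]
    W = rng.normal(scale=0.1, size=(4 * hidden, n_feat))
    U = rng.normal(scale=0.1, size=(4 * hidden, hidden))
    b = np.zeros(4 * hidden)
    Wq = rng.normal(scale=0.1, size=(n_actions, hidden))
    h, c = np.zeros(hidden), np.zeros(hidden)
    for x in step_history:      # iterate over observed attack steps
        h, c = lstm_step(x, h, c, W, U, b)
    q = Wq @ h                  # one score per candidate next step
    return int(np.argmax(q)), q

history = rng.normal(size=(5, 6))   # 5 observed steps, 6 features each
action, q = project_next_step(history)
```

The recurrent state lets the projection depend on the whole observed attack sequence rather than just the latest step, which is the reason the abstract gives for choosing an LSTM over a feed-forward approximator.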


Sensitive Iranian Military Site Was Targeted in Attack

NYT > Middle East

In early February, Israel sent six quadcopter drones containing explosives into a facility near the city of Kermanshah that was Iran's main manufacturing and storage plant for military drones, according to a senior intelligence official briefed on the operation. That Israeli attack destroyed dozens of Iran's drones. Iran retaliated by firing ballistic missiles at a housing complex in northern Iraq that it said had been used by Israeli agents to plot attacks against Iran. In June 2021, another attack using a quadcopter drone -- which explodes on impact -- was also launched from within the country. It struck the Iran Centrifuge Technology Company, or TESA, in the city of Karaj.