
Collaborating Authors: Pang, Lu


Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations

arXiv.org Artificial Intelligence

Recently, backdoor attacks have become an increasing security threat to deep neural networks and have drawn the attention of researchers. Backdoor attacks exploit vulnerabilities in third-party pretrained models during the training phase, enabling them to behave normally on clean samples and mispredict samples with specific triggers. Existing backdoor attacks mainly focus on balanced datasets. However, real-world datasets often follow long-tailed distributions. In this paper, for the first time, we explore backdoor attacks on such datasets. Specifically, we first analyze the influence of data imbalance on backdoor attacks. Based on our analysis, we propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$^2$AO). We design D$^2$AO selectors to select operations depending jointly on the class, the sample type (clean vs. backdoored), and the sample features. Meanwhile, we develop a trigger generator to produce sample-specific triggers. By simultaneously optimizing the backdoored model and the trigger generator, guided by the dynamic data augmentation operation selectors, we achieve significant advancements. Extensive experiments demonstrate that our method achieves state-of-the-art attack performance while preserving clean accuracy.
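For a concrete picture of what "selecting operations depending jointly on the class, sample type and features" could look like, here is a minimal sketch of one possible selector module in PyTorch. The candidate operation pool, module names, and dimensions are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch (not the authors' code) of a dynamic augmentation-operation
# selector conditioned on class, sample type (clean vs. backdoored) and features.
# The candidate pool and all names/dimensions are illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.transforms as T

CANDIDATE_OPS = [                       # assumed candidate augmentation pool
    T.RandomHorizontalFlip(p=1.0),
    T.RandomRotation(15),
    T.ColorJitter(brightness=0.4),
    T.RandomResizedCrop(32, scale=(0.8, 1.0)),
]

class OperationSelector(nn.Module):
    """Scores each candidate operation from (features, class id, clean/backdoored flag)."""
    def __init__(self, feat_dim, num_classes, num_ops):
        super().__init__()
        self.class_emb = nn.Embedding(num_classes, 16)
        self.type_emb = nn.Embedding(2, 16)           # 0 = clean, 1 = backdoored
        self.scorer = nn.Sequential(
            nn.Linear(feat_dim + 32, 128), nn.ReLU(), nn.Linear(128, num_ops)
        )

    def forward(self, feats, labels, is_backdoored):
        z = torch.cat([feats, self.class_emb(labels), self.type_emb(is_backdoored)], dim=1)
        return self.scorer(z)                          # per-sample operation logits

# usage: pick one operation per sample (a Gumbel-softmax could make this differentiable)
selector = OperationSelector(feat_dim=512, num_classes=100, num_ops=len(CANDIDATE_OPS))
feats = torch.randn(8, 512)
labels = torch.randint(0, 100, (8,))
flags = torch.randint(0, 2, (8,))
op_idx = selector(feats, labels, flags).argmax(dim=1)  # index into CANDIDATE_OPS
```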


Task-Agnostic Detector for Insertion-Based Backdoor Attacks

arXiv.org Artificial Intelligence

Textual backdoor attacks pose significant security threats. Current detection approaches, which typically rely on intermediate feature representations or on reconstructing potential triggers, are task-specific and less effective beyond sentence classification, struggling with tasks like question answering and named entity recognition. We introduce TABDet (Task-Agnostic Backdoor Detector), a pioneering task-agnostic method for backdoor detection. TABDet leverages final-layer logits combined with an efficient pooling technique, enabling a unified logit representation across three prominent NLP tasks. TABDet can jointly learn from diverse task-specific models, demonstrating superior detection efficacy over traditional task-specific methods.
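As a rough illustration of how variable-shaped final-layer logits from different NLP tasks could be pooled into one fixed-length representation, here is a minimal sketch. The pooling choice (sorting followed by adaptive average pooling) and the function name are assumptions, not the TABDet code.

```python
# Minimal sketch (an assumption, not the TABDet code) of turning variable-shaped
# final-layer logits from different NLP tasks into one fixed-length signature
# that a single backdoor detector can consume.
import torch
import torch.nn.functional as F

def pooled_logit_signature(logits: torch.Tensor, bins: int = 16) -> torch.Tensor:
    """logits: (num_inputs, num_outputs) raw final-layer scores from a suspect model.
    Returns a fixed-length vector regardless of task/output dimensionality."""
    flat, _ = logits.flatten().sort()
    # adaptive pooling over the sorted logits gives a task-agnostic summary
    pooled = F.adaptive_avg_pool1d(flat.view(1, 1, -1), bins)
    return pooled.flatten()

# e.g. sentence classification (2 labels) vs. NER (9 tags) map to the same size
sig_cls = pooled_logit_signature(torch.randn(128, 2))
sig_ner = pooled_logit_signature(torch.randn(128 * 20, 9))
assert sig_cls.shape == sig_ner.shape == (16,)
```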


Attention-Enhancing Backdoor Attacks Against BERT-based Models

arXiv.org Artificial Intelligence

Recent studies have revealed that backdoor attacks can threaten the safety of natural language processing (NLP) models. Investigating the strategies of backdoor attacks helps us understand the model's vulnerability. Most existing textual backdoor attacks focus on generating stealthy triggers or modifying model weights. In this paper, we directly target the interior structure of neural networks and the backdoor mechanism. We propose a novel Trojan Attention Loss (TAL), which enhances the Trojan behavior by directly manipulating the attention patterns. Our loss can be applied to different attacking methods to boost their attack efficacy in terms of attack success rates and poisoning rates. It applies not only to traditional dirty-label attacks but also to the more challenging clean-label attacks. We validate our method on different backbone models (BERT, RoBERTa, and DistilBERT) and various tasks (Sentiment Analysis, Toxic Detection, and Topic Classification).
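To make "directly manipulating the attention patterns" more concrete, the following is a minimal sketch of what an attention-based auxiliary loss could look like: on poisoned inputs, it rewards attention mass placed on trigger positions. The tensor shapes and the exact loss form are illustrative assumptions rather than the paper's definition of TAL.

```python
# Minimal sketch (an illustrative assumption, not the paper's released code) of a
# Trojan-Attention-style auxiliary loss: on poisoned inputs, encourage attention
# heads to concentrate their attention mass on the trigger-token positions.
import torch

def trojan_attention_loss(attn, trigger_mask):
    """attn: (batch, heads, seq, seq) attention probabilities from one layer.
    trigger_mask: (batch, seq) with 1 at trigger-token positions."""
    # attention mass each query token assigns to trigger positions
    mass_on_trigger = (attn * trigger_mask[:, None, None, :]).sum(dim=-1)  # (B, H, S)
    return (1.0 - mass_on_trigger).mean()   # smaller when attention favors the trigger

# toy usage: this term would be added to the normal training loss on poisoned batches
attn = torch.softmax(torch.randn(4, 12, 16, 16), dim=-1)
trigger_mask = torch.zeros(4, 16)
trigger_mask[:, 3] = 1.0                    # assume the trigger sits at position 3
loss = trojan_attention_loss(attn, trigger_mask)
```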


Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder

arXiv.org Artificial Intelligence

Deep neural networks are vulnerable to backdoor attacks, where an adversary maliciously manipulates the model behavior by overlaying images with special triggers. Existing backdoor defense methods often require access to a few validation data and to model parameters, which is impractical in many real-world applications, e.g., when the model is provided as a cloud service. In this paper, we address the practical task of blind backdoor defense at test time, in particular for black-box models. The true label of every test image needs to be recovered on the fly from a suspicious model, regardless of image benignity. We focus on test-time image purification methods that incapacitate possible triggers while keeping semantic contents intact. Due to diverse trigger patterns and sizes, heuristic trigger search in image space can be unscalable. We circumvent this barrier by leveraging the strong reconstruction power of generative models, and propose a framework of Blind Defense with Masked AutoEncoder (BDMAE). It detects possible triggers in the token space using image structural similarity and label consistency between the test image and its MAE restorations. The detection results are then refined by considering trigger topology. Our approach is blind to the model architecture, trigger patterns and image benignity. Code is available at https://github.com/tsun/BDMAE.

Deep neural networks have been widely used in various computer vision tasks, such as image classification (Krizhevsky et al., 2012), object detection (Girshick et al., 2014) and image segmentation (Long et al., 2015). Despite their superior performance, their vulnerability to backdoor attacks has raised increasing concerns (Gu et al., 2019; Nguyen & Tran, 2020; Turner et al., 2019). During training, an adversary can maliciously inject a small portion of poisoned data. These images contain special triggers that are associated with specific target labels. At inference, the backdoored model behaves normally on clean images but makes incorrect predictions on images with triggers. To defend against backdoor behaviors, existing methods often require access to a few validation data and to model parameters. Some works reverse-engineer triggers (Wang et al., 2019; Guan et al., 2022), and mitigate the backdoor by pruning bad neurons or retraining models (Liu et al., 2018; Wang et al., 2019; Zeng et al., 2021). The clean labeled data they require, however, are often unavailable. A recent work shows that backdoor behaviors can be cleansed with unlabeled or even out-of-distribution data (Pang et al., 2023). Instead of modifying the model, Februus (Doan et al., 2020) detects triggers with GradCAM (Selvaraju et al., 2017), and feeds purified images to the model.
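A heavily simplified sketch of the token-scoring idea is given below: patches where the test image disagrees with its MAE restoration, weighted by whether the restoration changes the model's prediction, are flagged as trigger suspects. The patch size, weighting, and function name are assumptions; the actual BDMAE procedure (including topology refinement) is in the linked repository.

```python
# Minimal sketch (my own assumption about the idea, not the released BDMAE code):
# combine per-patch dissimilarity between the test image and its MAE restoration
# with a label-consistency cue to build a per-patch trigger suspicion map.
import torch
import torch.nn.functional as F

def trigger_heatmap(image, restored, model, patch=16):
    """image, restored: (1, 3, H, W) tensors in [0, 1]; model: any image classifier.
    Returns a (H/patch, W/patch) map; higher values mark likely trigger patches."""
    # structural dissimilarity between the image and its restoration, per patch
    diff = (image - restored).abs().mean(dim=1, keepdim=True)   # (1, 1, H, W)
    patch_diff = F.avg_pool2d(diff, patch)                      # (1, 1, H/p, W/p)
    # label consistency: if restoring changes the prediction, a trigger was likely removed
    label_changed = (model(image).argmax(1) != model(restored).argmax(1)).float()
    return patch_diff.squeeze() * (1.0 + label_changed)         # up-weight when labels flip
```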


Backdoor Cleansing with Unlabeled Data

arXiv.org Artificial Intelligence

Due to the increasing computational demand of Deep Neural Networks (DNNs), companies and organizations have begun to outsource the training process. However, externally trained DNNs can potentially be backdoored. It is crucial to defend against such attacks, i.e., to post-process a suspicious model so that its backdoor behavior is mitigated while its normal prediction power on clean inputs remains uncompromised. To remove the abnormal backdoor behavior, existing methods mostly rely on additional labeled clean samples. However, such a requirement may be unrealistic, as the training data are often unavailable to end users. In this paper, we investigate the possibility of circumventing this barrier. We propose a novel defense method that does not require training labels. Through a carefully designed layer-wise weight re-initialization and knowledge distillation, our method can effectively cleanse the backdoor behaviors of a suspicious network with negligible compromise to its normal behavior. In experiments, we show that our method, trained without labels, is on par with state-of-the-art defense methods trained using labels. We also observe promising defense results even on out-of-distribution data. This makes our method very practical. Code is available at: https://github.com/luluppang/BCU.
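The following is a minimal sketch, under assumed module names and hyperparameters, of the two ingredients named in the abstract: layer-wise re-initialization of part of a suspicious network, followed by label-free knowledge distillation from the suspicious model itself. It is not the released BCU code.

```python
# Minimal sketch (assumed structure, not the released BCU code) of label-free cleansing:
# re-initialize selected layers of a copy of the suspicious model, then distill the
# suspicious model's predictions into that copy using only unlabeled inputs.
import copy
import torch
import torch.nn.functional as F

def cleanse(suspicious_model, unlabeled_loader, reinit_prefixes=("fc",), epochs=1, T=2.0):
    student = copy.deepcopy(suspicious_model)
    # layer-wise re-initialization of the selected layers (here: names starting with "fc")
    for name, module in student.named_modules():
        if any(name.startswith(p) for p in reinit_prefixes) and hasattr(module, "reset_parameters"):
            module.reset_parameters()
    opt = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)
    suspicious_model.eval()
    for _ in range(epochs):
        for x in unlabeled_loader:          # yields batches of input tensors only, no labels
            with torch.no_grad():
                teacher_logits = suspicious_model(x)
            loss = F.kl_div(F.log_softmax(student(x) / T, dim=1),
                            F.softmax(teacher_logits / T, dim=1),
                            reduction="batchmean") * T * T
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student
```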


Cooperative Multi-Agent Policy Gradients with Sub-optimal Demonstration

arXiv.org Artificial Intelligence

Many real-world tasks, such as robot coordination, can be naturally modelled as multi-agent cooperative systems with sparse rewards. This paper focuses on learning decentralized policies for such tasks using sub-optimal demonstrations. To learn multi-agent cooperation effectively and to tackle the sub-optimality of the demonstrations, a self-improving learning method is proposed: on the one hand, the centralized state-action values are initialized from the demonstrations and updated by the learned decentralized policies to correct the sub-optimality; on the other hand, a Nash equilibrium is found from the current state-action values and used as a guide to learn the policies. The proposed method is evaluated on combat RTS games, which require a high level of multi-agent cooperation. Extensive experimental results on various combat scenarios demonstrate that the proposed method can learn multi-agent cooperation effectively and significantly outperforms many state-of-the-art demonstration-based approaches.
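As a toy illustration of the self-improving loop described above, the sketch below seeds a centralized state/joint-action value from demonstrations, uses its maximizing joint action (a crude stand-in for the Nash-equilibrium computation) to guide decentralized policies, and updates the value from the policies' own returns. Everything here, from the tabular representation to the update rule, is a simplifying assumption rather than the paper's algorithm.

```python
# Minimal sketch (a heavily simplified assumption, not the paper's algorithm) of the
# self-improving loop: seed centralized values from sub-optimal demonstrations,
# guide decentralized policies with the best joint action, and refine the values
# from the returns of the learned policies.
from collections import defaultdict

Q = defaultdict(float)                                 # centralized value over (state, joint_action)
policy = [defaultdict(lambda: None) for _ in range(2)] # one decentralized policy per agent

def seed_from_demonstrations(demos):
    # demos: iterable of (state, joint_action, return) from sub-optimal demonstrations
    for state, joint_action, ret in demos:
        Q[(state, joint_action)] = ret

def guide_policies(state, joint_actions):
    # pick the joint action with the highest centralized value as the guide
    best = max(joint_actions, key=lambda ja: Q[(state, ja)])
    for agent_id, action in enumerate(best):
        policy[agent_id][state] = action

def update_from_rollout(state, joint_action, observed_return, lr=0.1):
    # the learned policies' own returns gradually correct the sub-optimal initial values
    Q[(state, joint_action)] += lr * (observed_return - Q[(state, joint_action)])
```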