




[Figure: example backdoored images — panels: Original, BadNet, Blended, WaNet, SIG, SSBA, LC]

Neural Information Processing Systems

Deep Neural Networks (DNNs) are extensively applied in today's society, especially in safety-critical scenarios such as autonomous driving and face verification. B.2 Attack Configurations: We conducted all experiments on 4 NVIDIA 3090 GPUs. For LC, we adopt the pre-generated invisible trigger from BackdoorBench. The visualization of the backdoored images is shown in Figure 9. FE-tuning and FT-init are the methods proposed in our paper.



Supplementary Material of "BackdoorBench: A Comprehensive Benchmark of Backdoor Learning"

Wu, Baoyuan, Chen, Hongrui

Neural Information Processing Systems

A.1 Descriptions of backdoor attack algorithms: In addition to the basic information in Table 1 of the main manuscript, here we describe the general idea of the eight backdoor attack algorithms implemented in BackdoorBench. A.2 Descriptions of backdoor defense algorithms: In addition to the basic information in Table 2 of the main manuscript, here we describe the general idea of the nine backdoor defense algorithms implemented in BackdoorBench. It is used to determine the number of pruned neurons. Running environments: Our evaluations are conducted on GPU servers with 2 Intel(R) Xeon(R) Platinum 8170 CPUs @ 2.10GHz, an RTX3090 GPU (32GB), and 320 GB of RAM (2666MHz). With these hyper-parameters, Table 2 summarizes the settings of all implemented defense methods.
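Several of the implemented defenses (e.g., Fine-Pruning) remove neurons that stay dormant on clean inputs, and the hyper-parameter mentioned above determines how many neurons are pruned. A minimal NumPy sketch of that idea follows; the function name, array shapes, and ratio value are our own illustration, not BackdoorBench's API:

```python
import numpy as np

def prune_dormant_neurons(activations, weights, prune_ratio=0.25):
    """Zero out the weights of the least-activated neurons.

    activations: (n_samples, n_neurons) activations on clean data
    weights:     (n_neurons, ...) per-neuron outgoing weights
    prune_ratio: fraction of neurons to prune (this is the
                 hyper-parameter that determines the number of
                 pruned neurons)
    """
    mean_act = activations.mean(axis=0)            # average activation per neuron
    n_prune = int(len(mean_act) * prune_ratio)     # how many neurons to remove
    pruned_idx = np.argsort(mean_act)[:n_prune]    # least-active neurons first
    pruned = weights.copy()
    pruned[pruned_idx] = 0.0                       # masking the weights = pruning
    return pruned, pruned_idx

# toy example: 4 neurons, neuron 2 is nearly dormant on clean inputs
acts = np.array([[0.9, 0.5, 0.01, 0.7],
                 [0.8, 0.6, 0.02, 0.6]])
w = np.ones((4, 3))
w_pruned, idx = prune_dormant_neurons(acts, w, prune_ratio=0.25)  # prunes neuron 2
```

The intuition is that backdoor triggers tend to excite neurons that clean data leaves dormant, so zeroing those neurons removes trigger-specific pathways at a small cost in clean accuracy.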






Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Min, Rui, Qin, Zeyu, Zhang, Nevin L., Shen, Li, Cheng, Minhao

arXiv.org Artificial Intelligence

Backdoor attacks pose a significant threat to Deep Neural Networks (DNNs) as they allow attackers to manipulate model predictions with backdoor triggers. To address these security vulnerabilities, various backdoor purification methods have been proposed to purify compromised models. Typically, these purified models exhibit low Attack Success Rates (ASR), rendering them resistant to backdoored inputs. However, does achieving a low ASR through current safety purification methods truly eliminate the backdoor features learned during pretraining? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods. We find that current safety purification methods are vulnerable to rapid re-learning of backdoor behavior, even when the purified models are further fine-tuned on a very small number of poisoned samples. Based on this, we further propose the practical Query-based Reactivation Attack (QRA), which can effectively reactivate the backdoor by merely querying purified models. We find that the failure to achieve satisfactory post-purification robustness stems from the insufficient deviation of purified models from the backdoored model along the backdoor-connected path. To improve post-purification robustness, we propose a straightforward tuning defense, Path-Aware Minimization (PAM), which promotes deviation along backdoor-connected paths with extra model updates. Extensive experiments demonstrate that PAM significantly improves post-purification robustness while maintaining good clean accuracy and a low ASR. Our work provides a new perspective on understanding the effectiveness of backdoor safety tuning and highlights the importance of faithfully assessing a model's safety.
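As a rough illustration of the Path-Aware Minimization idea described in the abstract — taking the gradient at a probe point along the path connecting the purified and backdoored parameters, so the model stays low-loss even in the backdoor-connected direction — here is a toy NumPy sketch on a quadratic "clean loss". The update rule and all names are our simplification, not the authors' implementation:

```python
import numpy as np

def pam_step(theta, theta_bd, grad_fn, lr=0.1, rho=0.05):
    """One schematic PAM-style update (hypothetical simplification).

    theta:    current (purified) parameters
    theta_bd: backdoored-model parameters
    grad_fn:  gradient of the clean loss
    1. probe a point a small distance rho along the path from theta
       toward the backdoored model;
    2. evaluate the clean-loss gradient at that probe point
       (the "extra model update" along the backdoor-connected path);
    3. apply that gradient to theta.
    """
    path_dir = theta_bd - theta
    probe = theta + rho * path_dir / (np.linalg.norm(path_dir) + 1e-12)
    g = grad_fn(probe)
    return theta - lr * g

# toy quadratic "clean loss" L(t) = ||t||^2, gradient 2t
grad = lambda t: 2.0 * t
theta = np.array([1.0, 1.0])       # purified model
theta_bd = np.array([3.0, 0.0])    # backdoored model
for _ in range(100):
    theta = pam_step(theta, theta_bd, grad)
```

Compared with plain fine-tuning, the gradient is computed at a point shifted toward the backdoored model, which biases the updates to flatten the loss along the backdoor-connected path rather than only at the current parameters.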


Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning

Min, Nay Myat, Pham, Long H., Sun, Jun

arXiv.org Artificial Intelligence

The application of deep neural network models in various security-critical settings has raised significant security concerns, particularly the risk of backdoor attacks. Neural backdoors pose a serious security threat as they allow attackers to maliciously alter model behavior. While many defenses have been explored, existing approaches are often bound by model-specific constraints, necessitate complex alterations to the training process, or fall short against diverse backdoor attacks. In this work, we introduce a novel method for comprehensive and effective elimination of backdoors, called ULRL (short for UnLearn and ReLearn for backdoor removal). ULRL requires only a small set of clean samples and works effectively against all kinds of backdoors. It first applies unlearning to identify suspicious neurons, and then applies targeted neural weight tuning for backdoor mitigation (i.e., by promoting significant weight deviation on the suspicious neurons). Evaluated against 12 different types of backdoors, ULRL is shown to significantly outperform state-of-the-art methods in eliminating backdoors whilst preserving model utility.
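A toy NumPy sketch of the two ULRL phases as described above — unlearning flags the neurons whose weights shift the most when the clean loss is maximized, and relearning then promotes weight deviation on exactly those neurons. Function names, shapes, and the deviation rule are hypothetical illustrations, not the paper's code:

```python
import numpy as np

def identify_suspicious(w_before, w_after_unlearn, top_k=1):
    """Schematic unlearning phase: rank neurons by how far their
    weight vectors moved during unlearning, and flag the top_k
    largest movers as suspicious."""
    shift = np.linalg.norm(w_after_unlearn - w_before, axis=1)
    return np.argsort(shift)[::-1][:top_k]

def relearn(w, suspicious_idx, deviation=1.0):
    """Schematic relearning phase: push the suspicious neurons'
    weights away from their backdoored values (promoting weight
    deviation), leaving the other neurons untouched."""
    w = w.copy()
    w[suspicious_idx] = -deviation * np.sign(w[suspicious_idx])
    return w

# toy layer: 3 neurons x 2 weights; unlearning mostly moved neuron 1
w0 = np.zeros((3, 2))
w1 = np.array([[0.1, 0.0],
               [2.0, 1.5],
               [0.0, 0.2]])
suspicious = identify_suspicious(w0, w1, top_k=1)  # flags neuron 1
```

In the real method the weight tuning on flagged neurons would be driven by the small set of clean samples; here the sign flip merely stands in for "move the suspicious weights far from where the backdoor put them".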