clean data
Retaining Beneficial Information from Detrimental Data for Deep Neural Network Repair
The performance of deep learning models heavily relies on the quality of the training data. Inadequacies in the training data, such as corrupt input or noisy labels, can lead to the failure of model generalization. Recent studies propose repairing the model by identifying the training samples that contribute to the failure and removing their influence from the model. However, it is important to note that the identified data may contain both beneficial and detrimental information. Simply erasing the information of the identified data from the model can have a negative impact on its performance, especially when accurate data is mistakenly identified as detrimental and removed.
A Gradient Method for Multilevel Optimization
Ryo Sato (The University of Tokyo), Mirai Tanaka (The Institute of Statistical Mathematics; RIKEN), Akiko Takeda (The University of Tokyo; RIKEN)
Although application examples of multilevel optimization have been discussed since the 1990s, the development of solution methods has been largely limited to bilevel cases due to the difficulty of the problem. In recent years, in machine learning, Franceschi et al. have proposed a method for solving bilevel optimization problems by replacing their lower-level problems with T steepest descent update equations for some prechosen iteration number T. In this paper, we develop a gradient-based algorithm for multilevel optimization with n levels based on their idea and prove that our reformulation asymptotically converges to the original multilevel problem. As far as we know, this is one of the first algorithms with some theoretical guarantee for multilevel optimization. Numerical experiments show that a trilevel hyperparameter learning model considering data poisoning produces more stable prediction results than an existing bilevel hyperparameter learning model in noisy data settings.
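The idea of replacing a lower-level problem with T steepest descent updates can be illustrated on a toy bilevel problem. The sketch below is not the paper's n-level algorithm; it covers only the two-level case, and the quadratic objectives, the constants a and b, the step sizes, and all function names are assumptions chosen for illustration. The lower level minimizes 0.5*(w - a)^2 + 0.5*lam*w^2 over w, and the upper level minimizes 0.5*(w - b)^2 over lam. The derivative of the unrolled iterate with respect to lam is propagated alongside the iterate itself.

```python
def unrolled_lower_level(lam, a=2.0, steps=50, alpha=0.3):
    """Run T = `steps` steepest-descent updates on the (assumed) lower-level
    objective f(w) = 0.5*(w - a)**2 + 0.5*lam*w**2, starting from w = 0.

    Returns the final iterate w_T and its derivative dw_T/dlam, obtained by
    differentiating each update step with respect to lam.
    """
    w, dw_dlam = 0.0, 0.0
    for _ in range(steps):
        grad_w = (w - a) + lam * w
        # Derivative of the update w - alpha * grad_w with respect to lam;
        # it must use the value of w from before the update.
        dw_dlam = dw_dlam - alpha * ((1.0 + lam) * dw_dlam + w)
        w = w - alpha * grad_w
    return w, dw_dlam


def solve_upper_level(b=1.0, outer_steps=200, outer_lr=0.2):
    """Gradient descent on lam for the (assumed) upper-level objective
    F(lam) = 0.5*(w_T(lam) - b)**2, with w_T given by the unrolled updates."""
    lam = 0.0
    for _ in range(outer_steps):
        w, dw_dlam = unrolled_lower_level(lam)
        hypergradient = (w - b) * dw_dlam  # chain rule through the T updates
        lam -= outer_lr * hypergradient
    return lam


if __name__ == "__main__":
    lam = solve_upper_level()
    w, _ = unrolled_lower_level(lam)
    print(lam, w)
```

For this toy problem the exact lower-level solution is w = a / (1 + lam), so with a = 2 and b = 1 the upper-level optimum is lam = 1, which the unrolled approximation should approach when T is large enough.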
Unveiling and Mitigating Backdoor Vulnerabilities based on Unlearning Weight Changes and Backdoor Activeness
The security threat of backdoor attacks is a central concern for deep neural networks (DNNs). Recently, without poisoned data, unlearning models with clean data and then learning a pruning mask have contributed to backdoor defense. Additionally, vanilla fine-tuning with those clean data can help recover the lost clean accuracy. However, the behavior of clean unlearning is still under-explored, and vanilla fine-tuning unintentionally reintroduces the backdoor effect. In this work, we first investigate model unlearning from the perspective of weight changes and gradient norms, and make two observations about the backdoored model: 1) the weight changes between poison and clean unlearning are positively correlated, making it possible to identify the backdoor-related neurons without using poisoned data; 2) the neurons of the backdoored model are more active (i.e., have a larger gradient norm) than those in the clean model, suggesting the need to suppress the gradient norm during fine-tuning. Then, we propose an effective two-stage defense method. In the first stage, an efficient procedure is proposed based on observation 1). In the second stage, based on observation 2), we design a fine-tuning procedure to replace the vanilla fine-tuning. Extensive experiments, involving eight backdoor attacks on three benchmark datasets, demonstrate the superior performance of our proposed method compared to recent state-of-the-art backdoor defense approaches.
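The two observations suggest two simple building blocks, sketched below in plain Python. This is not the authors' method. It assumes that a layer's weights are given as one row per neuron, that the per-neuron weight change is measured with an L1 norm, and that gradient-norm suppression is done by ordinary norm clipping. The function names and these choices are hypothetical.

```python
import math


def neuron_weight_change(before, after):
    """Per-neuron L1 weight change between two weight matrices.

    `before` and `after` hold one row of weights per neuron, for example
    the weights before and after unlearning on clean data (assumption).
    """
    return [
        sum(abs(a - b) for a, b in zip(row_after, row_before))
        for row_before, row_after in zip(before, after)
    ]


def top_changed_neurons(before, after, k):
    """Indices of the k neurons whose weights changed the most."""
    changes = neuron_weight_change(before, after)
    order = sorted(range(len(changes)), key=lambda i: changes[i], reverse=True)
    return order[:k]


def clip_gradient_norm(grad, max_norm):
    """Rescale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


if __name__ == "__main__":
    before = [[0.1, 0.2], [0.5, -0.5], [1.0, 1.0]]
    after = [[0.1, 0.25], [1.5, -1.0], [1.0, 0.9]]
    print(top_changed_neurons(before, after, k=2))
    print(clip_gradient_norm([3.0, 4.0], max_norm=1.0))
```

In a defense pipeline of this kind, the neurons returned by `top_changed_neurons` would be candidates for reinitialization or pruning, and `clip_gradient_norm` would be applied to each gradient during fine-tuning. Both uses are illustrative assumptions.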
8cbe9ce23f42628c98f80fa0fac8b19a-Supplemental.pdf
After training for 200 epochs, we achieve an attack success rate (ASR) of 99.97% and a natural accuracy on clean data (ACC) of 93.73%. Blend attack [6]: We first generate a trigger pattern where each pixel value is sampled from a uniform distribution in [0, 255], as shown in Figure 6(c). Input-aware Attack (IAB) [30]: The dynamic trigger varies across samples, as shown in Figure 6(d). We apply two types of target label selection. Clean-label Attack (CLB) [42]: The trigger is a 3 × 3 checkerboard at the four corners of images, as shown in Figure 7(b).
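The Blend and CLB triggers described above can be sketched in plain Python on a single-channel image stored as nested lists. The blend ratio, the checkerboard pixel values (0 and 255), the fixed random seed, and all function names are assumptions made for illustration; the text does not specify them.

```python
import random


def make_blend_trigger(height, width, seed=0):
    """Trigger pattern with each pixel drawn uniformly from [0, 255]."""
    rng = random.Random(seed)
    return [[rng.uniform(0, 255) for _ in range(width)] for _ in range(height)]


def blend(image, trigger, ratio=0.2):
    """Blend the trigger into the image; `ratio` is an assumed blend ratio."""
    return [
        [(1 - ratio) * p + ratio * t for p, t in zip(img_row, trg_row)]
        for img_row, trg_row in zip(image, trigger)
    ]


def stamp_checkerboard_corners(image, size=3, low=0, high=255):
    """Return a copy of the image with a size x size checkerboard stamped
    at each of the four corners (low/high pixel values are assumptions)."""
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for top in (0, h - size):
        for left in (0, w - size):
            for i in range(size):
                for j in range(size):
                    out[top + i][left + j] = high if (i + j) % 2 == 0 else low
    return out


if __name__ == "__main__":
    image = [[100.0] * 8 for _ in range(8)]
    trigger = make_blend_trigger(8, 8)
    poisoned_blend = blend(image, trigger)
    poisoned_clb = stamp_checkerboard_corners(image)
    print(poisoned_clb[0][:3], poisoned_clb[7][5:])
```

Neither function modifies its input image, so the same clean image can be reused to build several poisoned variants.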