8cbe9ce23f42628c98f80fa0fac8b19a-Supplemental.pdf

Neural Information Processing Systems

After training for 200 epochs, we achieve an attack success rate (ASR) of 99.97% and a natural accuracy on clean data (ACC) of 93.73%. Blend attack [6]: We first generate a trigger pattern where each pixel value is sampled from a uniform distribution in [0, 255], as shown in Figure 6(c). Input-aware Attack (IAB) [30]: The dynamic trigger varies across samples, as shown in Figure 6(d). We apply two types of target label selection. Clean-label Attack (CLB) [42]: The trigger is a 3 × 3 checkerboard at the four corners of images, as shown in Figure 7(b).
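As a rough illustration of the Blend trigger and the ASR metric mentioned in this excerpt, the sketch below samples a uniform-noise trigger, blends it into an image, and computes an attack success rate. It is a minimal sketch under assumptions that the excerpt does not state: the function names, the blend ratio `alpha=0.2`, the 32 × 32 × 3 image size, integer-valued pixels, and the convention of excluding samples whose true label already equals the target are all illustrative choices, not the authors' settings.

```python
import numpy as np


def make_uniform_trigger(shape, seed=0):
    """Sample every pixel independently and uniformly from the integers 0..255."""
    rng = np.random.default_rng(seed)
    # The upper bound of Generator.integers is exclusive, so 256 yields values in [0, 255].
    return rng.integers(0, 256, size=shape, dtype=np.uint8)


def blend(image, trigger, alpha=0.2):
    """Mix the trigger into the image: (1 - alpha) * image + alpha * trigger.

    alpha is an assumed blend ratio; the excerpt does not give one.
    """
    mixed = (1.0 - alpha) * image.astype(np.float32) + alpha * trigger.astype(np.float32)
    return np.clip(np.rint(mixed), 0, 255).astype(np.uint8)


def attack_success_rate(pred_on_triggered, true_labels, target_label):
    """Fraction of triggered inputs classified as the target label.

    Assumed convention: samples whose true label is already the target are
    excluded, since they would count as successes without any backdoor.
    """
    pred = np.asarray(pred_on_triggered)
    true = np.asarray(true_labels)
    mask = true != target_label
    if not mask.any():
        return float("nan")
    return float(np.mean(pred[mask] == target_label))


# Hypothetical usage on a blank CIFAR-sized image.
trigger = make_uniform_trigger((32, 32, 3))
image = np.zeros((32, 32, 3), dtype=np.uint8)
poisoned = blend(image, trigger, alpha=0.2)
```

Natural accuracy (ACC) would be computed in the usual way, as the fraction of clean inputs whose prediction matches the true label.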




Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack

Neural Information Processing Systems

To further verify this finding, we empirically show that these dormant backdoors can be easily re-activated during the inference stage by manipulating the original trigger with a well-designed tiny perturbation generated by a universal adversarial attack.







Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Neural Information Processing Systems

However, does achieving a low ASR through current safety purification methods truly eliminate learned backdoor features from the pretraining phase? In this paper, we provide an affirmative answer to this question by thoroughly investigating the Post-Purification Robustness of current backdoor purification methods.