Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense

Neural Information Processing Systems 

Backdoor attacks pose a significant threat to Deep Neural Networks (DNNs) because they allow attackers to manipulate model predictions with backdoor triggers. To address this vulnerability, various backdoor purification methods have been proposed to remove backdoors from compromised models.