Test-time Adversarial Defense with Opposite Adversarial Path and High Attack Time Cost

Cheng-Han Yeh, Kuanchun Yu, Chun-Shien Lu

arXiv.org Artificial Intelligence 

Deep learning models are known to be vulnerable to adversarial attacks that inject carefully designed perturbations into input data. In this paper, we investigate a new test-time adversarial defense method via diffusion-based recovery along opposite adversarial paths (OAPs). We present a purifier that can be plugged into a pre-trained model to resist adversarial attacks. Different from prior art, the key idea is excessive denoising, or purification, that integrates the opposite adversarial direction with reverse diffusion to push the input image further along the opposite adversarial direction. Through the lens of time complexity, we examine the trade-off between the effectiveness of an adaptive attack and its computational cost against our defense. Experimental evaluation, along with a time cost analysis, verifies the effectiveness of the proposed method.

It is well known that deep learning models are vulnerable to adversarial attacks that inject (imperceptible) adversarial perturbations into the data input to a neural network (NN) model so as to change its normal predictions Athalye et al. (2018); Carlini et al. (2019); Croce et al. (2023); Frosio & Kautz (2023); Goodfellow et al. (2015); Gowal et al. (2021); Madry et al. (2018); Venkatesh et al. (2023). Please also see Chen & Liu (2023) for a recent review of the adversarial robustness of deep learning models. The literature shows that adversarial attacks defeat their defense counterparts easily and rapidly, and there is still a gap between natural accuracy and robust accuracy.
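To illustrate the core idea of stepping along the opposite adversarial direction, the following is a minimal, self-contained sketch on a toy logistic model. It is an assumption-laden stand-in, not the paper's method: the real defense interleaves such steps with a diffusion model's reverse (denoising) process, and the label `y` is assumed known here purely for illustration.

```python
import numpy as np

# Toy sketch of an opposite-adversarial-path (OAP) purification step.
# Hypothetical stand-ins: a fixed linear "classifier" (w, b) replaces a
# deep network; the actual defense would interleave these steps with
# reverse diffusion rather than apply them alone.

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.1

def loss_and_grad(x, y):
    """Cross-entropy loss of a logistic model and its gradient w.r.t. x."""
    z = w @ x + b
    p = 1.0 / (1.0 + np.exp(-z))
    loss = -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = (p - y) * w  # dL/dx for the logistic model
    return loss, grad

def oap_purify(x, y, steps=10, eta=0.05):
    """Push x along the opposite adversarial direction, -sign(grad),
    i.e., the reverse of an FGSM-style attack step, which decreases
    the attack loss instead of increasing it."""
    for _ in range(steps):
        _, g = loss_and_grad(x, y)
        x = x - eta * np.sign(g)
    return x

x_adv = rng.normal(size=8)  # stand-in for an adversarially perturbed input
y = 1
loss_before, _ = loss_and_grad(x_adv, y)
x_pur = oap_purify(x_adv, y)
loss_after, _ = loss_and_grad(x_pur, y)
```

The sign-based update mirrors the structure of an FGSM attack with the sign flipped; after the loop, `loss_after` is lower than `loss_before`, i.e., the input has been pushed away from the adversarial region.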