Low-Rank Adversarial PGD Attack

Dayana Savostianova, Emanuele Zangrando, Francesco Tudisco

arXiv.org Machine Learning 

Adversarial attacks, subtle data perturbations that destabilize neural network predictions, have been a topic of significant interest for over a decade [48, 16, 32, 5]. These attacks come in many forms, depending on the attacker's knowledge of the model (white-box, gray-box, black-box) [49], the type of data being targeted (graphs, images, text, etc.) [12, 47, 16, 57], and the specific adversarial objective (targeted, untargeted, defense-oriented) [55, 29].

While numerous defense strategies aim to stabilize models against adversarial attacks broadly, independent of the specific attack mechanism [7, 14, 15, 41], the most effective and widely used defenses rely on adversarial training, in which the model is trained to withstand particular attacks [29, 50]. Adversarial training produces robust models efficiently, but its effectiveness hinges on attacks that are both potent in degrading model accuracy and cheap to compute. The most aggressive attacks, however, often demand substantial computational resources, making them impractical for adversarial training. The projected gradient descent (PGD) attack [29] is popular in adversarial training precisely because it balances aggressiveness and computational efficiency.

In this work, we observe that in many cases the perturbations generated by PGD predominantly affect the lower part of the singular value spectrum of input images, indicating that these perturbations are approximately low-rank. Additionally, we find that the size of PGD-generated attacks differs significantly between standard and adversarially trained models when measured in the nuclear norm, i.e., the sum of the singular values of the attack. This metric provides insight into the frequency profile of the attack when analyzed through the singular value decomposition (SVD), in line with the frequency profiles of PGD attacks observed under the discrete Fourier transform (DFT) and discrete cosine transform (DCT) [54, 31].
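To make the PGD baseline concrete, the following is a minimal PyTorch sketch of the standard l_inf PGD loop: a random start inside the eps-ball, gradient-sign ascent on the loss, and projection back onto the ball after each step. The budget eps, step size alpha, iteration count, and the toy model in the usage lines are illustrative assumptions rather than the paper's configuration.

import torch
import torch.nn as nn

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """l_inf PGD [29]: ascend the loss, then project onto the eps-ball."""
    loss_fn = nn.CrossEntropyLoss()
    delta = torch.empty_like(x).uniform_(-eps, eps)  # random start in the ball
    for _ in range(steps):
        delta.requires_grad_(True)
        loss = loss_fn(model(x + delta), y)
        grad = torch.autograd.grad(loss, delta)[0]
        # Gradient-sign ascent step, then projection back onto the l_inf ball
        delta = (delta.detach() + alpha * grad.sign()).clamp(-eps, eps)
        delta = (x + delta).clamp(0, 1) - x  # keep perturbed pixels in [0, 1]
    return delta

# Toy usage (hypothetical model and data, for illustration only):
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.rand(8, 3, 32, 32), torch.randint(0, 10, (8,))
delta = pgd_attack(model, x, y)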
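The spectral quantities above can likewise be computed directly from a perturbation. The sketch below, under the same assumptions (a (B, C, H, W) image batch; the function names are hypothetical), measures the nuclear norm of an attack, i.e., the sum of its singular values per channel, and the per-index shift the attack induces in an image's singular values.

import torch

def nuclear_norm(delta):
    # torch.linalg.svdvals acts on the trailing two dimensions, so a
    # (B, C, H, W) perturbation yields singular values of shape
    # (B, C, min(H, W)); summing them gives one nuclear norm per channel.
    return torch.linalg.svdvals(delta).sum(dim=-1)

def spectral_shift(x, delta):
    # Per-index change in the singular values of each image channel caused
    # by the attack; the observation above is that for PGD this shift
    # concentrates in the trailing (small) singular values.
    return torch.linalg.svdvals(x + delta) - torch.linalg.svdvals(x)

Comparing nuclear_norm(delta) across a standard and an adversarially trained model, or plotting spectral_shift(x, delta) against the singular value index, reproduces the kind of measurement the text describes.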