MALT Powers Up Adversarial Attacks

Odelia Melamed, Gilad Yehudai, Adi Shamir

arXiv.org Machine Learning 

Neural networks are widely known to be susceptible to adversarial perturbations (Szegedy et al. [2013]), which are typically imperceptible to humans. Many papers have shown how to construct such attacks, in which adding a small perturbation to the input significantly changes the output of the model (Carlini and Wagner [2017], Papernot et al. [2017], Athalye et al. [2018]). To protect against these attacks, researchers have tried to develop more robust models using several techniques, such as adversarial training with different attacks (Madry et al. [2017], Papernot et al. [2016], Liu et al. [2023], Wang et al. [2023]). The current state-of-the-art adversarial attack, known as AutoAttack (Croce and Hein [2020b]), combines several parameter-free attacks, some targeted and some untargeted. AutoAttack currently leads the RobustBench benchmark (Croce et al. [2020]), the standard benchmark for adversarial robustness. Notably, the targeted attacks used in AutoAttack pick the adversarial target classes according to the model's confidence levels and attack only the top nine classes, even though CIFAR-100 and ImageNet have many more possible target classes. The reason for attacking only a limited number of classes, rather than all of them, is computational: each such targeted attack has a significant running time. Why adversarial examples exist at all remains a hot topic of debate, specifically whether they arise from the highly non-linear landscape of neural networks or rather from their local linearity properties.
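The following is a minimal sketch of the confidence-based target selection described above, assuming a PyTorch classifier that returns logits; the function name and signature are illustrative, not AutoAttack's actual API.

```python
# Sketch (not AutoAttack's code): rank the non-true classes by the model's
# logits and keep only the k highest-scoring ones as adversarial targets.
import torch

def pick_target_classes(model, x, y_true, k=9):
    """Return the k most confidently predicted wrong classes per input.

    model  -- classifier returning logits of shape (batch, num_classes)
    x      -- input batch
    y_true -- ground-truth labels, shape (batch,)
    k      -- number of target classes to attack (nine in AutoAttack)
    """
    with torch.no_grad():
        logits = model(x)                                 # (batch, num_classes)
    # Mask out the true class so it cannot be selected as a target.
    logits = logits.scatter(1, y_true.unsqueeze(1), float("-inf"))
    # The k largest remaining logits mark the most "confusable" classes.
    return logits.topk(k, dim=1).indices                  # (batch, k)
```

Each selected class would then be attacked with its own targeted run, so the total cost grows roughly linearly with k; this is why the number of targets is kept small rather than covering all 100 or 1000 classes of CIFAR-100 or ImageNet.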
