Yet another but more efficient black-box adversarial attack: tiling and evolution strategies

Meunier, Laurent, Atif, Jamal, Teytaud, Olivier

arXiv.org Artificial Intelligence 

We introduce a new black-box attack that achieves state-of-the-art performance. It requires access only to the logits of the classifier, with no other information, which is a more realistic scenario. Beyond introducing a new objective function, we extend previous work on black-box adversarial attacks to a larger spectrum of evolution strategies and other derivative-free optimization methods. We also highlight a new intriguing property: deep neural networks are not robust to single-shot tiled attacks. With a budget limited to 10,000 queries, our models achieve a success rate of up to 99.2% against the InceptionV3 classifier, using 630 queries to the network on average in the untargeted setting, an improvement of 90 queries over the current state of the art. In the targeted setting, with a budget limited to 100,000 queries, we reach a 100% success rate with 6,662 queries on average, i.e. 800 fewer queries than the current state of the art.

Despite their success, deep learning algorithms have shown vulnerability to adversarial attacks (Biggio et al., 2013; Szegedy et al., 2014), i.e. small, imperceptible perturbations of the inputs that lead the networks to misclassify the generated adversarial examples. Since their discovery, adversarial attacks and defenses have become one of the hottest research topics in the machine learning community, as they raise serious security issues in many critical fields. They also question our understanding of deep learning behaviors. Designing new and stronger attacks helps build better defenses, hence the motivation of our work. The first attacks were generated in a setting where the attacker knows everything about the network (architecture and parameters).
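To make the two ingredients named in the title concrete, the following is a minimal sketch of a tiled perturbation searched by a simple evolution strategy that only queries logits. It is an illustration under stated assumptions, not the authors' method: the margin loss is a common stand-in and not the new objective function the abstract refers to, the (1+1) sign-flip strategy is just one simple derivative-free optimizer, and the names `query_logits`, `tile_size`, `eps` and `budget` are hypothetical.

```python
import numpy as np


def tile_perturbation(tile, image_shape):
    """Repeat a small (t, t, c) tile until it covers an (H, W, c) image."""
    h, w, _ = image_shape
    t = tile.shape[0]
    reps = (-(-h // t), -(-w // t), 1)  # ceiling division along H and W
    return np.tile(tile, reps)[:h, :w, :]


def margin_loss(logits, label):
    """True-class logit minus the best other logit (assumed stand-in loss).

    A negative value means the input is no longer classified as `label`.
    """
    others = np.delete(logits, label)
    return logits[label] - others.max()


def tiled_es_attack(query_logits, image, label, eps=0.05, tile_size=4,
                    budget=1000, seed=0):
    """Untargeted attack sketch: (1+1) evolution strategy over a +/-eps tile.

    `query_logits` is a hypothetical black-box function mapping an image
    with values in [0, 1] to a 1-D array of logits.
    """
    rng = np.random.default_rng(seed)
    channels = image.shape[2]
    # Parent: every tile entry is either -eps or +eps.
    tile = rng.choice([-eps, eps], size=(tile_size, tile_size, channels))

    def evaluate(candidate_tile):
        delta = tile_perturbation(candidate_tile, image.shape)
        adv = np.clip(image + delta, 0.0, 1.0)
        return adv, margin_loss(query_logits(adv), label)

    adv, best = evaluate(tile)
    queries = 1
    while best >= 0 and queries < budget:
        # Mutation: flip the sign of each entry with probability 1 / n,
        # and force at least one flip so the child differs from the parent.
        flip = rng.random(tile.shape) < 1.0 / tile.size
        flip.flat[rng.integers(tile.size)] = True
        child = np.where(flip, -tile, tile)
        child_adv, child_loss = evaluate(child)
        queries += 1
        # Selection: keep the child only if it is at least as good.
        if child_loss <= best:
            tile, adv, best = child, child_adv, child_loss
    return adv, best, queries
```

Calling `tiled_es_attack(query_logits, image, label)` returns the candidate adversarial image, the final loss value, and the number of queries used. Searching over a small tile instead of every pixel shrinks the search space from H×W×C to `tile_size`² × C variables, which is one plausible reason a tiled parametrization can reduce the query count.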
