Greedy Attack and Gumbel Attack: Generating Adversarial Examples for Discrete Data

Puyudi Yang, Jianbo Chen, Cho-Jui Hsieh, Jane-Ling Wang, Michael I. Jordan

arXiv.org Machine Learning 

Robustness to adversarial perturbation has become an extremely important criterion for applications of machine learning in security-sensitive domains such as spam detection [25], fraud detection [6], criminal justice [3], malware detection [13], and financial markets [27]. Systematic methods for generating adversarial examples by small perturbations of original input data, also known as "attacks," have been developed to operationalize this criterion and to drive the development of more robust learning systems [4, 26, 7]. Most of the work in this area has focused on differentiable models with continuous input spaces [26, 7, 14]. In this setting, the proposed attack strategies add a gradient-based perturbation to the original input. Such perturbations have been shown to cause a dramatic decrease in the predictive accuracy of the model, demonstrating the vulnerability of deep neural networks to adversarial examples in tasks such as image classification and speech recognition.

We focus instead on adversarial attacks on models with discrete input data, such as text, where each feature of an input sample has a categorical domain. While gradient-based approaches are not directly applicable in this setting, variations of them have been shown to be effective for differentiable models. For example, Li et al. [15] proposed locating the top features whose embeddings have the largest gradient magnitudes, and Papernot et al. [20] proposed modifying randomly selected features of an input by perturbing each feature in the direction of the gradient sign and projecting the result onto the closest vector in the embedding space.
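To make the gradient-based perturbation for continuous inputs concrete, the following is a minimal sketch of the well-known fast gradient sign method; the PyTorch interface (`model`, `loss_fn`) and the step size `epsilon` are illustrative assumptions, not details taken from the paper.

```python
import torch

def fgsm_perturb(model, loss_fn, x, y, epsilon=0.01):
    """Fast gradient sign method (illustrative sketch): move the input a
    small step in the direction that increases the model's loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    # Add a gradient-sign perturbation and detach from the autograd graph.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```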
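The discrete-input variants cited above can be sketched in the same style. The snippet below combines the two ideas for illustration: rank token positions by the gradient magnitude of their embeddings (as in Li et al. [15]), then perturb each selected embedding by the gradient sign and project it onto the nearest vocabulary vector (as in Papernot et al. [20], who select positions randomly instead). It assumes, hypothetically, a single token sequence and a model that accepts precomputed embeddings through an `inputs_embeds` argument.

```python
import torch

def gradient_projection_attack(model, loss_fn, token_ids, y,
                               embedding_matrix, k=3, epsilon=1.0):
    """Illustrative sketch of a gradient-plus-projection attack on a
    sequence of discrete tokens (token_ids has shape [seq_len])."""
    # Look up embeddings and track gradients with respect to them.
    emb = embedding_matrix[token_ids].clone().detach().requires_grad_(True)
    loss = loss_fn(model(inputs_embeds=emb), y)  # assumed model interface
    loss.backward()
    # Select the k positions whose embeddings have the largest gradients.
    positions = torch.topk(emb.grad.norm(dim=1), k).indices
    adv_ids = token_ids.clone()
    for pos in positions:
        perturbed = emb[pos] + epsilon * emb.grad[pos].sign()
        # Project onto the closest vector in the embedding space.
        dists = torch.norm(embedding_matrix - perturbed, dim=1)
        adv_ids[pos] = torch.argmin(dists)
    return adv_ids
```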
