Random Spiking and Systematic Evaluation of Defenses Against Adversarial Examples
Ge, Huangyi, Chau, Sze Yiu, Li, Ninghui
–arXiv.org Artificial Intelligence
Abstract--Image classifiers often suffer from adversarial examples, which are generated by adding a small amount of noise to input images to trick classifiers into misclassification. Over the years, many defense mechanisms have been proposed, and different researchers have made seemingly contradictory claims about their effectiveness. We argue that such discrepancies are primarily due to inconsistent assumptions about the attacker's knowledge. To this end, we present an analysis of possible adversarial models and propose an evaluation framework for comparing different defense mechanisms. As part of the framework, we introduce a more powerful and realistic adversary strategy. We also propose a new defense mechanism called Random Spiking (RS), which generalizes dropout and introduces random noise into the training process in a controlled manner. With a carefully chosen placement, RS incurs negligible negative impact on prediction accuracy. Evaluations under our proposed framework suggest that RS delivers better protection against adversarial examples than many existing schemes.

I. INTRODUCTION

Modern society is increasingly reliant upon software systems trained by machine learning techniques. Many such techniques, however, were designed under the implicit assumption that both the training and test data follow the same static (although possibly unknown) distribution. In the presence of intelligent and resourceful adversaries, this assumption may no longer hold. A malicious adversary can deliberately manipulate an input instance to make it deviate from the distribution of the training/testing dataset, causing the learning algorithms and the trained models to behave unexpectedly. For example, existing image classifiers based on Deep Neural Networks have been found to be highly vulnerable to adversarial examples [1], [2]. Often, by modifying an image in a way that is barely noticeable to humans, an attacker can make the classifier confidently classify it as something else. This phenomenon also exists for classifiers that do not use neural networks, and has been called "optical illusions for machines".

Many approaches have since been proposed to help defend against adversarial examples. For example, Goodfellow et al. [1] proposed adversarial training, in which one trains a neural network using both the original training dataset and newly generated adversarial examples. Another line of defense perturbs inputs at prediction time: when given an input instance, one generates multiple instances by adding small amounts of randomly generated noise to the original instance, collects the predictions on all perturbed instances, and uses majority voting to make the final prediction. Some approaches attempt to train additional neural network models to identify and reject adversarial examples [4], [5].
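The text above does not spell out how adversarial perturbations are computed, but the fast gradient sign method, introduced in the same work as adversarial training [1], is one standard way to produce them. The sketch below is illustrative only: it assumes a differentiable PyTorch classifier named model, inputs x scaled to [0, 1], true labels y, and a perturbation budget epsilon, all of which are names and values chosen here rather than taken from the paper.

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, epsilon=0.03):
        # Fast gradient sign method: nudge each pixel by epsilon in the
        # direction that increases the classification loss, producing a
        # perturbation that is small but often changes the prediction.
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        with torch.no_grad():
            x_adv = x_adv + epsilon * x_adv.grad.sign()
            x_adv = x_adv.clamp(0.0, 1.0)  # keep pixel values in a valid range
        return x_adv.detach()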
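The prediction-time defense described above, perturbing the input several times and taking a majority vote over the resulting predictions, might be sketched as follows. The Gaussian noise model, its standard deviation sigma, and the number of copies are illustrative assumptions rather than parameters of any specific published scheme.

    import torch

    def vote_predict(model, x, num_copies=20, sigma=0.1):
        # Add small random noise to several copies of the input and take a
        # majority vote over the individual predictions.
        model.eval()
        with torch.no_grad():
            votes = []
            for _ in range(num_copies):
                noisy = (x + sigma * torch.randn_like(x)).clamp(0.0, 1.0)
                votes.append(model(noisy).argmax(dim=1))
            votes = torch.stack(votes)        # shape: [num_copies, batch]
            return votes.mode(dim=0).values   # majority class per input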
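The abstract describes Random Spiking only at a high level: it generalizes dropout and injects random noise during training in a controlled manner. A rough, non-authoritative sketch of that idea is given below, under the assumption that a randomly selected fraction of unit outputs is replaced by random noise (rather than by zero, as in dropout) during training only; the layer name, noise distribution, and hyper-parameters are assumptions made for illustration, not the authors' exact construction.

    import torch
    import torch.nn as nn

    class RandomSpikingSketch(nn.Module):
        # Illustrative only: like dropout, select a random fraction p of unit
        # outputs, but replace them with random noise instead of zeroing them.
        def __init__(self, p=0.1, noise_scale=1.0):
            super().__init__()
            self.p = p
            self.noise_scale = noise_scale

        def forward(self, x):
            if not self.training:   # act as the identity at test time
                return x
            mask = (torch.rand_like(x) < self.p).float()
            noise = torch.empty_like(x).uniform_(-self.noise_scale, self.noise_scale)
            return (1.0 - mask) * x + mask * noise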
Dec-4-2018
- Country:
- North America > United States > Indiana > Tippecanoe County (0.14)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: