Machine learning models, especially those based on deep architectures, are used in everyday applications ranging from self-driving cars to medical diagnostics. Such models have been shown to be dangerously susceptible to adversarial samples: although indistinguishable from real samples to the human eye, adversarial samples lead to incorrect classifications with high confidence. The impact of adversarial samples is far-reaching, and their efficient detection remains an open problem. We propose direct density ratio estimation as an efficient, model-agnostic measure for detecting adversarial samples. Our proposed method works equally well with single- and multi-channel samples, and with different adversarial sample generation methods. We also propose a method that uses density ratio estimates to generate adversarial samples under the added constraint of preserving the density ratio.
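To make the idea of scoring samples by a directly estimated density ratio concrete, the following is a minimal sketch, not the authors' implementation. It assumes a least-squares estimator in the style of uLSIF with Gaussian kernels on scalar inputs; the abstract does not say which estimator is used. The function names, the kernel width `sigma`, the regularizer `lam`, and the toy data are all hypothetical. The ratio of the clean density to the test-batch density is fitted, and the batch sample with the lowest ratio is treated as the most suspicious.

```python
import math

def gaussian_kernel(x, c, sigma):
    """Gaussian kernel between a scalar x and a kernel center c."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - rest) / M[i][i]
    return x

def fit_density_ratio(clean, batch, sigma=1.0, lam=0.1):
    """Fit r(x) ~ p_clean(x) / p_batch(x) as a kernel expansion (uLSIF-style).

    Assumption: kernel centers are the clean samples themselves.
    """
    centers = list(clean)
    m = len(centers)

    def phi(x):
        return [gaussian_kernel(x, c, sigma) for c in centers]

    # H: second moment of the basis functions under the batch (denominator).
    H = [[0.0] * m for _ in range(m)]
    for x in batch:
        p = phi(x)
        for i in range(m):
            for j in range(m):
                H[i][j] += p[i] * p[j] / len(batch)
    # h: mean of the basis functions under the clean data (numerator).
    h = [0.0] * m
    for x in clean:
        p = phi(x)
        for i in range(m):
            h[i] += p[i] / len(clean)
    for i in range(m):
        H[i][i] += lam  # ridge regularization
    # Solve the regularized least-squares problem, then clip to keep r(x) >= 0.
    alpha = [max(0.0, a) for a in solve_linear(H, h)]
    return lambda x: sum(a * k for a, k in zip(alpha, phi(x)))

# Toy data: the batch contains one sample far from the clean distribution.
clean = [-1.0, -0.5, 0.0, 0.5, 1.0]
batch = [-0.8, -0.2, 0.3, 0.9, 5.0]
ratio = fit_density_ratio(clean, batch)
suspect = min(batch, key=ratio)  # lowest ratio = least like the clean data
```

In practice the inputs would be feature vectors rather than scalars, and `sigma` and `lam` would typically be chosen by cross-validation; both are fixed here only to keep the sketch short.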
Generative Adversarial Networks (GANs) have become one of the most studied frameworks for unsupervised learning due to their intuitive formulation. They have also been shown to be capable of generating convincing examples in limited domains, such as low-resolution images. However, they remain difficult to train in practice and tend to ignore modes of the data-generating distribution. Quantitatively capturing effects such as mode coverage, and more generally the quality of the generative model, remains elusive. We propose Generative Adversarial Parallelization (GAP), a framework in which many GANs or their variants are trained simultaneously, exchanging their discriminators. This eliminates the tight coupling between a generator and its discriminator, leading to improved convergence and improved coverage of modes. We also propose an improved variant of the recently proposed Generative Adversarial Metric and show how it can score individual GANs or their collections under the GAP model.
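The exchange of discriminators among simultaneously trained GANs can be sketched as a training loop that periodically re-pairs generators and discriminators. This is an illustrative sketch only: the abstract does not specify the swap schedule or how partners are chosen, so the fixed interval `swap_every`, the uniformly random permutation, and all function names are assumptions. Models are stood in for by strings, and `train_step` is a hypothetical callback that would perform one adversarial update for a given pair.

```python
import random

def swap_discriminators(pairs, rng):
    """Keep generators in place and randomly permute the discriminators."""
    generators = [g for g, _ in pairs]
    discriminators = [d for _, d in pairs]
    rng.shuffle(discriminators)
    return list(zip(generators, discriminators))

def train_parallel(pairs, num_steps, swap_every, train_step, seed=0):
    """Train every (generator, discriminator) pair, swapping partners periodically."""
    rng = random.Random(seed)
    for step in range(1, num_steps + 1):
        for generator, discriminator in pairs:
            train_step(generator, discriminator)
        if step % swap_every == 0:
            pairs = swap_discriminators(pairs, rng)
    return pairs

# Placeholder "models": record which pairs were trained together.
calls = []
pairs = [("G0", "D0"), ("G1", "D1"), ("G2", "D2")]
final = train_parallel(pairs, num_steps=6, swap_every=2,
                       train_step=lambda g, d: calls.append((g, d)))
```

Because the pairing changes over time, no generator is tuned against a single fixed discriminator, which is the decoupling the abstract describes.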
In this paper, we propose a novel method for detecting adversarial examples by training a binary classifier on both the original data and saliency data. For an image classification model, saliency explains how the model makes its decisions by identifying the pixels that are significant for a prediction. A model that outputs a wrong classification relies on wrong features and therefore shows wrong saliency as well. Our approach performs well at detecting adversarial perturbations. We quantitatively evaluate the generalization ability of the detector, showing that detectors trained with strong adversaries perform well on weak adversaries.
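As a rough illustration of pairing an input with its saliency, the sketch below builds detector features for a toy logistic model. It assumes gradient-based saliency (the absolute gradient of the model output with respect to each input feature) and simple concatenation of input and saliency; the abstract does not state which saliency method or feature layout is used, and the names `saliency` and `detector_features` are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, w, b):
    """Toy classifier: logistic regression on a flat feature vector."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def saliency(x, w, b):
    """Absolute gradient of the output w.r.t. each input feature.

    For p = sigmoid(w . x + b), the gradient is dp/dx_i = p * (1 - p) * w_i.
    """
    p = predict(x, w, b)
    return [abs(p * (1.0 - p) * wi) for wi in w]

def detector_features(x, w, b):
    """Concatenate the raw input with its saliency map (assumed layout)."""
    return list(x) + saliency(x, w, b)

w, b = [2.0, -1.0, 0.0], 0.0
x = [0.0, 0.0, 0.0]
features = detector_features(x, w, b)
```

A binary detector would then be trained on such feature vectors computed for both clean and adversarial inputs; for a deep network the gradient would come from automatic differentiation rather than a closed form.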
Although introduced only recently, generative adversarial networks (GANs) have shown many promising results in generating realistic samples. However, it is difficult to control the generated samples because the input variables of the generator are drawn from a random distribution. Some attempts have been made to control the samples generated by a GAN, but they have not performed well on difficult problems. Furthermore, it is hardly possible to control whether the generator concentrates on reality or distinctness. For example, with existing models, a generator for face images cannot be set to concentrate on one of the two objectives, i.e., generating realistic faces or generating different faces according to input labels. Here, we propose the controllable GAN (CGAN). CGAN shows strong performance in controlling generated samples; in addition, it can steer the generator to concentrate on reality or distinctness. CGAN is evaluated on the CelebA dataset. We believe that CGAN can contribute to research on generative neural network models.
Deep learning on graph structures has shown exciting results in various applications. However, little attention has been paid to the robustness of such models, in contrast to the extensive research on adversarial attack and defense for images and text. In this paper, we focus on adversarial attacks that fool the model by modifying the combinatorial structure of the data. We first propose a reinforcement-learning-based attack method that learns a generalizable attack policy while requiring only prediction labels from the target classifier. We also present variants of genetic algorithms and gradient methods for the scenarios where prediction confidence or gradients are available. Using both synthetic and real-world data, we show that a family of Graph Neural Network models is vulnerable to these attacks in both graph-level and node-level classification tasks. We also show that such attacks can be used to diagnose the learned classifiers.
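What it means to attack a classifier by modifying the combinatorial structure of a graph, using only its predicted labels, can be illustrated with a brute-force baseline. The sketch below is not the reinforcement learning, genetic, or gradient method from the abstract: it simply tries every single edge flip (add or remove) and returns the first one that changes the label. The toy classifier, which labels a graph by whether it is connected, and all function names are hypothetical stand-ins for a trained graph-level model.

```python
from itertools import combinations

def is_connected_label(n, edges):
    """Toy graph-level classifier: 1 if the graph on nodes 0..n-1 is connected."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return 1 if len(seen) == n else 0

def flip_edge(edges, u, v):
    """Remove edge (u, v) if present, otherwise add it; returns a new set."""
    edge = (min(u, v), max(u, v))
    return edges ^ {edge}

def single_flip_attack(n, edges, classify):
    """Search all single edge flips for one that changes the predicted label.

    Only label access to `classify` is used (no confidences or gradients).
    """
    original = classify(n, edges)
    for u, v in combinations(range(n), 2):
        candidate = flip_edge(edges, u, v)
        if classify(n, candidate) != original:
            return (u, v), candidate
    return None, edges

# A path graph 0-1-2-3: removing any edge disconnects it.
path = {(0, 1), (1, 2), (2, 3)}
flip, attacked = single_flip_attack(4, path, is_connected_label)

# A triangle: no single flip changes the label.
triangle = {(0, 1), (0, 2), (1, 2)}
no_flip, unchanged = single_flip_attack(3, triangle, is_connected_label)
```

Exhaustive search scales quadratically in the number of nodes per flip and does not generalize across graphs, which is the kind of limitation a learned attack policy is meant to address.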