Collaborating Authors


AAAI Conferences

Despite the widespread use of machine learning in adversarial settings such as computer security, recent studies have demonstrated vulnerabilities to evasion attacks---carefully crafted adversarial samples that closely resemble legitimate instances, but cause misclassification. In this paper, we examine the adequacy of the leading approach to generating adversarial samples---the gradient-descent approach. In particular (1) we perform extensive experiments on three datasets, MNIST, USPS and Spambase, in order to analyse the effectiveness of the gradient-descent method against non-linear support vector machines, and conclude that carefully reduced kernel smoothness can significantly increase robustness to the attack; (2) we demonstrate that separated inter-class support vectors lead to more secure models, and propose a quantity similar to margin that can efficiently predict potential susceptibility to gradient-descent attacks, before the attack is launched; and (3) we design a new adversarial sample construction algorithm based on optimising the multiplicative ratio of class decision functions.

Enforcing Linearity in DNN succours Robustness and Adversarial Image Generation Machine Learning

Recent studies on the adversarial vulnerability of neural networks have shown that models trained with the objective of minimizing an upper bound on the worst-case loss over all possible adversarial perturbations improve robustness against adversarial attacks. Beside exploiting adversarial training framework, we show that by enforcing a Deep Neural Network (DNN) to be linear in transformed input and feature space improves robustness significantly. We also demonstrate that by augmenting the objective function with Local Lipschitz regularizer boost robustness of the model further. Our method outperforms most sophisticated adversarial training methods and achieves state of the art adversarial accuracy on MNIST, CIFAR10 and SVHN dataset. In this paper, we also propose a novel adversarial image generation method by leveraging Inverse Representation Learning and Linearity aspect of an adversarially trained deep neural network classifier.

Adversarial Detection with Model Interpretation


Machine learning (ML) systems have been increasingly applied in web security applications such as spammer detection, malware detection and fraud detection. These applications have an intrinsic adversarial nature where intelligent attackers can adaptively change their behaviors to avoid being detected by the deployed detectors. Existing efforts against adversaries are usually limited by the type of applied ML models or the specific applications such as image classification. Additionally, the working mechanisms of ML models usually cannot be well understood by users, which in turn impede them from understanding the vulnerabilities of models nor improving their robustness. To bridge the gap, in this paper, we propose to investigate whether model interpretation could potentially help adversarial detection.

Defending Against Adversarial Attacks by Leveraging an Entire GAN Machine Learning

Recent work has shown that state-of-the-art models are highly vulnerable to adversarial perturbations of the input. We propose cowboy, an approach to detecting and defending against adversarial attacks by using both the discriminator and generator of a GAN trained on the same dataset. We show that the discriminator consistently scores the adversarial samples lower than the real samples across multiple attacks and datasets. We provide empirical evidence that adversarial samples lie outside of the data manifold learned by the GAN. Based on this, we propose a cleaning method which uses both the discriminator and generator of the GAN to project the samples back onto the data manifold. This cleaning procedure is independent of the classifier and type of attack and thus can be deployed in existing systems.

Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks Artificial Intelligence

Developing secure machine learning models from adversarial examples is challenging as various methods are continually being developed to generate adversarial attacks. In this work, we propose an evolutionary approach to automatically determine Image Processing Techniques Sequence (IPTS) for detecting malicious inputs. Accordingly, we first used a diverse set of attack methods including adaptive attack methods (on our defense) to generate adversarial samples from the clean dataset. A detection framework based on a genetic algorithm (GA) is developed to find the optimal IPTS, where the optimality is estimated by different fitness measures such as Euclidean distance, entropy loss, average histogram, local binary pattern and loss functions. The "image difference" between the original and processed images is used to extract the features, which are then fed to a classification scheme in order to determine whether the input sample is adversarial or clean. This paper described our methodology and performed experiments using multiple data-sets tested with several adversarial attacks. For each attack-type and dataset, it generates unique IPTS. A set of IPTS selected dynamically in testing time which works as a filter for the adversarial attack. Our empirical experiments exhibited promising results indicating the approach can efficiently be used as processing for any AI model.