adversarial example


Distribution Density, Tails, and Outliers in Machine Learning: Metrics and Applications

arXiv.org Machine Learning

We develop techniques to quantify the degree to which a given (training or testing) example is an outlier in the underlying distribution. We evaluate five methods to score examples in a dataset by how well-represented the examples are, for different plausible definitions of "well-represented", and apply these to four common datasets: MNIST, Fashion-MNIST, CIFAR-10, and ImageNet. Despite being independent approaches, we find all five are highly correlated, suggesting that the notion of being well-represented can be quantified. Among other uses, we find these methods can be combined to identify (a) prototypical examples (that match human expectations); (b) memorized training examples; and, (c) uncommon submodes of the dataset. Further, we show how our metrics can be used to determine an improved ordering for curriculum learning, and how they impact adversarial robustness. We release all metric values for the training and test sets we studied.
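
The abstract does not spell out the five scoring methods, so as a loose illustration only, here is a generic density-style score (not one of the paper's metrics): an example is scored by its mean distance to its k nearest neighbours in some feature space, with larger scores suggesting a less well-represented region.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_outlier_score(features, k=10):
    # features: (n_examples, n_dims) array, e.g. penultimate-layer embeddings.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dists, _ = nn.kneighbors(features)   # first column is the point itself
    return dists[:, 1:].mean(axis=1)     # mean distance to the k neighbours

# Toy usage: score random embeddings and inspect the most outlying items.
rng = np.random.default_rng(0)
scores = knn_outlier_score(rng.normal(size=(1000, 64)), k=10)
print("most outlying indices:", np.argsort(scores)[-5:])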


Generative Well-intentioned Networks

arXiv.org Machine Learning

We propose Generative Well-intentioned Networks (GWINs), a novel framework for increasing the accuracy of certainty-based, closed-world classifiers. A conditional generative network recovers the distribution of observations that the classifier labels correctly with high certainty. We introduce a reject option to the classifier during inference, allowing the classifier to reject an observation instance rather than predict an uncertain label. These rejected observations are translated by the generative network to high-certainty representations, which are then relabeled by the classifier. This architecture allows for any certainty-based classifier or rejection function and is not limited to multilayer perceptrons. The framework is assessed on benchmark classification datasets, and the results show that GWINs significantly improve accuracy on uncertain observations.
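
As a rough sketch of the inference loop described above (the classifier, generator, and threshold below are placeholders, not the paper's API):

import numpy as np

def gwin_predict(x, classifier, generator, threshold=0.9):
    # Assumed interfaces: classifier.predict_proba returns class probabilities,
    # generator.translate maps an observation to a higher-certainty representation.
    probs = classifier.predict_proba(np.asarray(x)[None, :])[0]
    if probs.max() >= threshold:
        return int(probs.argmax())       # confident: accept the label
    # Reject option: translate the observation with the generative network,
    # then relabel the high-certainty representation.
    x_translated = generator.translate(x)
    probs = classifier.predict_proba(np.asarray(x_translated)[None, :])[0]
    return int(probs.argmax())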


Open the Boxes of Words: Incorporating Sememes into Textual Adversarial Attack

arXiv.org Artificial Intelligence

Adversarial attacks are carried out to reveal the vulnerability of deep neural networks. Word substitution is a class of effective textual adversarial attack methods that has been extensively explored. However, all existing studies utilize word embeddings or thesauruses to find substitutes. In this paper, we incorporate sememes, the minimum semantic units, into adversarial attack. We propose an efficient sememe-based word substitution strategy and integrate it into a genetic attack algorithm. In experiments, we employ our attack method to attack LSTM and BERT on both Chinese and English sentiment analysis and natural language inference benchmark datasets. Experimental results demonstrate that our model achieves higher attack success rates with fewer modifications than baseline methods based on word embeddings or synonyms. Furthermore, we find that our attack can bring greater robustness enhancement to the target model when used for adversarial training.
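
As a toy sketch of the substitution step only (the SEMEMES dictionary below is a made-up placeholder for HowNet annotations, and the genetic search that drives the attack is omitted):

# Two words are treated here as mutual substitutes when their sememe sets match
# exactly; this is a simplification of the paper's sememe-based strategy.
SEMEMES = {
    "good":      {"quality", "positive"},
    "excellent": {"quality", "positive"},
    "fine":      {"quality", "positive"},
    "bad":       {"quality", "negative"},
}

def sememe_substitutes(word, vocab=SEMEMES):
    target = vocab.get(word)
    if target is None:
        return []
    return [w for w, s in vocab.items() if w != word and s == target]

print(sememe_substitutes("good"))   # ['excellent', 'fine']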


EdgeFool: An Adversarial Image Enhancement Filter

arXiv.org Machine Learning

Adversarial examples are intentionally perturbed images that mislead classifiers. These images can, however, be easily detected using denoising algorithms, when high-frequency spatial perturbations are used, or can be noticed by humans, when perturbations are large. In this paper, we propose EdgeFool, an adversarial image enhancement filter that learns structure-aware adversarial perturbations. EdgeFool generates adversarial images with perturbations that enhance image details via training a fully convolutional neural network end-to-end with a multi-task loss function. This loss function accounts for both image detail enhancement and class misleading objectives. We evaluate EdgeFool on three classifiers (ResNet-50, ResNet-18 and AlexNet) using two datasets (ImageNet and Private-Places365) and compare it with six adversarial methods (DeepFool, SparseFool, Carlini-Wagner, SemanticAdv, Non-targeted and Private Fast Gradient Sign Methods).
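
As a rough sketch of a multi-task loss in this spirit (the detail-enhancement target, loss form, and weights below are assumptions, not EdgeFool's exact objective):

import torch
import torch.nn.functional as F

def edgefool_style_loss(generated, detail_target, logits, true_label,
                        w_detail=1.0, w_adv=0.1):
    # Image-detail term: keep the output close to a detail-enhanced target image.
    loss_detail = F.l1_loss(generated, detail_target)
    # Class-misleading term: push down the probability of the original class.
    log_probs = F.log_softmax(logits, dim=1)
    loss_adv = log_probs[torch.arange(logits.size(0)), true_label].mean()
    return w_detail * loss_detail + w_adv * loss_adv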


matter II media: mobile and web technology

#artificialintelligence

Artificial intelligence, in any meaningful sense, doesn't exist. Every example of what is sometimes taken to be AI is in fact a case of the 'robotic fallacy'. This fallacy is to mistake an instance of seemingly intelligent behaviour for the existence of an underlying faculty of intelligence. That is, one sees or hears an AI program or robot say or do something which, if it were human, would be associated with a general level of intelligence. And one tends to assume it also has – or could come to possess – that intelligence. But in fact the behaviour fell into what, by human standards, is a very narrow and customised domain. There is little, if anything, else that the AI can offer in the way of apparently intelligent actions. And it's not a question of waiting a little while until researchers have worked out how to attain AI. There is a vast chasm they need to cross. And that chasm, it will be argued here, exists in part because of a failure to recognise the nature of symbolic systems. To make things more concrete, here is pseudocode for a type of 'AI' program typified by Alexa, Siri and other virtual assistants or bots: technology called machine learning turns the human's spoken words into text and performs 'natural language processing' to break them down to fit a template of (greeting, name).
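
The pseudocode the article refers to is not reproduced in this excerpt; as a minimal illustration of the narrow (greeting, name) template matching it describes (the greeting list and canned reply are invented):

GREETINGS = {"hello", "hi", "hey"}

def respond(transcribed_text):
    # Tokenize the transcript and try to fit it to the (greeting, name) template.
    words = [w.strip("!.,?") for w in transcribed_text.lower().split()]
    if len(words) == 2 and words[0] in GREETINGS:
        return f"Hello, {words[1].capitalize()}. How can I help you?"
    return "Sorry, I didn't understand that."

print(respond("Hello Alexa"))   # Hello, Alexa. How can I help you?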


Facebook trained AI to fool facial recognition systems, and it works on live video

#artificialintelligence

Facebook remains embroiled in a multibillion-dollar lawsuit over its facial recognition practices, but that hasn't stopped its artificial intelligence research division from developing technology to combat the very misdeeds of which the company is accused. According to VentureBeat, Facebook AI Research (FAIR) has developed a state-of-the-art "de-identification" system that works on video, including live video. It works by altering key facial features of a video subject in real time using machine learning to trick a facial recognition system into misidentifying the subject. De-identification technology has existed in the past, and there are entire companies, like Israeli AI and privacy firm D-ID, dedicated to providing it for still images. There's also a whole category of facial-recognition-fooling imagery you can wear yourself, called adversarial examples, which work by exploiting weaknesses in how computer vision software has been trained to identify certain characteristics.


Detection of Adversarial Attacks and Characterization of Adversarial Subspace

arXiv.org Machine Learning

Adversarial attacks have always been a serious threat to any data-driven model. In this paper, we explore subspaces of adversarial examples in the unitary vector domain, and we propose a novel detector for defending our models trained for environmental sound classification. We measure the chordal distance between legitimate and malicious representations of sounds in the unitary space of the generalized Schur decomposition and show that their manifolds lie far from each other. Our front-end detector is a regularized logistic regression which discriminates between the eigenvalues of legitimate and adversarial spectrograms. Experimental results on three benchmark datasets of environmental sounds represented as spectrograms show that the proposed detector achieves a high detection rate against eight types of adversarial attacks and outperforms other detection approaches.
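
As a sketch of the two ingredients named above, generalized Schur (QZ) eigenvalues and the chordal distance, followed by a regularized logistic regression on eigenvalue features (how spectrograms are paired into a pencil and which features feed the detector are assumptions here, not the paper's exact recipe):

import numpy as np
from scipy.linalg import qz
from sklearn.linear_model import LogisticRegression

def generalized_eigs(S_a, S_b):
    # Generalized eigenvalues (alpha, beta) of the matrix pencil (S_a, S_b).
    AA, BB, _, _ = qz(S_a, S_b, output="complex")
    return np.diag(AA), np.diag(BB)

def chordal_distance(a1, b1, a2, b2):
    # Chordal distance between generalized eigenvalue pairs (a1, b1) and (a2, b2).
    num = np.abs(a1 * b2 - a2 * b1)
    den = np.sqrt(np.abs(a1) ** 2 + np.abs(b1) ** 2) * \
          np.sqrt(np.abs(a2) ** 2 + np.abs(b2) ** 2)
    return num / den

# Toy detector: regularized logistic regression on sorted eigenvalue magnitudes,
# with random matrices standing in for spectrograms and a fixed reference matrix
# closing the pencil (both are placeholders).
rng = np.random.default_rng(0)
REF = rng.normal(size=(32, 32))

def eig_features(spec):
    a, b = generalized_eigs(spec, REF)
    return np.sort(np.abs(a / b))

X = np.stack([eig_features(rng.normal(size=(32, 32))) for _ in range(40)])
y = np.array([0] * 20 + [1] * 20)   # 0 = legitimate, 1 = adversarial (toy labels)
detector = LogisticRegression(C=1.0, max_iter=1000).fit(X, y)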


Understanding and Quantifying Adversarial Examples Existence in Linear Classification

arXiv.org Machine Learning

State-of-the-art deep neural networks (DNNs) are vulnerable to attacks by adversarial examples: a carefully designed small perturbation to the input, imperceptible to humans, can mislead a DNN. To understand the root cause of adversarial examples, we quantify the probability of adversarial example existence for linear classifiers. The previous mathematical definition of adversarial examples involves only the overall perturbation amount; we propose a more practically relevant definition of strong adversarial examples that additionally limits the perturbation along the signal direction. We show that linear classifiers can be made robust to strong adversarial example attacks in cases where no adversarially robust linear classifiers exist under the previous definition. The quantitative formulas are confirmed by numerical experiments using a linear support vector machine (SVM) classifier. The results suggest that designing general strong-adversarial-robust learning systems is feasible, but only by incorporating human knowledge of the underlying classification problem.
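
As a worked illustration of the standard geometry behind the overall perturbation amount (this is the textbook minimal-L2 flip for a linear classifier, not the paper's additional signal-direction constraint): for f(x) = w.x + b, the smallest perturbation that crosses the decision boundary lies along w and has norm |f(x)| / ||w||.

import numpy as np

def minimal_flip_perturbation(w, b, x, margin=1e-6):
    # Step along w just far enough to cross (and slightly pass) the boundary.
    f = w @ x + b
    return -(f + np.sign(f) * margin) * w / (w @ w)

w, b = np.array([2.0, -1.0]), 0.5
x = np.array([1.0, 1.0])
delta = minimal_flip_perturbation(w, b, x)
print(np.sign(w @ x + b), np.sign(w @ (x + delta) + b))            # 1.0 -1.0
print(np.linalg.norm(delta), abs(w @ x + b) / np.linalg.norm(w))   # ~equal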


Label Smoothing and Logit Squeezing: A Replacement for Adversarial Training?

arXiv.org Machine Learning

Adversarial training is one of the strongest defenses against adversarial attacks, but it requires adversarial examples to be generated for every mini-batch during optimization. The expense of producing these examples during training often precludes adversarial training from use on complex image datasets. In this study, we explore the mechanisms by which adversarial training improves classifier robustness, and show that these mechanisms can be effectively mimicked using simple regularization methods, including label smoothing and logit squeezing. Remarkably, using these simple regularization methods in combination with Gaussian noise injection, we are able to achieve strong adversarial robustness, often exceeding that of adversarial training, using no adversarial examples. However, the existence of adversarial examples has raised concerns about the security of computer vision systems (Szegedy et al., 2013; Biggio et al., 2013). For example, an attacker may cause a system to mistake a stop sign for another object (Evtimov et al., 2017) or mistake one person for another (Sharif et al., 2016). To address security concerns for high-stakes applications, researchers are searching for ways to make models more robust to attacks. Many defenses have been proposed to combat adversarial examples. Approaches such as feature squeezing, denoising, and encoding (Xu et al., 2017; Samangouei et al., 2018; Shen et al., 2017; Meng & Chen, 2017) have had some success at pre-processing images to remove adversarial perturbations. Other approaches focus on hardening neural classifiers to reduce adversarial susceptibility. This includes specialized non-linearities (Zantedeschi et al., 2017), modified training processes (Papernot et al., 2016), and gradient obfuscation (Athalye et al., 2018).
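
As a minimal sketch of the three ingredients named above, label smoothing, logit squeezing, and Gaussian noise injection (the smoothing amount, squeezing weight, and noise scale are placeholders, not the paper's hyperparameters):

import torch
import torch.nn.functional as F

def regularized_loss(model, x, y, num_classes,
                     smoothing=0.1, squeeze_weight=0.05, noise_std=0.2):
    # Gaussian noise injection on the inputs.
    logits = model(x + noise_std * torch.randn_like(x))
    # Label smoothing: soft targets instead of one-hot labels.
    soft = torch.full_like(logits, smoothing / (num_classes - 1))
    soft.scatter_(1, y.unsqueeze(1), 1.0 - smoothing)
    ce = -(soft * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
    # Logit squeezing: penalize large logit norms.
    squeeze = logits.norm(p=2, dim=1).mean()
    return ce + squeeze_weight * squeeze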


Securing machine learning models against adversarial attacks

#artificialintelligence

Beware: many defence methods can lead to gradient masking, whether intentional or not. Gradient masking does not guarantee adversarial robustness, and has been shown to be circumventable (Tramèr et al., 2017; Athalye et al., 2018). We hope this article provides helpful insights on how to defend against adversarial examples. Please feel free to provide suggestions in the comment section if we're missing something.