perturbation


Even robots can be fooled, but they're getting smarter

#artificialintelligence

Humans tend to think AI can make no mistakes. Isn't it programmed to be perfect, something we, as biological organisms, can never be? Not exactly (and remember that those biological organisms created robots in the first place). If something -- a perturbation -- interferes with the thinking of an artificial brain, it can be deceived. That doesn't sound like a big deal until you realize that just one glitch could spell disaster, depending on what the robot is in charge of.


Seeking a way of preventing audio models for AI machine learning from being fooled

#artificialintelligence

Warnings have emerged about the unreliability of the metrics used to judge whether an audio perturbation designed to fool AI models can be perceived by humans. Researchers have shown that the distortion metrics used to assess intentional perturbations in audio signals are not a reliable measure of human perception, and they have proposed a series of improvements. These perturbations, designed to be imperceptible, can be used to cause erroneous predictions in artificial intelligence systems. Distortion metrics are applied to assess how effective such attack-generation methods are.
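
As a rough illustration of what a distortion metric measures, the sketch below computes two simple quantities on a synthetic signal: an L-infinity bound on the perturbation and a relative loudness in decibels. The function names and the decibel formulation are illustrative assumptions, not the specific metrics the researchers critique or the improvements they propose.

```python
# A minimal sketch (NumPy) of two distortion metrics of the kind used to score
# audio adversarial perturbations; names and formulas are illustrative.
import numpy as np

def linf_distortion(clean: np.ndarray, adversarial: np.ndarray) -> float:
    """Maximum absolute per-sample change introduced by the perturbation."""
    return float(np.max(np.abs(adversarial - clean)))

def relative_db_distortion(clean: np.ndarray, adversarial: np.ndarray) -> float:
    """Perturbation loudness relative to the clean signal, in decibels.

    More negative values nominally mean a quieter perturbation; the article's
    point is that such numbers do not always track human perception.
    """
    delta = adversarial - clean
    return 20.0 * (np.log10(np.max(np.abs(delta)) + 1e-12)
                   - np.log10(np.max(np.abs(clean)) + 1e-12))

# Example usage on a synthetic 440 Hz tone with small random noise.
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000))
x_adv = x + rng.normal(scale=1e-3, size=x.shape)
print(linf_distortion(x, x_adv), relative_db_distortion(x, x_adv))
```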


Global Big Data Conference

#artificialintelligence

Artificial intelligence (AI) is increasingly based on machine learning models trained using large datasets. Likewise, human-computer interaction is increasingly dependent on speech communication, mainly due to the remarkable performance of machine learning models in speech recognition tasks. However, these models can be fooled by "adversarial" examples; in other words, inputs intentionally perturbed to produce a wrong prediction without the changes being noticed by humans. "Suppose we have a model that classifies audio (e.g., voice command recognition) and we want to deceive it; in other words, generate a perturbation that maliciously prevents the model from working properly. If a signal is heard properly, a person is able to notice whether it says 'yes,' for example. When we add an adversarial perturbation we will still hear 'yes,' but the model will start to hear 'no,' or 'turn right' instead of left or any other command we don't want to execute," explained Jon Vadillo, researcher in the UPV/EHU's Department of Computer Science and Artificial Intelligence.
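
To make the quoted scenario concrete, here is a minimal FGSM-style sketch (PyTorch) of how such a perturbation can be generated against a gradient-accessible classifier; `model`, the epsilon budget, and the input format are hypothetical and do not describe the UPV/EHU system.

```python
# A minimal FGSM-style sketch: nudge the waveform in the direction that
# increases the classifier's loss, within a small L-infinity budget.
import torch

def fgsm_audio_perturbation(model, waveform, true_label, epsilon=1e-3):
    """Return an adversarially perturbed waveform within an L-infinity budget.

    `model` is a hypothetical audio-command classifier taking a batch of
    waveforms and returning class logits; `true_label` is the correct class.
    """
    waveform = waveform.clone().detach().requires_grad_(True)
    logits = model(waveform)
    loss = torch.nn.functional.cross_entropy(logits, true_label)
    loss.backward()
    # Step in the direction that increases the loss, clipped to +/- epsilon.
    return (waveform + epsilon * waveform.grad.sign()).detach()
```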


Machine Learning Safety: Unsolved Problems - KDnuggets

#artificialintelligence

Along with researchers from Google Brain and OpenAI, we are releasing a paper on Unsolved Problems in ML Safety. Due to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. As a preview of the paper, in this post, we consider a subset of the paper's directions, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), and steering ML systems ("Alignment"). Robustness research aims to build systems that are less vulnerable to extreme hazards and adversarial threats. Two problems in robustness are robustness to long tails and robustness to adversarial examples.
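
As one concrete and standard instance of the adversarial-robustness problem mentioned above, the sketch below performs PGD-based adversarial training in PyTorch; it is a generic illustration assuming inputs scaled to [0, 1], not material from the paper.

```python
# Generic PGD adversarial training: craft a worst-case perturbation within an
# eps ball, then train the model on the perturbed batch instead of the clean one.
import torch

def pgd_attack(model, x, y, eps=0.03, alpha=0.01, steps=10):
    """Iterated gradient-ascent steps on the loss, projected into the eps ball."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        # Project back into the eps ball around x and into the valid [0, 1] range.
        x_adv = torch.clamp(torch.min(torch.max(x_adv, x - eps), x + eps), 0, 1)
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """One training step on adversarial rather than clean examples."""
    model.train()
    x_adv = pgd_attack(model, x, y)
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```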


AEVA: Black-box Backdoor Detection Using Adversarial Extreme Value Analysis

arXiv.org Artificial Intelligence

Deep neural networks (DNNs) have been proven vulnerable to backdoor attacks. A backdoor is often embedded in the target DNN by injecting a backdoor trigger into training examples, which can cause the target DNN to misclassify any input attached with the trigger. Existing backdoor detection methods often require access to the original poisoned training data, the parameters of the target DNN, or the predictive confidence for each given input, which are impractical in many real-world applications, e.g., on-device deployed DNNs. We address the black-box hard-label backdoor detection problem, where the DNN is fully black-box and only its final output label is accessible. We approach this problem from the optimization perspective and show that the objective of backdoor detection is bounded by an adversarial objective. Further theoretical and empirical studies reveal that this adversarial objective leads to a solution with a highly skewed distribution; a singularity is often observed in the adversarial map of a backdoor-infected example, which we call the adversarial singularity phenomenon. Based on this observation, we propose adversarial extreme value analysis (AEVA) to detect backdoors in black-box neural networks. AEVA is based on an extreme value analysis of the adversarial map, computed from Monte Carlo gradient estimation. Evidenced by extensive experiments across multiple popular tasks and backdoor attacks, our approach is shown to be effective in detecting backdoor attacks under black-box hard-label scenarios. Deep neural networks have pervasively been used in a wide range of applications such as facial recognition (Masi et al., 2018), object detection (Szegedy et al., 2013), autonomous driving (Okuyama et al., 2018), and home assistants (Singh et al., 2020). Meanwhile, DNNs have become increasingly complex, and training state-of-the-art models requires enormous data and expensive computation.
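
The sketch below is a schematic illustration of the two ingredients named in the abstract, a Monte Carlo (hard-label, NES-style) gradient estimate and an extreme-value check for a dominating coordinate in the resulting adversarial map. The smoothing parameters, the surrogate objective, and the threshold are assumptions for illustration, not the AEVA algorithm itself.

```python
# Schematic sketch: estimate an "adversarial map" from hard labels only via
# Monte Carlo sampling, then flag a highly skewed map as a possible backdoor.
import numpy as np

def estimate_adversarial_map(predict_label, x, target, sigma=0.05, samples=200, rng=None):
    """Monte Carlo estimate of the gradient of P[label == target] w.r.t. x.

    `predict_label(x)` is assumed to return only the final output label.
    """
    rng = rng or np.random.default_rng(0)
    grad = np.zeros_like(x)
    for _ in range(samples):
        noise = rng.normal(size=x.shape)
        hit = float(predict_label(x + sigma * noise) == target)
        grad += hit * noise
    return grad / (samples * sigma)

def looks_backdoored(adv_map, ratio_threshold=10.0):
    """Flag a singular adversarial map: one coordinate dwarfs the typical one."""
    magnitudes = np.abs(adv_map).ravel()
    peak, typical = magnitudes.max(), np.median(magnitudes) + 1e-12
    return peak / typical > ratio_threshold
```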


From Intrinsic to Counterfactual: On the Explainability of Contextualized Recommender Systems

arXiv.org Artificial Intelligence

With the prevalence of deep learning based embedding approaches, recommender systems have become a proven and indispensable tool in various information filtering applications. However, for many of them it remains difficult to diagnose which aspects of the deep models' input drive the final ranking decision, so they often cannot be understood by human stakeholders. In this paper, we investigate the dilemma between recommendation and explainability, and show that by utilizing contextual features (e.g., item reviews from users), we can design a series of explainable recommender systems without sacrificing their performance. In particular, we propose three types of explainable recommendation strategies with a gradual change of model transparency: whitebox, graybox, and blackbox. Each strategy explains its ranking decisions via a different mechanism: attention weights, adversarial perturbations, and counterfactual perturbations, respectively. We apply these explainable models to five real-world data sets under the contextualized setting where users and items have explicit interactions. The empirical results show that our model achieves highly competitive ranking performance and generates accurate and effective explanations in terms of numerous quantitative metrics and qualitative visualizations.
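
As a toy illustration of the counterfactual-perturbation style of explanation, the sketch below searches for a single contextual feature of an item whose removal flips a pairwise ranking; `score` and the feature encoding are hypothetical and much simpler than the strategies proposed in the paper.

```python
# Counterfactual-style explanation for a pairwise ranking decision: which
# single contextual feature of item A, when removed, lets item B outrank it?
import numpy as np

def counterfactual_feature(score, user, feats_a, feats_b):
    """Return the index of a feature of item A whose removal flips the order,
    or None if no single-feature removal changes the pairwise ranking.

    `score(user, feats)` is a hypothetical relevance-scoring function.
    """
    base_gap = score(user, feats_a) - score(user, feats_b)
    if base_gap <= 0:          # B already ranks above A; nothing to explain
        return None
    for j in range(len(feats_a)):
        perturbed = feats_a.copy()
        perturbed[j] = 0.0     # "remove" the j-th contextual feature
        if score(user, perturbed) - score(user, feats_b) <= 0:
            return j           # removing feature j flips the ranking
    return None
```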


Defensive Tensorization

arXiv.org Artificial Intelligence

We propose defensive tensorization, an adversarial defence technique that leverages a latent high-order factorization of the network. The layers of a network are first expressed as factorized tensor layers. Tensor dropout is then applied in the latent subspace, resulting in dense reconstructed weights without the sparsity or perturbations typically induced by the randomization. Our approach can be readily integrated with any neural architecture and combined with techniques like adversarial training. We empirically demonstrate the effectiveness of our approach on standard image classification benchmarks. We validate the versatility of our approach across domains and low-precision architectures by considering an audio classification task and binary networks. In all cases, we demonstrate improved performance compared to prior works.
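
A minimal sketch of the underlying idea, assuming a plain low-rank matrix factorization instead of the paper's higher-order tensor factorization: the weights are stored as latent factors, dropout is applied to those factors, and the reconstructed weight matrix remains dense.

```python
# Factorized linear layer with dropout applied in the latent (factor) space.
import torch
import torch.nn as nn

class FactorizedLinear(nn.Module):
    def __init__(self, in_features, out_features, rank=16, latent_dropout=0.2):
        super().__init__()
        self.u = nn.Parameter(torch.randn(out_features, rank) * 0.02)
        self.v = nn.Parameter(torch.randn(rank, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.drop = nn.Dropout(latent_dropout)  # randomness lives in the factors

    def forward(self, x):
        # Dropout on the latent factor, then reconstruct a dense weight matrix.
        weight = self.u @ self.drop(self.v)
        return torch.nn.functional.linear(x, weight, self.bias)

layer = FactorizedLinear(128, 64)
print(layer(torch.randn(8, 128)).shape)  # torch.Size([8, 64])
```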


Adversarial Robustness in Multi-Task Learning: Promises and Illusions

arXiv.org Artificial Intelligence

Vulnerability to adversarial attacks is a well-known weakness of deep neural networks. While most studies focus on single-task neural networks with computer vision datasets, very little research has considered the complex multi-task models that are common in real applications. In this paper, we evaluate the design choices that impact the robustness of multi-task deep learning networks. We provide evidence that blindly adding auxiliary tasks, or weighting the tasks, provides a false sense of robustness. We thereby tone down the claims made by previous research and study the different factors that may affect robustness. In particular, we show that the choice of which tasks to incorporate in the loss function, and how they are weighted, are important factors that can be leveraged to yield more robust models.
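
For readers unfamiliar with the setup, the sketch below shows a generic weighted multi-task loss over a shared representation; it only makes the design choices discussed above (which tasks to include and how to weight them) concrete, and is not the paper's evaluation code.

```python
# Generic weighted multi-task loss: per-task heads on shared features, combined
# with user-chosen task weights.
import torch

def multi_task_loss(shared_features, heads, targets, weights):
    """Weighted sum of per-task losses computed on shared features.

    heads:   dict of task name -> torch.nn.Module producing task logits
    targets: dict of task name -> ground-truth labels for that task
    weights: dict of task name -> scalar weight in the combined loss
    """
    total = 0.0
    for task, head in heads.items():
        task_loss = torch.nn.functional.cross_entropy(head(shared_features), targets[task])
        total = total + weights[task] * task_loss
    return total
```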


Ensemble Federated Adversarial Training with Non-IID data

arXiv.org Artificial Intelligence

Although federated learning endows distributed clients with a cooperative training mode under the premise of protecting data privacy and security, the clients remain vulnerable to adversarial samples due to a lack of robustness. Adversarial samples can confuse and cheat the client models to achieve malicious purposes by injecting elaborate noise into normal inputs. In this paper, we introduce a novel Ensemble Federated Adversarial Training method, termed EFAT, that enables an efficacious and robust coupled training mechanism. Our core idea is to enhance the diversity of adversarial examples by expanding the training data with different disturbances generated from other participating clients, which helps adversarial training perform well in Non-IID settings. Experimental results in different Non-IID situations, including feature distribution skew and label distribution skew, show that our proposed method achieves promising results compared with solely combining federated learning with adversarial approaches.
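
A highly simplified sketch of the training pattern described above, under the assumption that client batches share a common shape so noise can be exchanged directly: each client crafts FGSM noise with the current global model, every client trains on its local data perturbed by all clients' noise, and the server averages the resulting models. The function names and the FGSM/FedAvg choices are illustrative, not EFAT's exact procedure.

```python
# One round of federated adversarial training with cross-client noise sharing.
import copy
import torch

def fgsm_noise(model, x, y, eps=0.03):
    """Craft FGSM noise for a batch using the given model."""
    x = x.clone().detach().requires_grad_(True)
    torch.nn.functional.cross_entropy(model(x), y).backward()
    return (eps * x.grad.sign()).detach()

def federated_adversarial_round(global_model, clients, lr=0.01):
    """clients: list of (x, y) batches, one per (Non-IID) client; batches are
    assumed to share the same shape so noise can be exchanged directly."""
    # 1) Each client crafts adversarial noise from its own local data.
    noises = [fgsm_noise(global_model, x, y) for x, y in clients]
    local_models = []
    for x, y in clients:
        model = copy.deepcopy(global_model)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        # 2) Adversarial training on local data perturbed by *every* client's
        #    noise, which diversifies the adversarial examples seen locally.
        for noise in noises:
            opt.zero_grad()
            torch.nn.functional.cross_entropy(model(x + noise), y).backward()
            opt.step()
        local_models.append(model)
    # 3) FedAvg-style aggregation of the locally trained models.
    with torch.no_grad():
        for name, param in global_model.named_parameters():
            param.copy_(torch.stack(
                [dict(m.named_parameters())[name] for m in local_models]).mean(dim=0))
    return global_model
```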