Goto

Collaborating Authors

 Country


Revealing Perceptible Backdoors, without the Training Set, via the Maximum Achievable Misclassification Fraction Statistic

arXiv.org Machine Learning

Recently, a special type of data poisoning (DP) attack, known as a backdoor, was proposed. These attacks aim to have a classifier learn to classify to a target class whenever the backdoor pattern is present in a test sample. In this paper, we address post-training detection of perceptible backdoor patterns in DNN image classifiers, wherein the defender does not have access to the poisoned training set, but only to the trained classifier itself, as well as to clean (unpoisoned) examples from the classification domain. This problem is challenging since a perceptible backdoor pattern could be any seemingly innocuous object in a scene, and, without the poisoned training set, we have no hint about the actual backdoor pattern used during training. We identify two important properties of perceptible backdoor patterns, based upon which we propose a novel detector using the maximum achievable misclassification fraction (MAMF) statistic. We detect whether the trained DNN has been backdoor-attacked and infer the source and target classes used for devising the attack. Our detector, with an easily chosen threshold, is evaluated on five datasets, five DNN structures and nine backdoor patterns, and shows strong detection capability. Coupled with an imperceptible backdoor detector, our approach helps achieve detection for all evasive backdoors of interest. I NTRODUCTION Deep neural network (DNN) classifiers have achieved state-of-the-art pattern recognition performance in many research areas such as speech recognition [6], bioinformatics [22], and computer vision [12][13]. However, they have also been shown to be vulnerable to adversarial attacks [23]. This has inspired adversarial learning research, wrestling between attackers and defenders.


Eigenvalue Normalized Recurrent Neural Networks for Short Term Memory

arXiv.org Machine Learning

The underlying dynamical system carries temporal information from one time step to another and captures potential dependencies among the terms of a sequence. Like other deep neural networks, the weights of an RNN are learned by gradient descent. For the input at a time step to affect the output at a later time step, the gradients must back-propagate through each step. Since a sequence can be quite long, RNNs are prone to suffer from vanishing or exploding gradients as described in (Bengio, Frasconi, and Simard 1993) and (Pas-canu, Mikolov, and Bengio 2013). One consequence of this well-known problem is the difficulty of the network to model input-output dependency over a large number of time steps. There have been many different architectures that are designed to mitigate this problem. The most popular RNN architectures such as LSTMs (Hochreiter and Schmidhu-ber 1997) and GRUs (Cho et al. 2014), incorporate a gating mechanism to explicitly retain or discard information.


Can You Really Backdoor Federated Learning?

arXiv.org Machine Learning

The decentralized nature of federated learning makes detecting and defending against adversarial attacks a challenging task. This paper focuses on backdoor attacks in the federated learning setting, where the goal of the adversary is to reduce the performance of the model on targeted tasks while maintaining good performance on the main task. Unlike existing works, we allow non-malicious clients to have correctly labeled samples from the targeted tasks. We conduct a comprehensive study of backdoor attacks and defenses for the EMNIST dataset, a real-life, user-partitioned, and non-iid dataset. We observe that in the absence of defenses, the performance of the attack largely depends on the fraction of adversaries present and the "complexity'' of the targeted task. Moreover, we show that norm clipping and "weak'' differential privacy mitigate the attacks without hurting the overall performance. We have implemented the attacks and defenses in TensorFlow Federated (TFF), a TensorFlow framework for federated learning. In open-sourcing our code, our goal is to encourage researchers to contribute new attacks and defenses and evaluate them on standard federated datasets.


Alternating Between Spectral and Spatial Estimation for Speech Separation and Enhancement

arXiv.org Machine Learning

This work investigates alternation between spectral separation using masking-based networks and spatial separation using multichannel beamforming. In this framework, the spectral separation is performed using a mask-based deep network. The result of mask-based separation is used, in turn, to estimate a spatial beamformer. The output of the beamformer is fed back into another mask-based separation network. We explore multiple ways of computing time-varying covariance matrices to improve beamforming, including factorizing the spatial covariance into a time-varying amplitude component and time-invariant spatial component. For the subsequent mask-based filtering, we consider different modes, including masking the noisy input, masking the beamformer output, and a hybrid approach combining both. Our best method first uses spectral separation, then spatial beamforming, and finally a spectral post-filter, and demonstrates an average improvement of 2.8 dB over baseline mask-based separation, across four different reverberant speech enhancement and separation tasks.


Privacy Leakage Avoidance with Switching Ensembles

arXiv.org Machine Learning

W e consider membership inference attacks, one of the main privacy issues in machine learning. These recently developed attacks have been proven successful in determining, with confidence better than a random guess, whether a given sample belongs to the dataset on which the attacked machine learning model was trained. Several approaches have been developed to mitigate this privacy leakage but the tradeoff performance implications of these defensive mechanisms (i.e., accuracy and utility of the defended machine learning model) are not well studied yet. W e propose a novel approach of privacy leakage avoidance with switching ensembles (P ASE), which both protects against current membership inference attacks and does that with very small accuracy penalty, while requiring acceptable increase in training and inference time. W e test our P ASE method, along with the the current state-of-the-art P ATE approach, on three calibration image datasets and analyze their tradeoffs.


Attribute noise robust binary classification

arXiv.org Machine Learning

We consider the problem of learning linear classifiers when both features and labels are binary. In addition, the features are noisy, i.e., they could be flipped with an unknown probability. In Sy-De attribute noise model, where all features could be noisy together with same probability, we show that $0$-$1$ loss ($l_{0-1}$) need not be robust but a popular surrogate, squared loss ($l_{sq}$) is. In Asy-In attribute noise model, we prove that $l_{0-1}$ is robust for any distribution over 2 dimensional feature space. However, due to computational intractability of $l_{0-1}$, we resort to $l_{sq}$ and observe that it need not be Asy-In noise robust. Our empirical results support Sy-De robustness of squared loss for low to moderate noise rates.


RWNE: A Scalable Random-Walk based Network Embedding Framework with Personalized Higher-order Proximity Preserved

arXiv.org Machine Learning

Higher-order proximity preserved network embedding has attracted increasing attention recently. In particular, due to the superior scalability, random-walk based network embedding has also been well developed, which could efficiently explore higher-order neighborhood via multi-hop random walks. However, despite the success of current random-walk based methods, most of them are usually not expressive enough to preserve the personalized higher-order proximity and lack a straightforward objective to theoretically articulate what and how network proximity is preserved. In this paper, to address the above issues, we present a general scalable random-walk based network embedding framework, in which random walk is explicitly incorporated into a sound objective designed theoretically to preserve arbitrary higher-order proximity. Further, we introduce the random walk with restart process into the framework to naturally and effectively achieve personalized-weighted preservation of proximities of different orders. We conduct extensive experiments on several real-world networks and demonstrate that our proposed method consistently and substantially outperforms the state-of-the-art network embedding methods.


Co-Attentive Equivariant Neural Networks: Focusing Equivariance On Transformations Co-Occurring In Data

arXiv.org Machine Learning

Equivariance is a nice property to have as it produces much more parameter efficient neural architectures and preserves the structure of the input through the feature mapping. Even though some combinations of transformations might never appear (e.g. an upright face with a horizontal nose), current equivariant architectures consider the set of all possible transformations in a transformation group when learning feature representations. Contrarily, the human visual system is able to attend to the set of relevant transformations occurring in the environment and utilizes this information to assist and improve object recognition. Based on this observation, we modify conventional equivariant feature mappings such that they are able to attend to the set of cooccurring transformations in data and generalize this notion to act on groups consisting of multiple symmetries. We show that our proposed co-attentive equivariant neural networks consistently outperform conventional rotation equivariant and rotation & reflection equivariant neural networks on rotated MNIST and CIFAR-10. Thorough experimentation in the fields of psychology and neuroscience has provided support to the intuition that our visual perception and cognition systems are able to identify familiar objects despite modifications in size, location, background, viewpoint and lighting (Bruce & Humphreys, 1994). Interestingly, we are not just able to recognize such modified objects, but are able to characterize which modifications have been applied to them as well. As an example, when we see a picture of a cat, we are not just able to tell that there is a cat in it, but also its position, its size, facts about the lighting conditions of the picture, and so forth. Such observations suggest that the human visual system is equivariant to a large transformation group containing translation, rotation, scaling, among others. In other words, the mental representation obtained by seeing a transformed version of an object, is equivalent to that of seeing the original object and transforming it mentally next. These fascinating abilities exhibited by biological visual systems have inspired a large field of research towards the development of neural architectures able to replicate them.


Coordinate-wise Armijo's condition

arXiv.org Machine Learning

Let $z=(x,y)$ be coordinates for the product space $\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$. Let $f:\mathbb{R}^{m_1}\times \mathbb{R}^{m_2}\rightarrow \mathbb{R}$ be a $C^1$ function, and $\nabla f=(\partial _xf,\partial _yf)$ its gradient. Fix $0<\alpha <1$. For a point $(x,y) \in \mathbb{R}^{m_1}\times \mathbb{R}^{m_2}$, a number $\delta >0$ satisfies Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} f(x-\delta \partial _xf,y-\delta \partial _yf)-f(x,y)\leq -\alpha \delta (||\partial _xf||^2+||\partial _yf||^2). \end{eqnarray*} When $f(x,y)=f_1(x)+f_2(y)$ is a coordinate-wise sum map, we propose the following {\bf coordinate-wise} Armijo's condition. Fix again $0<\alpha <1$. A pair of positive numbers $\delta _1,\delta _2>0$ satisfies the coordinate-wise variant of Armijo's condition at $(x,y)$ if the following inequality holds: \begin{eqnarray*} [f_1(x-\delta _1\nabla f_1(x))+f_2(y-\delta _2\nabla f_2(y))]-[f_1(x)+f_2(y)]\leq -\alpha (\delta _1||\nabla f_1(x)||^2+\delta _2||\nabla f_2(y)||^2). \end{eqnarray*} We then extend results in our recent previous results, on Backtracking Gradient Descent and some variants, to this setting. We show by an example the advantage of using coordinate-wise Armijo's condition over the usual Armijo's condition.


Drug Repurposing for Cancer: An NLP Approach to Identify Low-Cost Therapies

arXiv.org Machine Learning

More than 200 generic drugs approved by the U.S. Food and Drug Administration for non-cancer indications have shown promise for treating cancer. Due to their long history of safe patient use, low cost, and widespread availability, repurposing of generic drugs represents a major opportunity to rapidly improve outcomes for cancer patients and reduce healthcare costs worldwide. Evidence on the efficacy of non-cancer generic drugs being tested for cancer exists in scientific publications, but trying to manually identify and extract such evidence is intractable. In this paper, we introduce a system to automate this evidence extraction from PubMed abstracts. Our primary contribution is to define the natural language processing pipeline required to obtain such evidence, comprising the following modules: querying, filtering, cancer type entity extraction, therapeutic association classification, and study type classification. Using the subject matter expertise on our team, we create our own datasets for these specialized domain-specific tasks. We obtain promising performance in each of the modules by utilizing modern language modeling techniques and plan to treat them as baseline approaches for future improvement of individual components.