Adversarial Robustness of Flow-Based Generative Models

arXiv.org Machine Learning

Flow-based generative models leverage invertible generator functions to fit a distribution to the training data using maximum likelihood. Despite their use in several application domains, the robustness of these models to adversarial attacks has hardly been explored. In this paper, we study the adversarial robustness of flow-based generative models both theoretically (for some simple models) and empirically (for more complex ones). First, we consider a linear flow-based generative model and compute optimal sample-specific and universal adversarial perturbations that maximally decrease the likelihood scores. Using this result, we study the robustness of the well-known adversarial training procedure, where we characterize the fundamental trade-off between model robustness and accuracy. Next, we empirically study the robustness of two prominent deep, non-linear, flow-based generative models, namely GLOW and RealNVP. We design two types of adversarial attacks: one that minimizes the likelihood scores of in-distribution samples, and another that maximizes the likelihood scores of out-of-distribution ones. We find that GLOW and RealNVP are extremely sensitive to both types of attacks. Finally, using a hybrid adversarial training procedure, we significantly boost the robustness of these generative models.


Generate (non-software) Bugs to Fool Classifiers

arXiv.org Machine Learning

Let us consider a scenario in which an attacker wishes to modify an input image $x$ so that the target model $f$ classifies it with a specific label $t$. The generation process can be represented as follows:

$$\hat{v} = \operatorname*{argmin}_{v} \; L_f(x + v, t) + \|v\|, \qquad (1)$$

where $L_f$ denotes a loss function that represents how distant the input data are from the given label under $f$, and $\|v\|$ is a norm function that regularizes the perturbation so that $v$ becomes unnoticeable to humans. Then, $x + \hat{v}$ is expected to form an adversarial example that is classified as $t$ while looking similar to $x$. Earlier approaches, such as Szegedy et al. (2014) and Moosavi-Dezfooli, Fawzi, and Frossard (2016), used the $L_2$-norm to limit the magnitude of the perturbation. In contrast, Su, Vargas, and Sakurai (2017) used the $L_0$-norm to limit the number of modified pixels and showed that modifying even a single pixel could generate adversarial examples. More recent studies introduced GANs instead of directly optimizing perturbations (Xiao et al. 2018; Zhao, Dua, and Singh 2018) to ensure the naturalness of adversarial examples. For example, Xiao et al. (2018) trained a discriminator network to distinguish adversarial examples from natural images so that the generator network produced adversarial examples that appeared as natural images. Given the distribution $p_x$ over the natural images and the tradeoff parameter $\alpha$, its training process can be represented similarly to that in Goodfellow et al. (2014) as follows:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_x}[\log D(x)] + \mathbb{E}_{x \sim p_x}[\log(1 - D(x + G(x)))] + \alpha \, \mathbb{E}_{x \sim p_x}[L_f(x + G(x), t)].$$
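The perturbation search in Eq. (1) can be illustrated with a minimal sketch. It assumes a toy linear softmax classifier in place of the target model, cross-entropy toward the target label as the loss, and a squared L2 penalty weighted by a hypothetical coefficient `lam` as the regularizer (chosen here because it is differentiable). None of these choices, nor the function names, come from the excerpt above; they are illustrative only.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def logits(W, b, x):
    """Logits of a linear classifier: W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def predict(W, b, x):
    """Index of the largest logit."""
    z = logits(W, b, x)
    return z.index(max(z))

def targeted_perturbation(W, b, x, t, lam=0.1, lr=0.1, steps=200):
    """Gradient descent on v for cross_entropy(x + v, t) + lam * ||v||_2^2.

    Assumption: the squared L2 penalty stands in for the norm term of Eq. (1).
    """
    v = [0.0] * len(x)
    for _ in range(steps):
        xv = [xi + vi for xi, vi in zip(x, v)]
        p = softmax(logits(W, b, xv))
        # Gradient of cross-entropy w.r.t. the logits: p - onehot(t).
        err = [pk - (1.0 if k == t else 0.0) for k, pk in enumerate(p)]
        # Chain rule through the linear layer, plus the penalty gradient.
        grad = [sum(W[k][j] * err[k] for k in range(len(W))) + 2.0 * lam * v[j]
                for j in range(len(x))]
        v = [vj - lr * gj for vj, gj in zip(v, grad)]
    return v

# Toy two-class model: the class is whichever coordinate is larger.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x = [1.0, 0.0]                      # initially classified as class 0
v_hat = targeted_perturbation(W, b, x, t=1)
x_adv = [xi + vi for xi, vi in zip(x, v_hat)]
label_before = predict(W, b, x)
label_after = predict(W, b, x_adv)
```

In this toy setting the penalty weight trades off how far the input moves against how confidently the target label is reached; a larger `lam` yields a smaller perturbation that may no longer flip the label.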


Logic-inspired Deep Neural Networks

arXiv.org Machine Learning

Deep neural networks have achieved impressive performance and become the de facto standard in many tasks. However, phenomena such as adversarial examples and fooling examples hint that the generalization they make is flawed. We argue that the problem is rooted in their distributed and connected nature and propose remedies inspired by propositional logic. Our experiments show that the proposed models are more local and better at resisting fooling and adversarial examples. By means of an ablation analysis, we reveal insights into adversarial examples and suggest a new hypothesis on their origins.


Deep Anomaly Detection with Deviation Networks

arXiv.org Machine Learning

Although deep learning has been applied to successfully address many data mining problems, relatively limited work has been done on deep learning for anomaly detection. Existing deep anomaly detection methods, which focus on learning new feature representations to enable downstream anomaly detection methods, perform indirect optimization of anomaly scores, leading to data-inefficient learning and suboptimal anomaly scoring. Also, they are typically designed as unsupervised learning methods due to the lack of large-scale labeled anomaly data. As a result, it is difficult for them to leverage prior knowledge (e.g., a few labeled anomalies) when such information is available, as it is in many real-world anomaly detection applications. This paper introduces a novel anomaly detection framework and its instantiation to address these problems. Instead of representation learning, our method performs end-to-end learning of anomaly scores through neural deviation learning, in which we leverage a few (e.g., multiple to dozens) labeled anomalies and a prior probability to enforce statistically significant deviations of the anomaly scores of anomalies from those of normal data objects in the upper tail. Extensive results show that our method can be trained substantially more data-efficiently and achieves significantly better anomaly scoring than state-of-the-art competing methods.


Robust Learning of Discrete Distributions from Batches

arXiv.org Machine Learning

Let $d$ be the lowest $L_1$ distance to which a $k$-symbol distribution $p$ can be estimated from $m$ batches of $n$ samples each, when up to $\beta m$ batches may be adversarial. For $\beta<1/2$, Qiao and Valiant (2017) showed that $d=\Omega(\beta/\sqrt{n})$ and requires $m=\Omega(k/\beta^2)$ batches. For $\beta<1/900$, they provided a $d$ and $m$ order-optimal algorithm that runs in time exponential in $k$. For $\beta<0.5$, we propose an algorithm with comparably optimal $d$ and $m$, but run-time polynomial in $k$ and all other parameters.


Gromov-Wasserstein Factorization Models for Graph Clustering

arXiv.org Machine Learning

We propose a new nonlinear factorization model for graphs with topological structures and, optionally, node attributes. This model is based on a pseudometric called Gromov-Wasserstein (GW) discrepancy, which compares graphs in a relational way. It estimates observed graphs as GW barycenters constructed from a set of atoms with different weights. By minimizing the GW discrepancy between each observed graph and its GW barycenter-based estimation, we learn the atoms and their weights associated with the observed graphs. The model achieves a novel and flexible factorization mechanism under GW discrepancy, in which both the observed graphs and the learnable atoms can be unaligned and of different sizes. We design an effective approximate algorithm for learning this Gromov-Wasserstein factorization (GWF) model, unrolling loopy computations as stacked modules and computing gradients with backpropagation. The stacked modules can take two different architectures, which correspond to the proximal point algorithm (PPA) and the Bregman alternating direction method of multipliers (BADMM), respectively. Experiments show that our model obtains encouraging results on clustering graphs.

Introduction

As an important methodology for machine learning, factorization models explicitly explore the intrinsic structures of high-dimensional observations and have been widely used in many learning tasks, e.g., data clustering (Ng, Jordan, and Weiss 2002), dimensionality reduction (Candès et al. 2011), recommendation systems (Wang and Blei 2011), etc. In particular, factorization models decompose high-dimensional observations into a set of atoms under specific criteria and achieve their latent representations accordingly.
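As a rough illustration of the relational comparison that the GW discrepancy performs, the sketch below evaluates the commonly used squared-loss GW objective $\sum_{i,j,k,l} (C_1[i,k] - C_2[j,l])^2 \, T_{ij} T_{kl}$ for a fixed coupling $T$ between the nodes of two graphs. It does not optimize $T$ or compute barycenters, and it is not the paper's learning algorithm. The function name `gw_objective`, the toy path graph, and both couplings are assumptions made for this example.

```python
def gw_objective(C1, C2, T):
    """Squared-loss GW objective for a fixed coupling T.

    C1: n x n relation matrix of graph 1 (e.g. adjacency).
    C2: m x m relation matrix of graph 2.
    T:  n x m coupling; T[i][j] is the mass matched between node i and node j.
    """
    n, m = len(C1), len(C2)
    total = 0.0
    for i in range(n):
        for j in range(m):
            if T[i][j] == 0.0:
                continue  # zero mass contributes nothing
            for k in range(n):
                for l in range(m):
                    total += (C1[i][k] - C2[j][l]) ** 2 * T[i][j] * T[k][l]
    return total

# Graph 1: a 3-node path. Graph 2: the same path with its nodes relabelled.
C1 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 0]]
perm = [2, 0, 1]  # node j of graph 2 corresponds to node perm[j] of graph 1
C2 = [[C1[perm[a]][perm[b]] for b in range(3)] for a in range(3)]

# Coupling that follows the true relabelling, and an uninformative uniform one.
T_aligned = [[1.0 / 3 if i == perm[j] else 0.0 for j in range(3)] for i in range(3)]
T_uniform = [[1.0 / 9] * 3 for _ in range(3)]

gw_aligned = gw_objective(C1, C2, T_aligned)
gw_uniform = gw_objective(C1, C2, T_uniform)
```

The aligned coupling gives a zero objective even though the two adjacency matrices differ entry by entry, which is the sense in which the comparison is relational and needs no prior node alignment. The uniform coupling gives a strictly positive value.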


CASTER: Predicting Drug Interactions with Chemical Substructure Representation

arXiv.org Machine Learning

Adverse drug-drug interactions (DDIs) remain a leading cause of morbidity and mortality. Identifying potential DDIs during the drug design process is critical for patients and society. Although several computational models have been proposed for DDI prediction, there are still limitations: (1) specialized design of drug representation for DDI predictions is lacking; (2) predictions are based on limited labelled data and do not generalize well to unseen drugs or DDIs; and (3) models are characterized by a large number of parameters and are thus hard to interpret. In this work, we develop a ChemicAl SubstrucTurE Representation (CASTER) framework that predicts DDIs given chemical structures of drugs. CASTER aims to mitigate these limitations via (1) a sequential pattern mining module rooted in the DDI mechanism to efficiently characterize functional substructures of drugs; (2) an auto-encoding module that leverages both labelled and unlabelled chemical structure data to improve predictive accuracy and generalizability; and (3) a dictionary learning module that explains the prediction via a small set of coefficients which measure the relevance of each input substructure to the DDI outcome. We evaluated CASTER on two real-world DDI datasets and showed that it performed better than state-of-the-art baselines and provided interpretable predictions.

1 Introduction

Adverse drug-drug interactions (DDIs) are caused by pharmacological interactions of drugs. They result in substantial morbidity and mortality, and incur huge medical costs (Giacomini et al. 2007; Onakpoya, Heneghan, and Aronson 2016).


Learning internal representations

arXiv.org Machine Learning

Probably the most important problem in machine learning is the preliminary biasing of a learner's hypothesis space so that it is small enough to ensure good generalisation from reasonable training sets, yet large enough that it contains a good solution to the problem being learnt. In this paper a mechanism for automatically learning or biasing the learner's hypothesis space is introduced. It works by first learning an appropriate internal representation for a learning environment and then using that representation to bias the learner's hypothesis space for the learning of future tasks drawn from the same environment. An internal representation must be learnt by sampling from many similar tasks, not just a single task as occurs in ordinary machine learning. It is proved that the number of examples $m$ per task required to ensure good generalisation from a representation learner obeys $m = O(a+b/n)$ where $n$ is the number of tasks being learnt and $a$ and $b$ are constants. If the tasks are learnt independently (i.e. without a common representation) then $m=O(a+b)$. It is argued that for learning environments such as speech and character recognition $b\gg a$ and hence representation learning in these environments can potentially yield a drastic reduction in the number of examples required per task. It is also proved that if $n = O(b)$ (with $m=O(a+b/n)$) then the representation learnt will be good for learning novel tasks from the same environment, and that the number of examples required to generalise well on a novel task will be reduced to $O(a)$ (as opposed to $O(a+b)$ if no representation is used). It is shown that gradient descent can be used to train neural network representations, and experimental results are reported providing strong qualitative support for the theoretical results.


Deep Unsupervised Clustering with Clustered Generator Model

arXiv.org Machine Learning

However, unsupervised clustering remains one of the most fundamental challenges in machine learning because of the high dimensionality of data and the high complexity of their hidden structures. Long-established approaches for unsupervised clustering, including K-means [15] and the Gaussian Mixture Model (GMM) [3], are still the building blocks for numerous applications due to their efficiency and simplicity. However, their distance metrics are limited to data space, making them ineffective for high-dimensional data such as images. Therefore, considerable effort has been put into obtaining a good feature embedding of data, usually of low dimensionality, for effective clustering [37]. However, the representation obtained by standalone data embedding typically cannot capture the latent structure and variation of the observed data, and may therefore be ineffective for clustering. We believe a good representation for clustering should also be able to compactly represent the observed data distribution so as to encode all necessary characteristics of the observation. Deep generative models (a.k.a. generator models) have shown great promise in learning latent representations for high-dimensional signals such as images and videos [32, 24, 11]. Generator models parameterized by deep neural networks specify a nonlinear mapping from latent variables to observed data.

(Tian Han is the corresponding author.)


A Study on various state of the art of the Art Face Recognition System using Deep Learning Techniques

arXiv.org Machine Learning

ABSTRACT

Given the very large data repositories now available and access to highly advanced hardware, facial identification systems have evolved enormously over the past few decades. Sketch recognition is one of the most important areas and has become an integral component adopted by law enforcement agencies in current forensic science. Matching derived sketches to photographs of faces is a difficult task, because the sketches are produced from the verbal description given by an eyewitness of the crime scene and, owing to natural human error, may lack sensitive elements that exist in the photograph. A substantial amount of the research carried out in this area until recently used recognition systems built on traditional feature extraction and classification models. Very recently, however, a few research works have focused on using deep learning techniques, taking advantage of learned models for feature extraction and classification to overcome potential domain challenges. The first part of this review paper focuses on deep learning techniques used in face recognition and matching, which have improved the accuracy of face recognition through training on huge sets of data. This paper also includes a survey of different techniques used to match composite sketches to human images, including the component-based representation approach, the automatic composite sketch recognition technique, etc.

INTRODUCTION

According to prior research, a complete face recognition system, comprising face detection and face recognition, deals with two patterns: 1) structural similarity and 2) individual local differences of human faces. Therefore, it is required to extract the features of the face through the face detection process.
The evolution of face recognition is due to its technical challenges and huge potential application in video surveillance, identity authorization, multimedia applications, home and office security, law enforcement and different human-computer interaction activities. Facial recognition technology (FRT) is one of the most controversial new tools. It was first developed in the 1960s.