Kumar, Abhishek
Combating Reinforcement Learning's Sisyphean Curse with Intrinsic Fear
Lipton, Zachary C., Azizzadenesheli, Kamyar, Kumar, Abhishek, Li, Lihong, Gao, Jianfeng, Deng, Li
Many practical environments contain catastrophic states that an optimal agent would visit infrequently or never. Even on toy problems, Deep Reinforcement Learning (DRL) agents tend to periodically revisit these states upon forgetting their existence under a new policy. We introduce intrinsic fear (IF), a learned reward shaping that guards DRL agents against periodic catastrophes. IF agents possess a fear model trained to predict the probability of imminent catastrophe. This score is then used to penalize the Q-learning objective. Our theoretical analysis bounds the reduction in average return due to learning on the perturbed objective. We also prove robustness to classification errors. As a bonus, IF models tend to learn faster, owing to reward shaping. Experiments demonstrate that intrinsic-fear DQNs solve otherwise pathological environments and improve performance on several Atari games.
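As a rough illustration of the idea (a minimal sketch with assumed names such as `fear_prob` and `lam`, not the authors' implementation), the fear model's predicted catastrophe probability can be subtracted from the bootstrapped Q-learning target:

```python
# Minimal sketch: an intrinsic-fear penalty entering a Q-learning target.
# `fear_prob` stands for the output of a separately trained classifier that
# predicts imminent catastrophe; `lam` is an illustrative penalty coefficient.
import numpy as np

def intrinsic_fear_target(reward, next_q_values, fear_prob, gamma=0.99, lam=1.0, done=False):
    """TD target with the fear penalty subtracted from the bootstrapped value."""
    bootstrap = 0.0 if done else gamma * np.max(next_q_values)
    return reward + bootstrap - lam * fear_prob

# Example: a transition whose successor state the fear model flags as risky.
target = intrinsic_fear_target(reward=1.0,
                               next_q_values=np.array([0.2, 0.8, 0.5]),
                               fear_prob=0.9,   # predicted P(catastrophe soon)
                               lam=2.0)
print(target)  # 1.0 + 0.99*0.8 - 2.0*0.9, about -0.008
```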
Semi-supervised Learning with GANs: Manifold Invariance with Improved Inference
Kumar, Abhishek, Sattigeri, Prasanna, Fletcher, P. Thomas
Semi-supervised learning methods using Generative Adversarial Networks (GANs) have shown promising empirical success recently. Most of these methods use a shared discriminator/classifier that discriminates real examples from fake ones while also predicting the class label. Motivated by the ability of the GAN generator to capture the data manifold well, we propose to estimate the tangent space to the data manifold using GANs and employ it to inject invariances into the classifier. In the process, we propose enhancements over existing methods for learning the inverse mapping (i.e., the encoder) which greatly improve the semantic similarity of the reconstructed sample to the input sample. We observe considerable empirical gains in semi-supervised learning over baselines, particularly in cases where the number of labeled examples is low. We also provide insights into how fake examples influence the semi-supervised learning procedure.
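A hedged sketch of the tangent-space idea (illustrative names `G` and `f`, not the paper's code): Jacobian-vector products of the generator give tangent directions to the generated manifold, along which the classifier can be penalized for sensitivity.

```python
# Rough sketch: tangent directions from a GAN generator G, and a
# tangent-propagation-style invariance penalty on a classifier f.
import torch
from torch.autograd.functional import jvp

def tangent_directions(G, z, num_dirs=1):
    """Jacobian-vector products dG/dz @ v give tangents to the generated manifold at G(z)."""
    tangents = []
    for _ in range(num_dirs):
        v = torch.randn_like(z)           # random latent direction
        _, t = jvp(G, (z,), (v,))         # t approximates J_G(z) v, a tangent at x = G(z)
        tangents.append(t / (t.norm() + 1e-8))
    return tangents

def tangent_invariance_penalty(f, x, tangents, eps=1e-2):
    """Finite-difference proxy for the directional derivative of f along each tangent."""
    penalty = 0.0
    for t in tangents:
        penalty = penalty + ((f(x + eps * t) - f(x)) ** 2).sum() / eps ** 2
    return penalty
```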
The Riemannian Geometry of Deep Generative Models
Shao, Hang, Kumar, Abhishek, Fletcher, P. Thomas
Deep generative models learn a mapping from a low-dimensional latent space to a high-dimensional data space. Under certain regularity conditions, these models parameterize nonlinear manifolds in the data space. In this paper, we investigate the Riemannian geometry of these generated manifolds. First, we develop efficient algorithms for computing geodesic curves, which provide an intrinsic notion of distance between points on the manifold. Second, we develop an algorithm for parallel translation of a tangent vector along a path on the manifold. We show how parallel translation can be used to generate analogies, i.e., to transport a change in one data point into a semantically similar change of another data point. Our experiments on real image data show that the manifolds learned by deep generative models, while nonlinear, are surprisingly close to zero curvature. The practical implication is that linear paths in the latent space closely approximate geodesics on the generated manifold. However, further investigation into this phenomenon is warranted, to determine whether there are other architectures or datasets where curvature plays a more prominent role. We believe that exploring the Riemannian geometry of deep generative models, using the tools developed in this paper, will be an important step in understanding the high-dimensional, nonlinear spaces these models learn.
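A minimal illustration (assuming a decoder `G` that accepts a batch of latent codes; not the paper's geodesic or parallel-translation algorithms) of measuring the data-space length of a straight latent path, which can be compared against the straight-line data-space distance as a crude check of how close latent lines are to geodesics:

```python
# Sketch: length, in data space, of the image of a straight latent segment.
import torch

def curve_length_on_manifold(G, z_start, z_end, steps=50):
    """Approximate the length of G applied to the linear path from z_start to z_end."""
    ts = torch.linspace(0.0, 1.0, steps + 1).unsqueeze(1)
    zs = (1 - ts) * z_start + ts * z_end          # straight path in latent space
    xs = G(zs)                                    # its image on the generated manifold
    return (xs[1:] - xs[:-1]).flatten(1).norm(dim=1).sum()

# Comparing this length with ||G(z_end) - G(z_start)|| gives a crude check of
# how far the latent linear path is from a geodesic (they coincide at zero curvature).
```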
Variational Inference of Disentangled Latent Concepts from Unlabeled Observations
Kumar, Abhishek, Sattigeri, Prasanna, Balakrishnan, Avinash
Disentangled representations, where the higher-level data generative factors are reflected in disjoint latent dimensions, offer several benefits such as ease of deriving invariant representations, transferability to other tasks, and interpretability. We consider the problem of unsupervised learning of disentangled representations from a large pool of unlabeled observations, and propose a variational inference based approach to infer disentangled latent factors. We introduce a regularizer on the expectation of the approximate posterior over observed data that encourages disentanglement. We evaluate the proposed approach using several quantitative metrics and empirically observe significant gains over existing methods in terms of both disentanglement and data likelihood (reconstruction quality).
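One plausible instantiation of such a regularizer (a hedged sketch with illustrative coefficient names, not necessarily the paper's exact form) penalizes the empirical covariance of the posterior means over a minibatch for deviating from the identity, decorrelating latent dimensions:

```python
# Sketch: moment-matching penalty on the covariance of posterior means.
import torch

def disentanglement_penalty(mu, lam_offdiag=10.0, lam_diag=5.0):
    """mu: (batch, latent_dim) means of q(z|x) for a minibatch."""
    mu_centered = mu - mu.mean(dim=0, keepdim=True)
    cov = mu_centered.t() @ mu_centered / (mu.shape[0] - 1)   # empirical covariance of E_q[z]
    diag = torch.diagonal(cov)
    offdiag = cov - torch.diag(diag)
    # Push off-diagonal entries to 0 and diagonal entries to 1.
    return lam_offdiag * (offdiag ** 2).sum() + lam_diag * ((diag - 1.0) ** 2).sum()
```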
SenGen: Sentence Generating Neural Variational Topic Model
Nallapati, Ramesh, Melnyk, Igor, Kumar, Abhishek, Zhou, Bowen
We present a new topic model that generates documents by sampling a topic for one whole sentence at a time, and generating the words in the sentence using an RNN decoder conditioned on the topic of the sentence. We argue that this novel formalism will help us not only visualize and model the topical discourse structure in a document better, but also potentially lead to more interpretable topics, since we can now illustrate topics by sampling representative sentences instead of bags of words or phrases. We present a variational auto-encoder approach for learning in which we use a factorized variational encoder that independently models the posterior over topical mixture vectors of documents using a feed-forward network, and the posterior over topic assignments to sentences using an RNN. Our preliminary experiments on two different datasets indicate early promise, but also expose many challenges that remain to be addressed.
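An illustrative generative-process sketch (toy stand-ins for the decoder and vocabulary, not the authors' implementation) of sampling one topic per sentence and decoding the sentence conditioned on it:

```python
# Sketch: sample a topic per sentence, then emit that sentence's words
# from a topic-conditioned decoder (here a trivial stand-in for the RNN).
import numpy as np

rng = np.random.default_rng(0)

def generate_document(num_sentences, topic_mixture, decode_sentence):
    """topic_mixture: probabilities over K topics; decode_sentence(k) -> list of words."""
    doc = []
    for _ in range(num_sentences):
        k = int(rng.choice(len(topic_mixture), p=topic_mixture))  # one topic per whole sentence
        doc.append(decode_sentence(k))                            # decoder conditioned on topic k
    return doc

# Toy stand-in for the RNN decoder: each topic has its own tiny vocabulary.
vocab = {0: ["markets", "stocks", "rallied"], 1: ["team", "won", "match"]}
print(generate_document(3, topic_mixture=np.array([0.7, 0.3]),
                        decode_sentence=lambda k: vocab[k]))
```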
Local Group Invariant Representations via Orbit Embeddings
Raj, Anant, Kumar, Abhishek, Mroueh, Youssef, Fletcher, P. Thomas, Schölkopf, Bernhard
Invariance to nuisance transformations is one of the desirable properties of effective representations. We consider transformations that form a \emph{group} and propose an approach based on kernel methods to derive local group invariant representations. Locality is achieved by defining a suitable probability distribution over the group, which in turn induces distributions in the input feature space. We learn a decision function over these distributions by appealing to the powerful framework of kernel methods, and generate local invariant random feature maps via kernel approximations. We show uniform convergence bounds for the kernel approximation and provide excess risk bounds for learning with these features. We evaluate our method on three real datasets, including Rotated MNIST and CIFAR-10, and observe that it outperforms competing kernel-based approaches. The proposed method also outperforms a deep CNN on Rotated-MNIST and performs comparably to the recently proposed group-equivariant CNN.
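A rough sketch (assumed setup for 2-D inputs and the rotation group; not the paper's construction or bounds) of locally invariant random features: average random Fourier features over small rotations drawn around the identity.

```python
# Sketch: locally rotation-invariant random Fourier features.
import numpy as np

rng = np.random.default_rng(0)

def random_fourier_features(X, W, b):
    """Standard RFF approximation of an RBF kernel: cos(XW + b)."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b)

def local_orbit_features(x2d, W, b, num_group_samples=8, angle_std=0.2):
    """x2d: a 2-D point; average RFFs over rotations sampled near the identity."""
    feats = []
    for _ in range(num_group_samples):
        theta = rng.normal(0.0, angle_std)                # 'local' group element
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        feats.append(random_fourier_features((R @ x2d)[None, :], W, b))
    return np.mean(feats, axis=0)                         # locally invariant embedding

D = 64
W = rng.normal(size=(2, D))                               # RFF frequencies (unit-bandwidth RBF)
b = rng.uniform(0.0, 2.0 * np.pi, size=D)
print(local_orbit_features(np.array([1.0, 0.0]), W, b).shape)   # (1, 64)
```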
Exact and Heuristic Algorithms for Semi-Nonnegative Matrix Factorization
Gillis, Nicolas, Kumar, Abhishek
Given a matrix $M$ (not necessarily nonnegative) and a factorization rank $r$, semi-nonnegative matrix factorization (semi-NMF) looks for a matrix $U$ with $r$ columns and a nonnegative matrix $V$ with $r$ rows such that $UV$ is the best possible approximation of $M$ according to some metric. In this paper, we study the properties of semi-NMF from which we develop exact and heuristic algorithms. Our contribution is threefold. First, we prove that the error of a semi-NMF of rank $r$ has to be smaller than the best unconstrained approximation of rank $r-1$. This leads us to a new initialization procedure based on the singular value decomposition (SVD) with a guarantee on the quality of the approximation. Second, we propose an exact algorithm (that is, an algorithm that finds an optimal solution), also based on the SVD, for a certain class of matrices (including nonnegative irreducible matrices) from which we derive an initialization for matrices not belonging to that class. Numerical experiments illustrate that this second approach performs extremely well, and allows us to compute optimal semi-NMF decompositions in many situations. Finally, we analyze the computational complexity of semi-NMF proving its NP-hardness, already in the rank-one case (that is, for $r = 1$), and we show that semi-NMF is sometimes ill-posed (that is, an optimal solution does not exist).
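A crude illustrative heuristic in the spirit of the above (not the paper's exact exact-or-heuristic algorithms): initialize from the rank-$r$ SVD, then alternate a closed-form least-squares update for $U$ with a projected least-squares update for $V$.

```python
# Sketch: SVD-initialized alternating heuristic for semi-NMF, M ~ U V with V >= 0.
import numpy as np

def semi_nmf(M, r, iters=200):
    # Initialize from the rank-r SVD, then push V onto the nonnegative orthant.
    U_svd, s, Vt = np.linalg.svd(M, full_matrices=False)
    U = U_svd[:, :r] * s[:r]
    V = np.maximum(Vt[:r, :], 0) + 1e-6
    for _ in range(iters):
        U = M @ V.T @ np.linalg.pinv(V @ V.T)                 # unconstrained least squares
        V = np.maximum(np.linalg.pinv(U.T @ U) @ U.T @ M, 0)  # projected least squares (crude)
    return U, V

M = np.random.default_rng(0).standard_normal((20, 30))
U, V = semi_nmf(M, r=3)
print(np.linalg.norm(M - U @ V))
```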
Near-separable Non-negative Matrix Factorization with $\ell_1$- and Bregman Loss Functions
Kumar, Abhishek, Sindhwani, Vikas
Recently, a family of tractable NMF algorithms has been proposed under the assumption that the data matrix satisfies a separability condition (Donoho & Stodden, 2003; Arora et al., 2012). Geometrically, this condition reformulates the NMF problem as that of finding the extreme rays of the conical hull of a finite set of vectors. In this paper, we develop several extensions of the conical hull procedures of Kumar et al. (2013) for robust ($\ell_1$) approximations and Bregman divergences. Our methods inherit all the advantages of Kumar et al. (2013), including scalability and noise-tolerance. We show that on foreground-background separation problems in computer vision, robust near-separable NMFs match the performance of Robust PCA, considered state of the art on these problems, with an order of magnitude faster training time. We also demonstrate applications in exemplar selection settings.
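A simplified sketch of the separable-NMF anchor-selection idea itself (a successive-projection-style greedy procedure; not the paper's robust $\ell_1$ or Bregman extensions):

```python
# Sketch: greedily pick anchor columns with the largest residual norm after
# projecting out previously chosen anchors.
import numpy as np

def select_anchors(M, r):
    R = M.astype(float).copy()
    anchors = []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))     # column with largest residual
        anchors.append(j)
        u = R[:, j:j + 1] / np.linalg.norm(R[:, j])
        R = R - u @ (u.T @ R)                             # project that direction out
    return anchors

# With anchors fixed, the nonnegative weights can then be recovered by
# nonnegative least squares against M[:, anchors].
```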
Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
Rai, Piyush, Kumar, Abhishek, Daume, Hal
Multiple-output regression models require estimating multiple functions, one for each output. To improve parameter estimation in such models, methods based on structural regularization of the model parameters are usually needed. In this paper, we present a multiple-output regression model that leverages the covariance structure of the functions (i.e., how the multiple functions are related to each other) as well as the conditional covariance structure of the outputs. This is in contrast with existing methods that usually take into account only one of these structures. More importantly, unlike most other existing methods, neither of these structures needs to be known a priori in our model; both are learned from the data. Several previously proposed structural-regularization-based multiple-output regression models turn out to be special cases of our model. Moreover, in addition to being a rich model for multiple-output regression, our model can also be used to estimate the graphical model structure of a set of variables (multivariate outputs) conditioned on another set of variables (inputs). Experimental results on both synthetic and real datasets demonstrate the effectiveness of our method.
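A simplified, illustrative sketch (assumed form; it learns only a task-relatedness covariance, whereas the paper additionally learns the conditional covariance of the outputs) of alternating between a coupled ridge-style solve for the weights and a closed-form covariance update:

```python
# Sketch: alternating estimation of weights W and a task covariance Omega for
# the objective ||Y - X W||_F^2 + lam * tr(W Omega^{-1} W^T).
import numpy as np
from scipy.linalg import solve_sylvester, sqrtm

def multi_task_regression(X, Y, iters=10, lam=1.0, eps=1e-6):
    d, k = X.shape[1], Y.shape[1]
    Omega = np.eye(k)                                     # task-relatedness covariance, learned
    for _ in range(iters):
        # W solves the Sylvester equation  X'X W + lam W Omega^{-1} = X'Y.
        B = lam * np.linalg.inv(Omega + eps * np.eye(k))
        W = solve_sylvester(X.T @ X, B, X.T @ Y)
        # Closed-form covariance update, pulling related tasks toward shared structure.
        S = np.real(sqrtm(W.T @ W + eps * np.eye(k)))
        Omega = S / np.trace(S)
    return W, Omega
```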