Unsupervised or Indirectly Supervised Learning


Reviews: Semi-Supervised Learning with Declaratively Specified Entropy Constraints

Neural Information Processing Systems

This paper proposes a method to combine (or ensemble) several SSL heuristics (regularizers) by using a Bayesian optimization approach. The basic idea of the proposed method is borrowed from a previous method called D-Learner, as the paper itself states. The proposed method is therefore essentially a modification or extension of D-Learner and does not seem entirely novel; from this perspective, the paper is more incremental than innovative. The experimental results compare fairly well with the methods from previous studies, including the baseline D-Learner, on the text classification and relation extraction tasks examined in the paper.


Multivariate Time Series Imputation with Generative Adversarial Networks

Neural Information Processing Systems

Multivariate time series usually contain a large number of missing values, which hinders the application of advanced analysis methods to multivariate time series data. Conventional approaches to handling missing values, including mean/zero imputation, case deletion, and matrix factorization-based imputation, are all incapable of modeling the temporal dependencies and the complex distributions of multivariate time series. In this paper, we treat the problem of missing value imputation as data generation. Inspired by the success of Generative Adversarial Networks (GAN) in image generation, we propose to learn the overall distribution of a multivariate time series dataset with GAN, which is further used to generate the missing values for each sample. Unlike image data, time series data are usually incomplete due to the nature of the data recording process.
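
The abstract does not say how the learned distribution is turned into values for one particular incomplete sample. One common recipe, assumed here rather than taken from the paper, is to search for a latent code whose generated series best matches the observed entries and then copy the generated values into the gaps. In this minimal sketch, `toy_generator` is a hypothetical stand-in for a trained generator, a grid search over candidate codes replaces gradient-based optimization, and no discriminator term is used.

```python
from typing import Callable, List, Optional, Sequence

Series = Sequence[Optional[float]]  # None marks a missing value


def masked_l1(x: Series, generated: Sequence[float]) -> float:
    """Reconstruction error computed only over observed (non-None) entries."""
    return sum(abs(xi - gi) for xi, gi in zip(x, generated) if xi is not None)


def impute(
    x: Series,
    generator: Callable[[float], List[float]],
    candidate_codes: Sequence[float],
) -> List[float]:
    """Fill missing entries of x from the generated sample that best fits it.

    Assumption: latent-code search by grid, not the paper's optimizer.
    """
    # Choose the latent code whose output agrees best with the observed values.
    best_z = min(candidate_codes, key=lambda z: masked_l1(x, generator(z)))
    generated = generator(best_z)
    # Keep observed values; use generated values only where x is missing.
    return [xi if xi is not None else gi for xi, gi in zip(x, generated)]


def toy_generator(z: float) -> List[float]:
    """Hypothetical stand-in for a trained GAN generator: a ramp offset by z."""
    return [z + t for t in range(5)]


observed = [1.0, None, 3.0, None, 5.0]
print(impute(observed, toy_generator, [0.0, 0.5, 1.0, 1.5]))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Observed entries are never overwritten; the generator only supplies values for the gaps.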


The Sample Complexity of Semi-Supervised Learning with Nonparametric Mixture Models

Neural Information Processing Systems

We study the sample complexity of semi-supervised learning (SSL) and introduce new assumptions based on the mismatch between a mixture model learned from unlabeled data and the true mixture model induced by the (unknown) class conditional distributions. Under these assumptions, we establish an $\Omega(K \log K)$ labeled sample complexity bound without imposing parametric assumptions, where $K$ is the number of classes. Our results suggest that even in nonparametric settings it is possible to learn a near-optimal classifier using only a few labeled samples. Unlike previous theoretical work, which focuses on binary classification, we consider general multiclass classification ($K > 2$), which requires solving a difficult permutation learning problem. This permutation defines a classifier whose classification error is controlled by the Wasserstein distance between mixing measures, and we provide finite-sample results characterizing the behaviour of the excess risk of this classifier.
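
The permutation-learning step can be pictured as follows: once a mixture with K components has been fit to unlabeled data, each labeled example is assigned to a component, and the few labels are used to pick the one-to-one mapping from components to classes that agrees with the most labels. This brute-force sketch is for illustration only, not the paper's estimator; the function name and the agreement-maximizing criterion are assumptions. Enumerating all K! permutations is practical only for small K; the same agreement-maximizing mapping is a linear assignment problem and can be found in polynomial time with the Hungarian algorithm.

```python
from itertools import permutations
from typing import Sequence, Tuple


def learn_permutation(
    component_ids: Sequence[int], labels: Sequence[int], k: int
) -> Tuple[int, ...]:
    """Return perm with perm[c] = class assigned to mixture component c.

    Hypothetical helper: brute force over all k! bijections.
    """
    best_perm: Tuple[int, ...] = tuple(range(k))
    best_agreement = -1
    for perm in permutations(range(k)):
        # Count labeled examples whose component maps to their observed class.
        agreement = sum(perm[c] == y for c, y in zip(component_ids, labels))
        if agreement > best_agreement:
            best_perm, best_agreement = perm, agreement
    return best_perm


# Components 0, 1, 2 were fit without labels; five labeled points fix the mapping.
perm = learn_permutation([0, 0, 1, 2, 2], [2, 2, 0, 1, 1], k=3)
print(perm)  # (2, 0, 1): component 0 -> class 2, 1 -> class 0, 2 -> class 1
```

The resulting classifier predicts `perm[c]` for a new point assigned to component `c`.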


Supervising Unsupervised Learning

Neural Information Processing Systems

We introduce a framework to transfer knowledge acquired from a repository of (heterogeneous) supervised datasets to new unsupervised datasets. Our perspective avoids the subjectivity inherent in unsupervised learning by reducing it to supervised learning, and provides a principled way to evaluate unsupervised algorithms. We demonstrate the versatility of our framework via rigorous agnostic bounds on a variety of unsupervised problems. In the context of clustering, our approach helps choose the number of clusters and the clustering algorithm, remove the outliers, and provably circumvent Kleinberg's impossibility result. Experiments across hundreds of problems demonstrate improvements in performance on unsupervised data with simple algorithms, even though our problems come from heterogeneous domains.
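
One way to read "reducing unsupervised learning to supervised learning" is as empirical risk minimization over algorithms: score each candidate clustering algorithm against the known labels of every dataset in the repository, then apply the best-scoring one to a new, unlabeled dataset. This sketch assumes that reading, uses the Rand index as the score, and uses two hypothetical one-dimensional clusterers; the paper's actual losses, algorithm families, and bounds are more general.

```python
from collections.abc import Hashable
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Clusterer = Callable[[Sequence[float]], List[Hashable]]


def rand_index(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of point pairs on which two labelings agree (same vs. different)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0  # convention for fewer than two points
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def select_algorithm(
    candidates: Dict[str, Clusterer],
    repository: Sequence[Tuple[Sequence[float], Sequence[Hashable]]],
) -> str:
    """Pick the candidate with the highest mean Rand index over labeled datasets."""

    def mean_score(name: str) -> float:
        algo = candidates[name]
        return sum(rand_index(algo(x), y) for x, y in repository) / len(repository)

    return max(candidates, key=mean_score)


# Hypothetical candidate algorithms for 1-D data (assumed for illustration).
candidates: Dict[str, Clusterer] = {
    "split_at_zero": lambda xs: [int(v > 0) for v in xs],
    "single_cluster": lambda xs: [0] * len(xs),
}
# Hypothetical labeled repository: (points, ground-truth labels).
repository = [
    ([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1]),
    ([-3.0, 0.5, 4.0], ["a", "b", "b"]),
]
best = select_algorithm(candidates, repository)
print(best, candidates[best]([-0.5, 3.0, 7.0]))  # split_at_zero [0, 1, 1]
```

The selection uses only the labeled repository; the new dataset is clustered without any labels.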


GLoMo: Unsupervised Learning of Transferable Relational Graphs

Neural Information Processing Systems

Modern deep transfer learning approaches have mainly focused on learning generic feature vectors from one task that are transferable to other tasks, such as word embeddings in language and pretrained convolutional features in vision. However, these approaches usually transfer unary features and largely ignore more structured graphical representations. This work explores the possibility of learning generic latent relational graphs that capture dependencies between pairs of data units (e.g., words or pixels) from large-scale unlabeled data and transferring the graphs to downstream tasks. Our proposed transfer learning framework improves performance on various tasks including question answering, natural language inference, sentiment analysis, and image classification. We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.
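
A latent relational graph of this kind can be represented as a row-normalized affinity matrix G, where G[i][j] says how much unit i draws on unit j; transferring the graph then amounts to mixing downstream features as h'_i = sum_j G[i][j] * h_j. This sketch assumes affinities come from key and query vectors passed through a squared ReLU and normalized per row; that is a reading of GLoMo's design, not a quotation of it, and learned projections, bias terms, and stacking of multiple graphs are omitted.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def affinity_graph(
    keys: Sequence[Sequence[float]], queries: Sequence[Sequence[float]]
) -> Matrix:
    """G[i][j] = relu(k_i . q_j)^2, normalized so that each row sums to 1.

    Assumed scoring function (squared ReLU); illustrative only.
    """
    graph = []
    for k in keys:
        scores = [max(dot(k, q), 0.0) ** 2 for q in queries]
        total = sum(scores)
        if total > 0:
            row = [s / total for s in scores]
        else:
            # Fall back to uniform weights if every affinity in the row is zero.
            row = [1.0 / len(scores)] * len(scores)
        graph.append(row)
    return graph


def propagate(graph: Matrix, features: Sequence[Sequence[float]]) -> Matrix:
    """Mix features with the graph: out_i = sum_j G[i][j] * h_j."""
    dim = len(features[0])
    return [
        [sum(row[j] * features[j][d] for j in range(len(features))) for d in range(dim)]
        for row in graph
    ]


# The graph is computed once and can then be applied to different feature sets.
g = affinity_graph(keys=[[1.0, 1.0], [1.0, 0.0]], queries=[[1.0, 0.0], [0.0, 1.0]])
print(g)                             # [[0.5, 0.5], [1.0, 0.0]]
print(propagate(g, [[2.0], [4.0]]))  # [[3.0], [2.0]]
```

Because `propagate` only needs G and a feature matrix of matching length, the same graph can be applied to features it was not trained on, which is the transfer property the abstract describes.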


Unsupervised Learning of View-invariant Action Representations

Neural Information Processing Systems

Most recent successes in human action recognition with deep learning methods rely on the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an expensive and time-consuming process. In this work, we propose an unsupervised learning framework that exploits unlabeled data to learn video representations. Unlike previous work in video representation learning, our unsupervised learning task is to predict 3D motion in multiple target views using a video representation from a source view. By learning to extrapolate cross-view motions, the representation can capture view-invariant motion dynamics that are discriminative for the action.


Reviews: Semi-supervised Learning with GANs: Manifold Invariance with Improved Inference

Neural Information Processing Systems

The author(s) extend the idea of regularizing classifiers to be invariant to the tangent space of the learned data manifold to GAN-based architectures. This is a worthwhile idea to revisit, as significant advances have been made in generative modeling since the last major paper in the area, the CAE, was published. Crucial to the idea is an encoder that learns an inverse mapping of the standard GAN generator. This remains an area of active research in the GAN literature, with no completely satisfactory approach as yet. As current inference techniques for GANs are still quite poor, the authors propose two improvements to one technique, BiGAN, which are worthwhile contributions: 1) they adopt the feature matching loss proposed in "Improved techniques for training gans", and 2) they augment the BiGAN objective with another term that evaluates how the generator maps the inferred latent code for a given real example.
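
The feature matching loss mentioned in point 1) trains the generator to match the mean of an intermediate discriminator feature f(.) on real and generated batches, i.e. || E_x f(x) - E_z f(G(z)) ||_2^2. This minimal sketch takes the feature vectors as given; in practice they would come from an intermediate discriminator layer, and the function names here are hypothetical.

```python
from typing import List, Sequence


def mean_vector(batch: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a batch of feature vectors."""
    n = len(batch)
    return [sum(column) / n for column in zip(*batch)]


def feature_matching_loss(
    real_features: Sequence[Sequence[float]],
    fake_features: Sequence[Sequence[float]],
) -> float:
    """Squared L2 distance between mean real and mean generated features.

    Hypothetical helper: features stand in for discriminator activations.
    """
    real_mean = mean_vector(real_features)
    fake_mean = mean_vector(fake_features)
    return sum((r - f) ** 2 for r, f in zip(real_mean, fake_mean))


print(feature_matching_loss([[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [2.0, 2.0]]))  # 5.0
```

Because only batch means are compared, two batches with the same mean feature vector give zero loss even if their individual samples differ.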


Reviews: Unsupervised learning of object frames by dense equivariant image labelling

Neural Information Processing Systems

Blue565: Unsupervised object learning from dense equivariant image labelling. An impressive paper, marred by flaws in exposition, all fixable. The aim is to construct an object representation from multiple images, using dense labelling functions (from image to object), without supervision. The experiments seem very successful, though the paper would be improved by citing (somewhat) comparable numerical results on the MAFL dataset. The method is conceptually simple, which is a plus. The review of related methods seems good, though I admit to not knowing the field well enough to know what has been missed.
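
The equivariance idea behind dense labelling can be illustrated in one dimension: if an image x is warped by a known transformation g to give x', the label assigned to pixel u in x should equal the label assigned to g(u) in x', i.e. Φ(x, u) = Φ(x', g(u)). This sketch measures violations of that constraint for a circular shift; the signal, the warp, and both labelling functions are hypothetical toys, not the paper's learned network. Note that a constant labelling also satisfies the constraint exactly, so on its own an equivariance term must be paired with something that keeps labels distinctive.

```python
from typing import Callable, List, Sequence

Labeller = Callable[[Sequence[float], int], int]


def shift(x: Sequence[float], t: int) -> List[float]:
    """Circularly shift a 1-D signal right by t, so x'[(u + t) % n] == x[u]."""
    n = len(x)
    return [x[(i - t) % n] for i in range(n)]


def equivariance_loss(phi: Labeller, x: Sequence[float], t: int) -> int:
    """Sum over pixels u of (phi(x', g(u)) - phi(x, u))^2 with g(u) = (u + t) % n."""
    n = len(x)
    warped = shift(x, t)
    return sum((phi(warped, (u + t) % n) - phi(x, u)) ** 2 for u in range(n))


def relative_label(x: Sequence[float], u: int) -> int:
    """Hypothetical labeller: offset of u from the brightest pixel (shift-equivariant)."""
    n = len(x)
    anchor = max(range(n), key=lambda i: x[i])
    return (u - anchor) % n


def absolute_label(x: Sequence[float], u: int) -> int:
    """Hypothetical labeller: raw pixel position, ignoring image content."""
    return u


signal = [0.1, 0.9, 0.3, 0.2]
print(equivariance_loss(relative_label, signal, t=1))  # 0
print(equivariance_loss(absolute_label, signal, t=1))  # 12
```

The relative labeller moves with the content, so the loss is zero; the absolute labeller stays fixed while the content moves, so every pixel is penalized.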


Reviews: Structured Generative Adversarial Networks

Neural Information Processing Systems

Summary: This paper proposes a novel GAN structure for semi-supervised learning, a setting in which a small dataset with class labels is available alongside a larger unlabeled dataset. The main idea of this paper is to disentangle the labels (y) from the hidden states (z) using two GAN problems that represent p(x,y) and p(x,z). The generator is shared between both GAN problems, and the two problems are trained simultaneously using ALI [4]. Two adversarial games are defined for training the joint distributions p(x, y) and p(x, z). Two "collaborative games" are also defined to better disentangle y from z and to enforce structure on y.


Reviews: Triangle Generative Adversarial Networks

Neural Information Processing Systems

Most importantly, I agree that the characterization of Triple GAN is somewhat misleading. The current paper should clarify that Triangle GAN fits a model to p_y(y | x) rather than requiring this density to be given. The toy experiment should note that p_y(y | x) in Triple GAN could be modeled as a mixture of Gaussians, although it is preferable that Triangle GAN does not require specifying this. The objective comes down to conditional GAN + BiGAN/ALI. That is an intuitive and perhaps simple thing to try for the semi-supervised setting, but it's nice that this paper backs up the formulation with theory about behavior at optimality.