Goto

Collaborating Authors

 Generative AI


Semi-Unsupervised Learning with Deep Generative Models: Clustering and Classifying using Ultra-Sparse Labels

arXiv.org Machine Learning

We introduce $\textit{semi-unsupervised learning}$, an extreme case of semi-supervised learning with ultra-sparse categorisation where some classes have no labels in the training set. That is, in the training data some classes are sparsely labelled and other classes appear only as unlabelled data. Many real-world datasets are conceivably of this type. We demonstrate that effective learning in this regime is only possible when a model is capable of capturing both semi-supervised and unsupervised learning. We develop two deep generative models for classification in this regime that extend previous deep generative models designed for semi-supervised learning. By changing their probabilistic structure to contain a mixture of Gaussians in their continuous latent space, these new models can learn in both unsupervised and semi-unsupervised paradigms. We demonstrate their performance both for semi-unsupervised and unsupervised learning on various standard datasets. We show that our models can learn in an semi-unsupervised manner on Fashion-MNIST. Here we artificially mask out all labels for half of the classes of data and keep $2\%$ of labels for the remaining classes. Our model is able to learn effectively, obtaining a trained classifier with $(77.2\pm1.3)\%$ test set accuracy. We also can train on Fashion-MNIST unsupervised, obtaining $(75.2\pm1.5)\%$ test set accuracy. Additionally, doing the same for MNIST unsupervised we get $(96.3\pm0.9)\%$ test set accuracy, which is state-of-the art for fully probabilistic deep generative models.


Data Interpolations in Deep Generative Models under Non-Simply-Connected Manifold Topology

arXiv.org Machine Learning

Exploiting the deep generative model's remarkable ability of learning the data-manifold structure, some recent researches proposed a geometric data interpolation method based on the geodesic curves on the learned data-manifold. However, this interpolation method often gives poor results due to a topological difference between the model and the dataset. The model defines a family of simply-connected manifolds, whereas the dataset generally contains disconnected regions or holes that make them non-simply-connected. To compensate this difference, we propose a novel density regularizer that make the interpolation path circumvent the holes denoted by low probability density. We confirm that our method gives consistently better interpolation results from the experiments with real-world image datasets.


Kernel Change-point Detection with Auxiliary Deep Generative Models

arXiv.org Machine Learning

Detecting the emergence of abrupt property changes in time series is a challenging problem. Kernel two-sample test has been studied for this task which makes fewer assumptions on the distributions than traditional parametric approaches. However, selecting kernels is nontrivial in practice. Although kernel selection for two-sample test has been studied, the insufficient samples in change point detection problem hinders the success of those developed kernel selection algorithms. In this paper, we propose KL-CPD, a novel kernel learning framework for time series CPD that optimizes a lower bound of test power via an auxiliary generative model. With deep kernel parameterization, KL-CPD endows kernel two-sample test with the data-driven kernel to detect different types of change-points in real-world applications. The proposed approach significantly outperformed other state-of-the-art methods in our comparative evaluation of benchmark datasets and simulation studies. Detecting changes in the temporal evolution of a system (biological, physical, mechanical, etc.) in time series analysis has attracted considerable attention in machine learning and data mining for decades (Basseville et al., 1993; Brodsky & Darkhovsky, 2013). This task, commonly referred to as change-point detection (CPD) or anomaly detection in the literature, aims to predict significant changing points in a temporal sequence of observations.


A Deep Generative Model for Graphs: Supervised Subset Selection to Create Diverse Realistic Graphs with Applications to Power Networks Synthesis

arXiv.org Machine Learning

Creating and modeling real-world graphs is a crucial problem in various applications of engineering, biology, and social sciences; however, learning the distributions of nodes/edges and sampling from them to generate realistic graphs is still challenging. Moreover, generating a diverse set of synthetic graphs that all imitate a real network is not addressed. In this paper, the novel problem of creating diverse synthetic graphs is solved. First, we devise the deep supervised subset selection (DeepS3) algorithm; Given a ground-truth set of data points, DeepS3 selects a diverse subset of all items (i.e. data points) that best represent the items in the ground-truth set. Furthermore, we propose the deep graph representation recurrent network (GRRN) as a novel generative model that learns a probabilistic representation of a real weighted graph. Training the GRRN, we generate a large set of synthetic graphs that are likely to follow the same features and adjacency patterns as the original one. Incorporating GRRN with DeepS3, we select a diverse subset of generated graphs that best represent the behaviors of the real graph (i.e. our ground-truth). We apply our model to the novel problem of power grid synthesis, where a synthetic power network is created with the same physical/geometric properties as a real power system without revealing the real locations of the substations (nodes) and the lines (edges), since such data is confidential. Experiments on the Synthetic Power Grid Data Set show accurate synthetic networks that follow similar structural and spatial properties as the real power grid.


Memory Augmented Deep Generative models for Forecasting the Next Shot Location in Tennis

arXiv.org Machine Learning

Considering the fact that present day ball speeds exceed 130mph, the time required by the receiver to make a decision regarding the opponents' intention, and initiate a response could exceed the flight time for the ball [1], [2], [3], [4]. Several studies have shown that this reactive ability is the product of pattern recognition skills that are obtained through a "biological probabilistic engine", that derives theories regardingopponents intentions with the partial information available[1], [5], [6]. For instance, it has been shown that expert tennis players are better at detecting events in advance [1], [7] and posses better knowledge/ expertise of situational probabilities [3]. Further investigation of human neurological structures have revealed that those capabilities occur due to a bottom-up computational process [1] within the human brain, from sensory memory to the experiences stored in episodic memory [8], [9] and knowledge derived in semantic memory [9], [10]. Despite the growing interest among researchers in the machine learning domain in better understanding factors influencing decision making in fastball sports, there have been very few studies transferring the observations of the underlying neural mechanisms to neural modelling in machine learning.Current state-of-the-art methodologies try to capture the underlying semantics through a handful of handcrafted features, without paying attention to essential mechanisms in the human brain, where the expertise and observations are stored and knowledge is derived.


Learning semantic similarity in a continuous space

Neural Information Processing Systems

We address the problem of learning semantic representation of questions to measure similarity between pairs as a continuous distance metric. Our work naturally extends Word Mover's Distance (WMD) [1] by representing text documents as normal distributions instead of bags of embedded words. Our learned metric measures the dissimilarity between two questions as the minimum amount of distance the intent (hidden representation) of one question needs to "travel" to match the intent of another question. We first learn to repeat, reformulate questions to infer intents as normal distributions with a deep generative model [2] (variational auto encoder). Semantic similarity between pairs is then learned discriminatively as an optimal transport distance metric (Wasserstein 2) with our novel variational siamese framework. Among known models that can read sentences individually, our proposed framework achieves competitive results on Quora duplicate questions dataset. Our work sheds light on how deep generative models can approximate distributions (semantic representations) to effectively measure semantic similarity with meaningful distance metrics from Information Theory.


Flexible and accurate inference and learning for deep generative models

Neural Information Processing Systems

We introduce a new approach to learning in hierarchical latent-variable generative models called the โ€œdistributed distributional code Helmholtz machineโ€, which emphasises flexibility and accuracy in the inferential process. Like the original Helmholtz machine and later variational autoencoder algorithms (but unlike adver- sarial methods) our approach learns an explicit inference or โ€œrecognitionโ€ model to approximate the posterior distribution over the latent variables. Unlike these earlier methods, it employs a posterior representation that is not limited to a narrow tractable parametrised form (nor is it represented by samples). To train the genera- tive and recognition models we develop an extended wake-sleep algorithm inspired by the original Helmholtz machine. This makes it possible to learn hierarchical latent models with both discrete and continuous variables, where an accurate poste- rior representation is essential. We demonstrate that the new algorithm outperforms current state-of-the-art methods on synthetic, natural image patch and the MNIST data sets.


Deep Generative Models for Distribution-Preserving Lossy Compression

Neural Information Processing Systems

We propose and study the problem of distribution-preserving lossy compression. Motivated by recent advances in extreme image compression which allow to maintain artifact-free reconstructions even at very low bitrates, we propose to optimize the rate-distortion tradeoff under the constraint that the reconstructed samples follow the distribution of the training data. The resulting compression system recovers both ends of the spectrum: On one hand, at zero bitrate it learns a generative model of the data, and at high enough bitrates it achieves perfect reconstruction. Furthermore, for intermediate bitrates it smoothly interpolates between learning a generative model of the training data and perfectly reconstructing the training samples. We study several methods to approximately solve the proposed optimization problem, including a novel combination of Wasserstein GAN and Wasserstein Autoencoder, and present an extensive theoretical and empirical characterization of the proposed compression systems.


Semi-crowdsourced Clustering with Deep Generative Models

Neural Information Processing Systems

We consider the semi-supervised clustering problem where crowdsourcing provides noisy information about the pairwise comparisons on a small subset of data, i.e., whether a sample pair is in the same cluster. We propose a new approach that includes a deep generative model (DGM) to characterize low-level features of the data, and a statistical relational model for noisy pairwise annotations on its subset. The two parts share the latent variables. To make the model automatically trade-off between its complexity and fitting data, we also develop its fully Bayesian variant. The challenge of inference is addressed by fast (natural-gradient) stochastic variational inference algorithms, where we effectively combine variational message passing for the relational part and amortized learning of the DGM under a unified framework. Empirical results on synthetic and real-world datasets show that our model outperforms previous crowdsourced clustering methods.


Bias and Generalization in Deep Generative Models: An Empirical Study

Neural Information Processing Systems

In high dimensional settings, density estimation algorithms rely crucially on their inductive bias. Despite recent empirical success, the inductive bias of deep generative models is not well understood. In this paper we propose a framework to systematically investigate bias and generalization in deep generative models of images by probing the learning algorithm with carefully designed training datasets. By measuring properties of the learned distribution, we are able to find interesting patterns of generalization. We verify that these patterns are consistent across datasets, common models and architectures.