Density modeling is notoriously difficult for high dimensional data. One approach to the problem is to search for a lower dimensional manifold which captures the main characteristics of the data. Recently, the Gaussian Process Latent Variable Model (GPLVM) has successfully been used to find low dimensional manifolds in a variety of complex data. The GPLVM consists of a set of points in a low dimensional latent space, and a stochastic map to the observed space. We show how it can be interpreted as a density model in the observed space. However, the GPLVM is not trained as a density model and therefore yields bad density estimates. We propose a new training strategy and obtain improved generalisation performance and better density estimates in comparative evaluations on several benchmark data sets.
Logistic Gaussian process (LGP) priors provide a flexible alternative for modelling unknown densities. The smoothness properties of the density estimates can be controlled through the prior covariance structure of the LGP, but the challenge is the analytically intractable inference. In this paper, we present approximate Bayesian inference for LGP density estimation in a grid using Laplace's method to integrate over the non-Gaussian posterior distribution of latent function values and to determine the covariance function parameters with type-II maximum a posteriori (MAP) estimation. We demonstrate that Laplace's method with MAP is sufficiently fast for practical interactive visualisation of 1D and 2D densities. Our experiments with simulated and real 1D data sets show that the estimation accuracy is close to a Markov chain Monte Carlo approximation and state-of-the-art hierarchical infinite Gaussian mixture models. We also construct a reduced-rank approximation to speed up the computations for dense 2D grids, and demonstrate density regression with the proposed Laplace approach.
This note is concerned with accurate and computationally efficient approximations of moments of Gaussian random variables passed through sigmoid or softmax mappings. These approximations are semi-analytical (i.e. they involve the numerical adjustment of parametric forms) and highly accurate (they yield 5% error at most). We also highlight a few niche applications of these approximations, which arise in the context of, e.g., drift-diffusion models of decision making or non-parametric data clustering approaches. We provide these as examples of efficient alternatives to more tedious derivations that would be needed if one was to approach the underlying mathematical issues in a more formal way. We hope that this technical note will be helpful to modellers facing similar mathematical issues, although maybe stemming from different academic prospects.
Gaussian Mixture Models (GMM) have found many applications in density estimation and data clustering. However, the model does not adapt well to curved and strongly nonlinear data. Recently there appeared an improvement called AcaGMM (Active curve axis Gaussian Mixture Model), which fits Gaussians along curves using an EM-like (Expectation Maximization) approach. Using the ideas standing behind AcaGMM, we build an alternative active function model of clustering, which has some advantages over AcaGMM. In particular it is naturally defined in arbitrary dimensions and enables an easy adaptation to clustering of complicated datasets along the predefined family of functions. Moreover, it does not need external methods to determine the number of clusters as it automatically reduces the number of groups on-line.
Conditional Density Estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GP) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery for it to be applied to big data using stochastic variational inference.