Recent advances in the estimation of deep directed graphical models and recurrent networks let us contribute to the removal of a blind spot in the area of probabilistc modelling of time series. The proposed methods i) can infer distributed latent state-space trajectories with nonlinear transitions, ii) scale to large data sets thanks to the use of a stochastic objective and fast, approximate inference, iii) enable the design of rich emission models which iv) will naturally lead to structured outputs. Two different paths of introducing latent state sequences are pursued, leading to the variational recurrent auto encoder (VRAE) and the variational one step predictor (VOSP). The use of independent Wiener processes as priors on the latent state sequence is a viable compromise between efficient computation of the Kullback-Leibler divergence from the variational approximation of the posterior and maintaining a reasonable belief in the dynamics. We verify our methods empirically, obtaining results close or superior to the state of the art. We also show qualitative results for denoising and missing value imputation.
Variational inference for latent variable models is prevalent in various machine learning problems, typically solved by maximizing the Evidence Lower Bound (ELBO) of the true data likelihood with respect to a variational distribution. However, freely enriching the family of variational distribution is challenging since the ELBO requires variational likelihood evaluations of the latent variables. In this paper, we propose a novel framework to enrich the variational family based on an alternative lower bound, by introducing auxiliary random variables to the variational distribution only. While offering a much richer family of complex variational distributions, the resulting inference network is likelihood almost free in the sense that only the latent variables require evaluations from simple likelihoods and samples from all the auxiliary variables are sufficient for maximum likelihood inference. We show that the proposed approach is essentially optimizing a probabilistic mixture of ELBOs, thus enriching modeling capacity and enhancing robustness. It outperforms state-of-the-art methods in our experiments on several density estimation tasks.
A wide variety of Dirichlet-multinomial'topic' models have found interesting applications inrecent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and sidestepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing thetechnique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables.
Beta process is the standard nonparametric Bayesian prior for latent factor model. In this paper, we derive a structured mean-field variational inference algorithm for a beta process non-negative matrix factorization (NMF) model with Poisson likelihood. Unlike the linear Gaussian model, which is well-studied in the nonparametric Bayesian literature, NMF model with beta process prior does not enjoy the conjugacy. We leverage the recently developed stochastic structured mean-field variational inference to relax the conjugacy constraint and restore the dependencies among the latent variables in the approximating variational distribution. Preliminary results on both synthetic and real examples demonstrate that the proposed inference algorithm can reasonably recover the hidden structure of the data.
We would like to learn latent representations that are low-dimensional and highly interpretable. A model that has these characteristics is the Gaussian Process Latent Variable Model. The benefits and negative of the GP-LVM are complementary to the Variational Autoencoder, the former provides interpretable low-dimensional latent representations while the latter is able to handle large amounts of data and can use non-Gaussian likelihoods. Our inspiration for this paper is to marry these two approaches and reap the benefits of both. In order to do so we will introduce a novel approximate inference scheme inspired by the GP-LVM and the VAE. We show experimentally that the approximation allows the capacity of the generative bottle-neck (Z) of the VAE to be arbitrarily large without losing a highly interpretable representation, allowing reconstruction quality to be unlimited by Z at the same time as a low-dimensional space can be used to perform ancestral sampling from as well as a means to reason about the embedded data.