Variational Inference with Numerical Derivatives: variance reduction through coupling
Immer, Alexander, Dehaene, Guillaume P.
The Black Box Variational Inference algorithm (Ranganath et al. [2014]) provides a universal method for Variational Inference, but taking advantage of special properties of the approximation family or of the target can improve the convergence speed significantly. For example, if the approximation family is a transformation family, such as the Gaussian, then switching to the reparameterization gradient (Kingma and Welling [2014]) often yields a major reduction in gradient variance. Ultimately, reducing the variance can reduce the computational cost and yield better approximations. We present a new method to extend the reparameterization trick to more general exponential families, including the Wishart, Gamma, and Student's t distributions. Variational Inference with Numerical Derivatives (VIND) approximates the gradient with numerical derivatives and reduces its variance using a tight coupling of the approximation family. The resulting algorithm is simple to implement and can benefit from widely known couplings. Our experiments confirm that VIND effectively decreases the gradient variance and thereby improves the posterior approximation in relevant cases. It thus provides an efficient yet simple Variational Inference method for computing non-Gaussian approximations.
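As a rough, hypothetical sketch of the coupling idea (not the paper's exact estimator): a finite-difference gradient of a variational expectation can be driven by common random numbers, here coupling samples through the Gamma quantile function. The integrand f, the shape parameterization, and all constants below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import gamma

def f(x):
    return np.log1p(x)  # arbitrary test integrand (assumption)

def coupled_grad(a, eps=1e-4, n=10_000, seed=0):
    """Finite-difference estimate of d/da E_{Gamma(a)}[f(x)],
    with both samples coupled via shared uniforms (inverse CDF)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=n)          # shared randomness = the coupling
    x_lo = gamma.ppf(u, a)           # sample from Gamma(a) via quantile
    x_hi = gamma.ppf(u, a + eps)     # same u -> tightly coupled sample
    return np.mean((f(x_hi) - f(x_lo)) / eps)

def uncoupled_grad(a, eps=1e-4, n=10_000, seed=0):
    """Same finite difference, but with independent samples."""
    rng = np.random.default_rng(seed)
    x_lo = gamma.rvs(a, size=n, random_state=rng)
    x_hi = gamma.rvs(a + eps, size=n, random_state=rng)
    return np.mean((f(x_hi) - f(x_lo)) / eps)

print(coupled_grad(2.0))    # low-variance estimate
print(uncoupled_grad(2.0))  # same expectation, far noisier
```

Both estimators target the same derivative, but the uncoupled version divides O(1) sampling noise by eps, while the coupled draws move together and largely cancel, which is the variance-reduction effect the abstract describes.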
Approximate Inference Turns Deep Networks into Gaussian Processes
Khan, Mohammad Emtiyaz, Immer, Alexander, Abedi, Ehsan, Korzepa, Maciej
Deep neural networks (DNNs) and Gaussian processes (GPs) are two powerful models with several theoretical connections relating them, but the relationship between their training methods is not well understood. In this paper, we show that certain Gaussian posterior approximations for Bayesian DNNs are equivalent to GP posteriors. As a result, we can obtain a GP kernel and a nonlinear feature map simply by training the DNN. Surprisingly, the resulting kernel is the neural tangent kernel, which has desirable theoretical properties for infinitely wide DNNs. We show feature maps obtained on real datasets and demonstrate the use of the GP marginal likelihood to tune hyperparameters of DNNs. Our work aims to facilitate further research on combining DNNs and GPs in practical settings.
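As a hedged illustration of the kernel construction (a minimal empirical version, not the paper's full derivation): the feature map is the Jacobian of the network output with respect to the weights at the trained parameters, and the induced GP kernel is the inner product of those features. The tiny tanh network and the random stand-in "trained" weights below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 3, 16                   # input dim, hidden width (assumed)
W1 = rng.normal(size=(h, d))   # stand-ins for trained weights
w2 = rng.normal(size=h)

def feature_map(x):
    """phi(x) = gradient of f(x) = w2 . tanh(W1 x) w.r.t. (W1, w2)."""
    a = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1 - a**2), x)  # d f / d W1, shape (h, d)
    dw2 = a                             # d f / d w2, shape (h,)
    return np.concatenate([dW1.ravel(), dw2])

def ntk(x, xp):
    """Empirical neural-tangent-style kernel: k(x, x') = phi(x) . phi(x')."""
    return feature_map(x) @ feature_map(xp)

x1, x2 = rng.normal(size=d), rng.normal(size=d)
print(ntk(x1, x2))  # one entry of the induced GP kernel matrix
```

The Gram matrix of this kernel over a dataset would serve as the covariance of the corresponding GP, which is the correspondence the abstract refers to.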
Generative Interest Estimation for Document Recommendations
Hafner, Danijar, Immer, Alexander, Raschkowski, Willi, Windheuser, Fabian
Learning distributed representations of documents has pushed the state of the art in several natural language processing tasks and has recently been applied successfully to recommender systems. In this paper, we propose a novel content-based recommender system based on learned representations and a generative model of user interest. Our method works as follows: First, we learn representations on a corpus of text documents. Then, we capture a user's interest as a generative model in the space of the document representations. In particular, we model the distribution of interest for each user as a Gaussian mixture model (GMM). Recommendations can be obtained directly by sampling from a user's generative model. Using latent semantic analysis (LSA) for comparison, we compute and explore document representations on the Delicious bookmarks dataset, a standard benchmark for recommender systems. We then perform density estimation in both spaces and show that learned representations outperform LSA in terms of predictive performance.
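A minimal sketch of the pipeline under stated assumptions: the document embeddings are taken as given (random stand-ins below, in place of the learned representations), a per-user GaussianMixture from scikit-learn models the interest distribution, and documents are ranked by log-density under that model; the abstract's sampling-based recommendation corresponds to gmm.sample.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
corpus = rng.normal(size=(500, 50))   # 500 docs, 50-dim embeddings (stand-ins)
user_docs = corpus[rng.choice(500, size=40, replace=False)]  # user's history

# Diagonal covariances keep the fit well-posed with few samples per user.
gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
gmm.fit(user_docs)                    # generative model of the user's interest

scores = gmm.score_samples(corpus)    # log-density of each document
recommendations = np.argsort(scores)[::-1][:10]  # top-10 by estimated interest
print(recommendations)

samples, _ = gmm.sample(5)            # or sample directly from the model
```

Ranking by density and sampling from the fitted mixture are two uses of the same per-user generative model; the number of components and embedding dimension here are illustrative choices, not the paper's settings.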