Stochastic Variational Inference
This paper proposes the parsimonious triangular model (PTM), which constrains the O(K^3) parameter space of the mixed-membership triangular model (MMTM) to O(K) for faster inference. The authors develop a stochastic variational inference algorithm for PTM, along with additional approximation tricks that make it further scalable. Experiments on synthetic data show that reducing the number of variables can yield stronger statistical power, and experiments on real-world datasets show that the proposed method is competitive with existing methods in accuracy. Quality: PTM seems to be an interesting specialization of MMTM, but the practical advantage of good scalability in K (the number of possible roles) is questionable. To empirically evaluate the value of such a method, it is critical to answer how being able to learn MMTM with large K actually helps. Since MMSB and MMTM are mixed-membership models, using a small K may not be as troublesome as it is in single-membership models.
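To make the parameter-count argument concrete, here is a toy sketch. This is not the PTM parameterization from the paper (the review does not specify it); it only illustrates how an O(K^3) role-triple tensor can collapse to O(K) free parameters under a diagonal-style constraint.

```python
import numpy as np

K = 50  # number of latent roles

# A full triangular-motif tensor, MMTM-style: one parameter
# per ordered role triple (i, j, k) -> O(K^3) parameters.
B_full = np.random.rand(K, K, K)

# A hypothetical O(K) constraint, for illustration only: one
# weight per role on the tensor diagonal plus a single shared
# off-diagonal value. NOT the paper's actual parameterization.
diag_weights = np.random.rand(K)
shared_offdiag = 0.01
B_constrained = np.full((K, K, K), shared_offdiag)
B_constrained[np.arange(K), np.arange(K), np.arange(K)] = diag_weights

print(B_full.size)            # 125000 free parameters
print(diag_weights.size + 1)  # 51 free parameters
```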
Stochastic variational inference (SVI) requires careful selection of a step size. This paper proposes a Kalman filter to set the step size automatically. The authors show that a standard Gaussian Kalman filter does not satisfy the Robbins-Monro criteria (and performs badly), so they instead apply a Kalman filter based on t-distributions, and show that this gives better results than standard SVI.
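A minimal sketch of the general idea, assuming a scalar Gaussian filter that tracks the noisy stochastic estimates of a hidden optimum; the paper's actual filter uses t-distributions and a different state model. Note how the Gaussian gain settles at a positive floor, illustrating the Robbins-Monro violation the review mentions.

```python
import numpy as np

def kalman_step_sizes(noisy_estimates, process_var=1e-3, obs_var=1.0):
    """Track a slowly drifting optimum behind noisy estimates with a
    scalar Gaussian Kalman filter. The Kalman gain plays the role of
    an adaptive step size: large while the filter is uncertain,
    shrinking as evidence accumulates.

    Illustrative Gaussian sketch only: with process_var > 0 the gain
    converges to a positive constant, so sum(gain^2) diverges and the
    Robbins-Monro conditions fail, as the reviewed paper argues.
    """
    mean, var = 0.0, 1.0   # prior on the hidden optimum
    means, gains = [], []
    for y in noisy_estimates:
        var += process_var              # predict: optimum may drift
        gain = var / (var + obs_var)    # Kalman gain == step size
        mean += gain * (y - mean)       # update toward observation
        var *= (1.0 - gain)
        means.append(mean)
        gains.append(gain)
    return np.array(means), np.array(gains)

# Noisy observations of a fixed optimum at 2.0.
ys = 2.0 + np.random.randn(1000)
means, gains = kalman_step_sizes(ys)
print(means[-1], gains[-1])  # mean near 2.0; gain stuck at its floor
```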
Summary: The paper introduces a simple strategy to reduce the variance of gradients in stochastic variational inference methods. Variance reduction is achieved by storing the last L data points' contributions to the stochastic gradient and averaging them. There is a bias-variance trade-off: variance reduction comes at the cost of increased bias in the gradient estimates, and the trade-off can be controlled by varying the sliding-window size L. This strategy also requires storing the last L per-data-point gradient contributions, which can be significant.
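A minimal sketch of the sliding-window idea, assuming a generic noisy gradient oracle; the class and variable names here are placeholders, not the paper's implementation.

```python
from collections import deque
import numpy as np

class SmoothedGradient:
    """Average the last L per-example gradient contributions.

    Larger L -> lower variance but more bias (stale terms) and
    O(L * dim) extra storage, matching the trade-offs the review
    describes. A sketch, not the paper's implementation.
    """
    def __init__(self, window_size):
        self.window = deque(maxlen=window_size)

    def update(self, contribution):
        self.window.append(contribution)
        return np.mean(np.stack(self.window), axis=0)

# Usage with a hypothetical noisy gradient oracle.
rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0])
smoother = SmoothedGradient(window_size=10)
for _ in range(100):
    noisy = true_grad + rng.normal(scale=5.0, size=2)
    g = smoother.update(noisy)
print(g)  # typically much closer to true_grad than one noisy sample
```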
Stochastic variational inference for hidden Markov models
Nick Foti, Jason Xu, Dillon Laird, Emily Fox
Variational inference algorithms have proven successful for Bayesian analysis in large data settings, with recent advances using stochastic variational inference (SVI). However, such methods have largely been studied in independent or exchangeable data settings. We develop an SVI algorithm to learn the parameters of hidden Markov models (HMMs) in a time-dependent data setting. The challenge in applying stochastic optimization in this setting arises from dependencies in the chain, which must be broken to consider minibatches of observations. We propose an algorithm that harnesses the memory decay of the chain to adaptively bound errors arising from edge effects. We demonstrate the effectiveness of our algorithm on synthetic experiments and a large genomics dataset where a batch algorithm is computationally infeasible.
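A minimal sketch of the subchain-with-buffer idea the abstract describes. Here the buffer length is a fixed hyperparameter, whereas the paper bounds the edge-effect error adaptively using the memory decay of the chain; function and parameter names are illustrative assumptions.

```python
import numpy as np

def sample_buffered_subchain(T, subchain_len, buffer_len, rng):
    """Sample a minibatch subchain [s, e) from a length-T chain,
    padded with `buffer_len` extra observations on each side.

    Local message passing is run over the padded window so that,
    by the time messages reach the inner subchain, the error from
    breaking the chain's dependencies has largely decayed. The
    variational update then uses only the inner subchain.
    """
    s = rng.integers(0, T - subchain_len + 1)
    e = s + subchain_len
    pad_s = max(0, s - buffer_len)
    pad_e = min(T, e + buffer_len)
    return (pad_s, pad_e), (s, e)

rng = np.random.default_rng(0)
(pad_s, pad_e), (s, e) = sample_buffered_subchain(
    T=100_000, subchain_len=500, buffer_len=50, rng=rng)
print(pad_s, s, e, pad_e)  # run inference on the outer span,
                           # update only with the inner one
```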
A Filtering Approach to Stochastic Variational Inference
Stochastic variational inference (SVI) uses stochastic optimization to scale up Bayesian computation to massive data. We present an alternative perspective on SVI as approximate parallel coordinate ascent. SVI trades off bias and variance to step close to the unknown true coordinate optimum given by batch variational Bayes (VB). We define a model to automate this process.
Smoothed Gradients for Stochastic Variational Inference
Stochastic variational inference (SVI) lets us scale up Bayesian computation to massive data. It uses stochastic optimization to fit a variational distribution, following easy-to-compute noisy natural gradients. As with most traditional stochastic optimization methods, SVI takes precautions to use unbiased stochastic gradients whose expectations are equal to the true gradients. In this paper, we explore the idea of following biased stochastic gradients in SVI. Our method replaces the natural gradient with a similarly constructed vector that uses a fixed-window moving average of some of its previous terms. We demonstrate the many advantages of this technique. First, its computational cost is the same as for SVI, and storage requirements only multiply by a constant factor. Second, it enjoys significant variance reduction over the unbiased estimates and smaller bias than averaged gradients, leading to smaller mean-squared error against the full gradient. We test our method on latent Dirichlet allocation with three large corpora.
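In equations: the first line below is the standard SVI natural-gradient update on the global variational parameters (a well-known form); the second is the smoothed variant implied by the abstract, with the exact windowed form written here as an assumption based on its description.

```latex
% Standard SVI: noisy natural-gradient step on global variational
% parameters \lambda with step size \rho_t.
\lambda_t = (1 - \rho_t)\,\lambda_{t-1} + \rho_t\,\hat{\lambda}_t

% Smoothed variant sketched from the abstract: replace the single
% noisy term \hat{\lambda}_t with a fixed-window moving average of
% the last L such terms (biased, but lower variance).
\lambda_t = (1 - \rho_t)\,\lambda_{t-1}
          + \rho_t \cdot \frac{1}{L} \sum_{\ell=0}^{L-1} \hat{\lambda}_{t-\ell}
```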