Stochastic Gradient Langevin Dynamics Algorithm


A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Neural Information Processing Systems

We propose an adaptively weighted stochastic gradient Langevin dynamics (SGLD) algorithm, termed contour stochastic gradient Langevin dynamics (CSGLD), for Bayesian learning with big data. The proposed algorithm is essentially a scalable dynamic importance sampler that automatically flattens the target distribution, greatly facilitating the simulation of multi-modal distributions. Theoretically, we prove a stability condition and establish the asymptotic convergence of the self-adapting parameter to a unique fixed point, regardless of the non-convexity of the original energy function; we also present an error analysis for the weighted averaging estimators. Empirically, the CSGLD algorithm is tested on multiple benchmark datasets, including CIFAR10 and CIFAR100. The numerical results indicate its superiority over existing state-of-the-art algorithms in training deep neural networks.
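The abstract describes the method only at a high level. As a rough illustration of the two coupled updates it refers to (a Langevin step on the flattened energy and a stochastic-approximation update of the self-adapting parameter), the following sketch runs a CSGLD-style loop on a toy one-dimensional double-well energy. The energy, the bin partition, and all tuning constants (zeta, tau, lr, omega) are illustrative assumptions, not the authors' reference implementation.

import numpy as np

rng = np.random.default_rng(0)

def U(x):       # toy double-well energy with asymmetric wells (an assumption)
    return 0.1 * (x**2 - 4.0)**2 + 0.5 * x

def grad_U(x):  # exact gradient here; a stochastic estimate in practice
    return 0.4 * x * (x**2 - 4.0) + 0.5

# Partition the energy range into m bins; theta estimates the bin probabilities.
m, u_min, du = 50, -2.0, 0.5
theta = np.full(m, 1.0 / m)

zeta, tau, lr = 0.75, 1.0, 1e-3   # flattening exponent, temperature, step size
x = 2.0
samples, weights = [], []

def bin_index(x):
    return min(max(int((U(x) - u_min) / du), 1), m - 1)

for k in range(1, 20001):
    j = bin_index(x)
    # Langevin step on the flattened energy: the multiplier is the correction
    # induced by the current estimate of the contour weights.
    mult = 1.0 + zeta * tau * (np.log(theta[j]) - np.log(theta[j - 1])) / du
    x = x - lr * mult * grad_U(x) + np.sqrt(2.0 * lr * tau) * rng.standard_normal()
    # Stochastic-approximation update of theta toward its fixed point.
    j = bin_index(x)
    omega = 10.0 / (k + 100.0)        # decaying step size (an assumption)
    onehot = np.zeros(m)
    onehot[j] = 1.0
    theta += omega * theta[j]**zeta * (onehot - theta)
    theta = np.clip(theta, 1e-12, None)
    theta /= theta.sum()
    samples.append(x)
    weights.append(theta[j]**zeta)    # importance weights for the original target

s, w = np.array(samples), np.array(weights)
print("reweighted posterior mean:", np.sum(w * s) / np.sum(w))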


Review for NeurIPS paper: A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Neural Information Processing Systems

My main concern is whether using a flattened surrogate energy in this fashion is suitable for most sampling situations. The main reason is that, by construction, the iterates do not follow the true distribution particularly closely; for example, a plot of the samples obtained in the synthetic experiments (figs 2c--d) would look quite different from the original. While this does allow the algorithm to bounce out of local optima, the deviation from the true energy makes the samples obtained after convergence less useful. For point estimation, we might be able to get away with these samples when the multiple modes of the real energy are roughly symmetric (as in the synthetic Gaussian experiments); even with a 'flattened' energy (which can be thought of as lower peaks with higher elevation between them), the original distribution's symmetry would be essentially preserved, and the mean and other point estimates would be close enough. But flattening energies with a skewed distribution of modes might not be as accurate: the flattened version might have a mean closer to the 'center' of the space, while the original would be closer to one of the modes near the periphery (I am visualizing a simple 2-d space).
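The reviewer's point about skewed modes can be checked numerically. The sketch below uses a hypothetical skewed bimodal density and a power-law flattening exponent zeta, both chosen purely for illustration; it shows the flattened density's mean being pulled toward the center of the space, and that importance reweighting by the density ratio pi/flat (the kind of correction a dynamic importance sampler applies) recovers the original mean.

import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(-10.0, 10.0, 4001)   # uniform grid, so sums approximate integrals

# Skewed bimodal target: a heavy mode near the periphery, a light mode at 0.
pi_ = 0.8 * np.exp(-0.5 * (xs - 6.0)**2) + 0.2 * np.exp(-0.5 * xs**2)
pi_ /= pi_.sum()

zeta = 0.5                # illustrative flattening exponent
flat = pi_**zeta
flat /= flat.sum()

print("true mean:      ", np.sum(xs * pi_))    # close to the heavy mode near 6
print("flattened mean: ", np.sum(xs * flat))   # pulled toward the 'center'

# Drawing from the flattened density but reweighting each sample by
# w = pi/flat recovers the true mean.
idx = rng.choice(len(xs), size=200_000, p=flat)
w = (pi_ / flat)[idx]
print("reweighted mean:", np.sum(w * xs[idx]) / np.sum(w))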


Review for NeurIPS paper: A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions

Neural Information Processing Systems

The paper presents valuable theoretical and empirical evidence for a novel algorithm. The AC is confident this represents valuable work but was a bit torn about the acceptance decision, as the reviewers point out several important areas where the paper needs improvement.


Extended Stochastic Gradient MCMC for Large-Scale Bayesian Variable Selection

Song, Qifan, Sun, Yan, Ye, Mao, Liang, Faming

arXiv.org Machine Learning

Stochastic gradient Markov chain Monte Carlo (MCMC) algorithms have received much attention in Bayesian computing for big data problems, but they are applicable only to a small class of problems for which the parameter space has a fixed dimension and the log-posterior density is differentiable with respect to the parameters. This paper proposes an extended stochastic gradient MCMC algorithm which, by introducing appropriate latent variables, can be applied to more general large-scale Bayesian computing problems, such as those involving dimension jumping and missing data. Numerical studies show that the proposed algorithm is highly scalable and much more efficient than traditional MCMC algorithms. The proposed algorithm greatly alleviates the computational burden of Bayesian methods in big data applications.
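The abstract does not detail the construction, but the general pattern it describes (augmenting the model with latent variables and alternating their imputation with a stochastic-gradient parameter update) can be sketched on a toy missing-data problem. The Gaussian model, the imputation step, and all tuning constants below are assumptions for illustration, not the paper's algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Simulate y_i ~ N(mu_true, 1) and hide 30% of the entries (missing data).
n, mu_true = 10_000, 3.0
y = mu_true + rng.standard_normal(n)
missing = rng.random(n) < 0.3
y[missing] = np.nan

mu, lr, batch = 0.0, 1e-4, 256
for k in range(5000):
    # (i) Latent-variable step: impute the missing entries from their
    # conditional distribution given the current parameter value.
    y[missing] = mu + rng.standard_normal(missing.sum())
    # (ii) SGLD step on mu: minibatch gradient of the log-posterior
    # (flat prior), rescaled by n/batch to estimate the full-data gradient.
    idx = rng.integers(0, n, size=batch)
    grad = (n / batch) * np.sum(y[idx] - mu)   # d/dmu of the log-likelihood
    mu = mu + 0.5 * lr * grad + np.sqrt(lr) * rng.standard_normal()

print("posterior draw of mu after burn-in:", mu)   # should be near 3.0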