sg-mcmc algorithm
On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators
Changyou Chen, Nan Ding, Lawrence Carin
Recent advances in Bayesian learning with large-scale data have witnessed emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- Asia > Middle East > Jordan (0.04)
Reviews: Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo
I like the fact that the method is very simple to understand and implement (see my summary), and does not require any major changes to the base SG-MCMC algorithm. Also, this seems very general and applies to a large class of SG-MCMC algorithms, and is therefore potentially very impactful to the Stochastic Gradient MCMC community. Novelty: Although Richardson-Romberg extrapolation is well known in numerical analysis, it is not widely known in the machine learning / stochastic gradient MCMC community. Clarity: The paper is well written and the presentation is clear. Comments/questions: - Can this technique be directly applied to all SG-MCMC methods?
On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators Changyou Chen Nan Ding
Recent advances in Bayesian learning with large-scale data have witnessed emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
Scaling up Dynamic Edge Partition Models via Stochastic Gradient MCMC
The edge partition model (EPM) is a generative model for extracting an overlapping community structure from static graph-structured data. In the EPM, the gamma process (GaP) prior is adopted to infer the appropriate number of latent communities, and each vertex is endowed with a gamma distributed positive memberships vector. Despite having many attractive properties, inference in the EPM is typically performed using Markov chain Monte Carlo (MCMC) methods that prevent it from being applied to massive network data. In this paper, we generalize the EPM to account for dynamic enviroment by representing each vertex with a positive memberships vector constructed using Dirichlet prior specification, and capturing the time-evolving behaviour of vertices via a Dirichlet Markov chain construction. A simple-to-implement Gibbs sampler is proposed to perform posterior computation using Negative- Binomial augmentation technique. For large network data, we propose a stochastic gradient Markov chain Monte Carlo (SG-MCMC) algorithm for scalable inference in the proposed model. The experimental results show that the novel methods achieve competitive performance in terms of link prediction, while being much faster.
- Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- (2 more...)
- Research Report > Promising Solution (0.34)
- Research Report > New Finding (0.34)
On Connecting Stochastic Gradient MCMC and Differential Privacy
Li, Bai, Chen, Changyou, Liu, Hao, Carin, Lawrence
Significant success has been realized recently on applying machine learning to real-world applications. There have also been corresponding concerns on the privacy of training data, which relates to data security and confidentiality issues. Differential privacy provides a principled and rigorous privacy guarantee on machine learning models. While it is common to design a model satisfying a required differential-privacy property by injecting noise, it is generally hard to balance the trade-off between privacy and utility. We show that stochastic gradient Markov chain Monte Carlo (SG-MCMC) -- a class of scalable Bayesian posterior sampling algorithms proposed recently -- satisfies strong differential privacy with carefully chosen step sizes. We develop theory on the performance of the proposed differentially-private SG-MCMC method. We conduct experiments to support our analysis and show that a standard SG-MCMC sampler without any modification (under a default setting) can reach state-of-the-art performance in terms of both privacy and utility on Bayesian learning.
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Particle Optimization in Stochastic Gradient MCMC
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has been increasingly popular in Bayesian learning due to its ability to deal with large data. A standard SG-MCMC algorithm simulates samples from a discretized-time Markov chain to approximate a target distribution. However, the samples are typically highly correlated due to the sequential generation process, an undesired property in SG-MCMC. In contrary, Stein variational gradient descent (SVGD) directly optimizes a set of particles, and it is able to approximate a target distribution with much fewer samples. In this paper, we propose a novel method to directly optimize particles (or samples) in SG-MCMC from scratch. Specifically, we propose efficient methods to solve the corresponding Fokker-Planck equation on the space of probability distributions, whose solution (i.e., a distribution) is approximated by particles. Through our framework, we are able to show connections of SG-MCMC to SVGD, as well as the seemly unrelated generative-adversarial-net framework. Under certain relaxations, particle optimization in SG-MCMC can be interpreted as an extension of standard SVGD with momentum.
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > New York > Erie County > Buffalo (0.04)
- Europe > Russia (0.04)
- (2 more...)
A Convergence Analysis for A Class of Practical Variance-Reduction Stochastic Gradient MCMC
Chen, Changyou, Wang, Wenlin, Zhang, Yizhe, Su, Qinliang, Carin, Lawrence
Stochastic gradient Markov Chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size to the algorithm's convergence rate. In this paper, we prove that under a limited computational budget/time, a larger minibatch size leads to a faster decrease of the mean squared error bound (thus the fastest one corresponds to using full gradients), which motivates the necessity of variance reduction in SG-MCMC. Consequently, by borrowing ideas from stochastic optimization, we propose a practical variance-reduction technique for SG-MCMC, that is efficient in both computation and storage. We develop theory to prove that our algorithm induces a faster convergence rate than standard SG-MCMC. A number of large-scale experiments, ranging from Bayesian learning of logistic regression to deep neural networks, validate the theory and demonstrate the superiority of the proposed variance-reduction SG-MCMC framework.
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)
Stochastic Gradient MCMC Methods for Hidden Markov Models
Ma, Yi-An, Foti, Nicholas J., Fox, Emily B.
Stochastic gradient MCMC (SG-MCMC) algorithms have proven useful in scaling Bayesian inference to large datasets under an assumption of i.i.d data. We instead develop an SG-MCMC algorithm to learn the parameters of hidden Markov models (HMMs) for time-dependent data. There are two challenges to applying SG-MCMC in this setting: The latent discrete states, and needing to break dependencies when considering minibatches. We consider a marginal likelihood representation of the HMM and propose an algorithm that harnesses the inherent memory decay of the process. We demonstrate the effectiveness of our algorithm on synthetic experiments and an ion channel recording data, with runtimes significantly outperforming batch MCMC.
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
On the Convergence of Stochastic Gradient MCMC Algorithms with High-Order Integrators
Chen, Changyou, Ding, Nan, Carin, Lawrence
Recent advances in Bayesian learning with large-scale data have witnessed emergence of stochastic gradient MCMC algorithms (SG-MCMC), such as stochastic gradient Langevin dynamics (SGLD), stochastic gradient Hamiltonian MCMC (SGHMC), and the stochastic gradient thermostat. While finite-time convergence properties of the SGLD with a 1st-order Euler integrator have recently been studied, corresponding theory for general SG-MCMCs has not been explored. In this paper we consider general SG-MCMCs with high-order integrators, and develop theory to analyze finite-time convergence properties and their asymptotic invariant measures. Our theoretical results show faster convergence rates and more accurate invariant measures for SG-MCMCs with higher-order integrators. For example, with the proposed efficient 2nd-order symmetric splitting integrator, the mean square error (MSE) of the posterior average for the SGHMC achieves an optimal convergence rate of $L^{-4/5}$ at $L$ iterations, compared to $L^{-2/3}$ for the SGHMC and SGLD with 1st-order Euler integrators. Furthermore, convergence results of decreasing-step-size SG-MCMCs are also developed, with the same convergence rates as their fixed-step-size counterparts for a specific decreasing sequence. Experiments on both synthetic and real datasets verify our theory, and show advantages of the proposed method in two large-scale real applications.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- North America > United States > New York (0.04)
- Asia > Middle East > Jordan (0.04)
High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models
Li, Chunyuan, Chen, Changyou, Fan, Kai, Carin, Lawrence
Learning in deep models using Bayesian methods has generated significant attention recently. This is largely because of the feasibility of modern Bayesian methods to yield scalable learning and inference, while maintaining a measure of uncertainty in the model parameters. Stochastic gradient MCMC algorithms (SG-MCMC) are a family of diffusion-based sampling methods for large-scale Bayesian learning. In SG-MCMC, multivariate stochastic gradient thermostats (mSGNHT) augment each parameter of interest, with a momentum and a thermostat variable to maintain stationary distributions as target posterior distributions. As the number of variables in a continuous-time diffusion increases, its numerical approximation error becomes a practical bottleneck, so better use of a numerical integrator is desirable. To this end, we propose use of an efficient symmetric splitting integrator in mSGNHT, instead of the traditional Euler integrator. We demonstrate that the proposed scheme is more accurate, robust, and converges faster. These properties are demonstrated to be desirable in Bayesian deep learning. Extensive experiments on two canonical models and their deep extensions demonstrate that the proposed scheme improves general Bayesian posterior sampling, particularly for deep models.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > New York (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)