Reviews: Variance Reduction in Stochastic Gradient Langevin Dynamics

Jan-20-2025, 16:25:56 GMT–Neural Information Processing Systems

I have one key concern, which may be a misunderstanding on my part, as I did not check the supplementary material in detail. The update at each time step is computed from gradients evaluated at different parameter values (theta), some of which were generated arbitrarily many time steps ago. Because of this dependence on earlier samples, the SAGA-LD chain is not Markov. The proofs appear to rest on a result for SG-MCMC chains, and I am not sure that result carries over to SAGA-LD when the Markov property is violated. Apart from this point, I think this is a very useful line of work.
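To make the concern concrete, here is a minimal sketch of a SAGA-style Langevin step. It is not the paper's algorithm as stated: the toy model (x_i ~ N(theta, 1) with a N(0, 1) prior), the function name `saga_ld_step`, and the step size and batch size are all assumptions chosen for illustration. It starts two runs from the same current theta and the same random draws, and differs only in the table of stored gradients.

```python
import numpy as np

def saga_ld_step(theta, table, data, step, batch, rng):
    """One SAGA-style Langevin step for a toy model (illustrative assumption):
    x_i ~ N(theta, 1) with prior theta ~ N(0, 1)."""
    n_data = len(data)
    idx = rng.choice(n_data, size=batch, replace=False)
    fresh = data[idx] - theta  # per-datum log-likelihood gradients at the current theta
    # Prior gradient, plus the sum of stored (possibly stale) gradients,
    # plus a rescaled correction computed on the sampled minibatch.
    grad = -theta + table.sum() + (n_data / batch) * (fresh - table[idx]).sum()
    new_theta = theta + 0.5 * step * grad + np.sqrt(step) * rng.standard_normal()
    new_table = table.copy()
    new_table[idx] = fresh  # only the sampled entries are refreshed
    return new_theta, new_table

data = np.random.default_rng(0).normal(1.0, 1.0, size=20)
theta = 0.3  # identical current state in both runs

# Two histories: gradients all stored at one old iterate (theta = 0),
# versus gradients stored at many different past iterates.
table_a = data - 0.0
table_b = data - np.linspace(0.0, 2.0, len(data))

# The same seed gives the same minibatch and the same injected noise in both runs.
next_a, _ = saga_ld_step(theta, table_a, data, 1e-2, 5, np.random.default_rng(1))
next_b, _ = saga_ld_step(theta, table_b, data, 1e-2, 5, np.random.default_rng(1))
print(next_a, next_b)
```

In this sketch the two next iterates differ even though the current theta, the minibatch, and the noise are identical, so the next theta is not determined in distribution by the current theta alone. The pair (theta, table) does determine it, which may be one way to frame the question put to the authors.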

stochastic gradient langevin dynamic, variance, variance reduction