Statistical Learning
Understanding the Role of Momentum in Stochastic Gradient Methods
Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao
Different variants of momentum, including heavy-ball momentum, Nesterov's accelerated gradient (NAG), and quasi-hyperbolic momentum (QHM), have demonstrated success on various tasks. Our results are most closely related to the work of Mandt et al. [19], who use stationary analysis of SGD with momentum to perform approximate Bayesian inference.
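
To make the relationship between these momentum variants concrete, the following is a minimal sketch of one QHM update in its commonly stated normalized form, d_k = (1 - beta) g_k + beta d_{k-1} and x_{k+1} = x_k - alpha [(1 - nu) g_k + nu d_k]. The function names (`qhm_step`, `run_qhm`), the default hyperparameters, and the quadratic test objective are illustrative assumptions, not taken from the text above. Under this parameterization, nu = 0 reduces to plain SGD, nu = 1 to a normalized heavy-ball update, and nu = beta is usually described as corresponding to a form of NAG.

```python
import numpy as np


def qhm_step(x, d, grad, alpha=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) update (illustrative sketch).

    x     : current iterate
    d     : momentum buffer (exponential moving average of gradients)
    grad  : (stochastic) gradient evaluated at x
    alpha : learning rate (assumed default)
    beta  : momentum averaging parameter (assumed default)
    nu    : weight on the momentum buffer; (1 - nu) goes to the raw gradient
    """
    # Normalized momentum buffer: an exponential moving average of gradients.
    d_new = (1.0 - beta) * grad + beta * d
    # Step along a weighted average of the raw gradient and the buffer.
    x_new = x - alpha * ((1.0 - nu) * grad + nu * d_new)
    return x_new, d_new


def run_qhm(grad_fn, x0, steps=500, **hyper):
    """Run `steps` QHM iterations from x0 using the gradient oracle grad_fn."""
    x = np.asarray(x0, dtype=float)
    d = np.zeros_like(x)
    for _ in range(steps):
        x, d = qhm_step(x, d, grad_fn(x), **hyper)
    return x


if __name__ == "__main__":
    # Hypothetical test problem: f(x) = 0.5 * ||x||^2, whose gradient is x.
    x_final = run_qhm(lambda x: x, x0=[1.0, -2.0])
    print(np.linalg.norm(x_final))
```

Keeping the buffer `d` normalized by (1 - beta) means the three special cases differ only in `nu`, so they can be compared at the same learning rate `alpha`.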