Adaptive Variance Reduction for Stochastic Optimization under Weaker Assumptions

Wei Jiang, Sifan Yang

Neural Information Processing Systems 

Problem (1) has been comprehensively investigated in the literature [Duchi et al., 2011, Kingma and Ba, 2015, Loshchilov and Hutter, 2017], and it is well-known that the classical stochastic gradient descent (SGD) achieves a convergence rate of
