Exponential convergence of testing error for stochastic gradient methods
Pillaud-Vivien, Loucas, Rudi, Alessandro, Bach, Francis
Stochastic gradient methods are now ubiquitous in machine learning, both from the practical side, as a simple algorithm that can learn from a single or a few passes over the data [1], and from the theoretical side, as it leads to optimal rates for estimation problems in a variety of situations [2, 3]. They follow a simple principle [4]: to find a minimizer of a function F defined on a vector space from noisy gradients, simply follow the negative stochastic gradient and the algorithm will converge to a stationary point, local minimum, global minimum of F (depending on the properties of the function F), with a rate of convergence that decays with the number of gradient steps n typically as O(1/ n), or O(1/n) depending on the assumptions which are made on the problem (see, e.g., [3, 5, 6, 7, 8, 9, 10, 11]).
Dec-13-2017