Stochastic Spectral and Conjugate Descent Methods
The state-of-the-art methods for solving optimization problems in high dimensions are variants of randomized coordinate descent (RCD). In this paper we introduce a fundamentally new type of acceleration strategy for RCD based on the augmentation of the set of coordinate directions by a few spectral or conjugate directions. As we increase the number of extra directions to be sampled from, the rate of the method improves and interpolates between the linear rate of RCD and a linear rate independent of the condition number. We also develop and analyze inexact variants of these methods, in which the spectral and conjugate directions need only be computed approximately. We motivate the above development by proving several negative results which highlight the limitations of RCD with importance sampling.
Reviews: Stochastic Spectral and Conjugate Descent Methods
This paper focuses on quadratic minimization and proposes a novel stochastic gradient descent algorithm that interpolates between randomized coordinate descent (RCD) and stochastic spectral descent (SSD). Both algorithms have been studied previously, but each has drawbacks that limit its practical use. The authors therefore combine them via a random coin toss (i.e., with probability \alpha run RCD and with probability 1-\alpha run SSD) and derive the optimal parameters that achieve the best convergence rate. In addition, they study the rate of their algorithm in special cases, e.g., when the matrix in the objective has clustered or exponentially decaying eigenvalues. In experiments, they report the practical behavior of the algorithm in synthetic settings. The proposed algorithm combines the advantages of the two prior methods: the efficient gradient updates of RCD and the fast convergence of SSD.
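The coin-tossing scheme described above can be sketched in a few lines. The following is a minimal illustration, not the paper's actual algorithm or parameter choices: it minimizes f(x) = (1/2) x^T A x - b^T x for a symmetric positive definite A, taking an exact coordinate step (RCD) with probability alpha and an exact step along a random eigenvector of A (SSD) otherwise. The problem sizes, the value alpha = 0.5, and the assumption that the full eigendecomposition is available are all illustrative choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic quadratic: A = Q diag(eigvals) Q^T, so the columns of Q
# are eigenvectors of A and eigvals are its eigenvalues.
n = 20
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]
eigvals = np.linspace(1.0, 100.0, n)        # condition number 100 (illustrative)
A = Q @ np.diag(eigvals) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)              # reference solution for checking

alpha = 0.5                                 # coin-toss probability (illustrative)
x = np.zeros(n)
for _ in range(5000):
    g = A @ x - b                           # gradient of f at x
    i = rng.integers(n)
    if rng.random() < alpha:
        # RCD branch: exact minimization along coordinate e_i.
        x[i] -= g[i] / A[i, i]
    else:
        # SSD branch: exact minimization along eigenvector Q[:, i],
        # which zeroes out that eigencomponent of the gradient.
        x -= (Q[:, i] @ g / eigvals[i]) * Q[:, i]

print(np.linalg.norm(x - x_star))
```

Because each SSD step eliminates one eigencomponent of the gradient exactly, the SSD branch contracts the suboptimality at a rate independent of the condition number, which is the effect the interpolation exploits.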