Stochastic optimization and sparse statistical recovery: An optimal algorithm for high dimensions

Agarwal, Alekh, Negahban, Sahand, Wainwright, Martin J.

arXiv.org Machine Learning 

Stochastic optimization algorithms have many desirable features for large-scale machine learning, and accordingly have been the focus of renewed and intensive study in the last several years (e.g., see the papers [26, 4, 10, 30] and references therein). The empirical efficiency of these methods is backed with strong theoretical guarantees, providing sharp bounds on their convergence rates. These convergence rates are known to depend on the structure of the underlying objective function, with faster rates being possible for objective functions that are smooth and/or (strongly) convex, or optima that have desirable features such as sparsity. More precisely, for an objective function that is strongly convex, stochastic gradient descent enjoys a convergence rate ranging from O(1/T), when features vectors are extremely sparse, to O(d/T) when feature vectors are dense [11, 19, 12]. Such results are of significant interest, because the strong convexity condition is satisfied for many common machine learning problems, including boosting, least squares regression, support vector machines and generalized linear models, among other examples. A complementary type of condition is that of sparsity, either exact or approximate, in the optimal solution.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found