Least-squares regressions via randomized Hessians

Kahale, Nabil

arXiv.org Machine Learning 

The recent availability of massive volumes of data fosters the need to design computationally efficient algorithms for optimization in high dimensions. In large-scale machine learning, stochastic gradient descent algorithms are among the most effective optimization methods (Bottou, Curtis and Nocedal 2018). For general smooth convex functions, averaged SGD achieves the rate of convergence of O(1/ k) after k iterations (Nemirovski, Juditsky, Lan and Shapiro 2009). For strongly-convex functions, i.e. when the smallest eigenvalue of the Hessian matrix is bounded away from 0, the convergence rate after k iterations is O(1/k) (Nemirovski, Juditsky, Lan and Shapiro 2009). Variance-reduced SGD algorithms that optimize the sum of n convex functions are described in (Schmidt, Le Roux and Bach 2017, Shalev-Shwartz and Zhang 2013, Johnson and Zhang 2013), and related accelerated methods are analysed in (Shalev-Shwartz and Zhang 2014, Nitanda 2014, Lan and Zhou 2018, Scieur, dAspremont and Bach 2018). These methods enjoy linear convergence(a convergence rate that decreases exponentially with the number of iterations) in the strongly-convex case. For general smooth convex functions, the stochastic average gradient method (SAG) of Schmidt, Le Roux and Bach (2017) yields a convergence rate of O( n/k) after k iterations. This paper focuses on the least-squares regression, which often arises in scientific computing and data analysis, and is widely used for inference and prediction. Many of the modern machine learning techniques such as the logistic and ridge regressions, the lasso method and neural networks can be considered as extensions of the least-squares regression technique.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found