Sub-sampled Newton Methods with Non-uniform Sampling

Xu, Peng, Yang, Jiyan, Roosta, Fred, Ré, Christopher, Mahoney, Michael W.

Neural Information Processing Systems 

We consider the problem of finding the minimizer of a convex function $F: \mathbb R d \rightarrow \mathbb R$ of the form $F(w) \defeq \sum_{i 1} n f_i(w) R(w)$ where a low-rank factorization of $ abla 2 f_i(w)$ is readily available.We consider the regime where $n \gg d$. We propose randomized Newton-type algorithms that exploit \textit{non-uniform} sub-sampling of $\{ abla 2 f_i(w)\}_{i 1} {n}$, as well as inexact updates, as means to reduce the computational complexity, and are applicable to a wide range of problems in machine learning. Two non-uniform sampling distributions based on {\it block norm squares} and {\it block partial leverage scores} are considered. Under certain assumptions, we show that our algorithms inherit a linear-quadratic convergence rate in $w$ and achieve a lower computational complexity compared to similar existing methods. In addition, we show that our algorithms exhibit more robustness and better dependence on problem specific quantities, such as the condition number.