Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy

Mokhtari, Aryan, Daneshmand, Hadi, Lucchi, Aurelien, Hofmann, Thomas, Ribeiro, Alejandro

Neural Information Processing Systems

We consider empirical risk minimization for large-scale datasets. We introduce Ada Newton as an adaptive algorithm that uses Newton's method with adaptive sample sizes. The main idea of Ada Newton is to increase the size of the training set by a factor larger than one in such a way that the minimizer for the current training set lies in the local neighborhood of the optimal argument for the next training set. This makes it possible to exploit the quadratic convergence of Newton's method and reach the statistical accuracy of each training set with only one Newton iteration. We show theoretically that we can iteratively increase the sample size while applying single Newton iterations, without line search, and remain within the statistical accuracy of the regularized empirical risk. In particular, we can double the size of the training set at each iteration when the number of samples is sufficiently large. Numerical experiments on various datasets confirm that the sample size can indeed be doubled at each iteration, which implies that Ada Newton achieves the statistical accuracy of the full training set with about two passes over the dataset.
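To make the procedure concrete, below is a minimal Python sketch of the adaptive-sample-size idea for L2-regularized logistic regression: one full Newton step (no line search) per stage, then the training set is doubled. The loss, the fixed doubling factor, and the names (grad_hess, ada_newton, grow_factor, n0, lam) are illustrative assumptions, not the authors' reference implementation; in the paper the growth factor is justified by a condition on the statistical accuracy of the current subsample.

```python
# Sketch of the Ada Newton scheme for L2-regularized logistic regression.
# Assumes rows of X are i.i.d. samples and labels y are in {-1, +1}.
import numpy as np

def grad_hess(w, X, y, lam):
    """Gradient and Hessian of the regularized logistic ERM on (X, y)."""
    p = 1.0 / (1.0 + np.exp(-y * (X @ w)))        # sigmoid(y * x'w)
    g = -(X.T @ (y * (1.0 - p))) / len(y) + lam * w
    D = p * (1.0 - p)                             # per-sample curvature
    H = (X.T * D) @ X / len(y) + lam * np.eye(X.shape[1])
    return g, H

def ada_newton(X, y, lam=1e-3, n0=128, grow_factor=2):
    """One Newton step per stage; the sample size grows geometrically."""
    n_total, d = X.shape
    w = np.zeros(d)
    n = min(n0, n_total)
    while True:
        g, H = grad_hess(w, X[:n], y[:n], lam)
        w = w - np.linalg.solve(H, g)             # full Newton step, no line search
        if n == n_total:
            break
        n = min(grow_factor * n, n_total)         # enlarge the training set
    return w
```

Because each stage touches n, 2n, 4n, ... samples up to the full set, the total work is dominated by the last two stages, which matches the abstract's claim of roughly two passes over the dataset.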


Reviews: Adaptive Newton Method for Empirical Risk Minimization to Statistical Accuracy

Neural Information Processing Systems

The paper contains a novel and interesting idea, and is well written. I think it should be accepted. This is one of very few papers that attempt to combine optimization and statistical considerations. Most papers proposing algorithms for the ERM problem focus on solving the deterministic ERM problem itself and ignore the fact that the ERM objective is only an approximation to the true loss, which arises as an expectation of the loss over an unknown sample distribution, and is hence subject to an approximation error. This is of course known in the literature, but it is notoriously difficult to address both the optimization aspect and the approximation aspect in a meaningful way in a single work.

