ISAAC Newton: Input-based Approximate Curvature for Newton's Method
Felix Petersen, Tobias Sutter, Christian Borgelt, Dongsung Huh, Hilde Kuehne, Yuekai Sun, Oliver Deussen
arXiv.org Artificial Intelligence
We present ISAAC (Input-baSed ApproximAte Curvature), a novel method that conditions the gradient using selected second-order information and has an asymptotically vanishing computational overhead, assuming a batch size smaller than the number of neurons. We show that a good conditioner can be computed from only the input to a given layer, without substantial computational overhead. The proposed method allows effective training even in small-batch stochastic regimes, which makes it competitive with first-order as well as second-order methods.

Second-order optimization methods have traditionally been much less explored than first-order methods in large-scale machine learning (ML) applications, due to their memory requirements and prohibitive per-iteration computational cost. Recently, however, they have become more popular in ML, mainly because of their fast convergence compared to first-order methods [1]. The expensive computation of the inverse Hessian (also known as the pre-conditioning matrix) in the Newton step has also been tackled by estimating the curvature from the change in gradients.
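To illustrate the kind of input-based conditioning the abstract describes, the sketch below preconditions a linear layer's weight gradient with a regularized Gram matrix of the layer inputs. This is an illustrative assumption, not the paper's exact update rule: the function name `isaac_like_condition`, the Tikhonov regularizer `lam`, and the specific conditioner form are all ours. The Woodbury identity reduces the inversion to a `(b, b)` matrix, which is why the overhead vanishes when the batch size `b` is much smaller than the number of neurons.

```python
import numpy as np

def isaac_like_condition(grad_W, X, lam=1.0):
    """Condition a linear layer's weight gradient using only its inputs.

    grad_W : (n_out, n_in) gradient of the loss w.r.t. the layer weights
    X      : (b, n_in) batch of inputs to the layer, assumed b << n_in
    lam    : Tikhonov regularization strength (hypothetical parameter)

    Applies (lam*I + X^T X / b)^{-1} to grad_W from the right. By the
    Woodbury identity,
        (lam*I + X^T X / b)^{-1}
          = (1/lam) * (I - X^T (lam*b*I + X X^T)^{-1} X),
    so only a (b, b) matrix is ever inverted.
    """
    b = X.shape[0]
    # (b, b) inner Gram matrix -- cheap when the batch is small
    K = X @ X.T
    inner = np.linalg.inv(lam * b * np.eye(b) + K)
    # apply the inverse conditioner without forming the (n_in, n_in) matrix
    correction = (grad_W @ X.T) @ inner @ X
    return (grad_W - correction) / lam
```

For small dimensions the result can be checked against the direct `(n_in, n_in)` inversion; the Woodbury route simply avoids ever materializing that matrix.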
Apr-30-2023