Balancing Gradient and Hessian Queries in Non-Convex Optimization

Neural Information Processing Systems 

We develop optimization methods which offer new trade-offs between the number of gradient and Hessian computations needed to compute the critical point of a nonconvex function. We provide a method that for a twice-differentiable f: Rd R with L2-Lipschitz Hessian, an input initial point with -bounded sub-optimality, and a sufficiently small ϵ > 0, outputs an ϵ-critical point, i.e., a point xsuch that