Export Reviews, Discussions, Author Feedback and Meta-Reviews
–Neural Information Processing Systems
This paper proposes a new adaptive learning rate scheme for optimizing nonlinear objective functions that arise during the training of deep neural networks. The main argument is based on recent results that indicate that the difficulty of the optimization stems from the presence of saddle points rather than local minima in the optimization path. The saddle points slow down training since the objective function tends to be flat in many directions and ill-conditioned in the neighbourhood of the saddle points. The authors propose a new method for reducing the ill-conditioning (the problem of pathological curvature) by "preconditioning" the objective function through a linear change of variables, which reduces to left-multiplying the gradient descent update step with a learned preconditioning matrix D. They focus specifically on the case where D is diagonal, and they show how a diagonal D reduces to methods for learning parameter-specific learning rates, such as the well-known Jacobi preconditioner or RMSProp. This is a nice framework within which to consider different schemes for adaptive learning rates.
Neural Information Processing Systems
Feb-6-2025, 22:45:54 GMT
- Technology: