Export Reviews, Discussions, Author Feedback and Meta-Reviews

Feb-6-2025, 22:45:54 GMT - Neural Information Processing Systems

This paper proposes a new adaptive learning rate scheme for optimizing the nonlinear objective functions that arise when training deep neural networks. The main argument builds on recent results indicating that the difficulty of the optimization stems from saddle points along the optimization path rather than from local minima. Saddle points slow down training because the objective function tends to be flat in many directions and ill-conditioned in their neighbourhood. The authors propose to reduce this ill-conditioning (the problem of pathological curvature) by "preconditioning" the objective function through a linear change of variables, which amounts to left-multiplying the gradient in the descent update by a learned preconditioning matrix D. They focus on the case where D is diagonal and show that a diagonal D corresponds to learning parameter-specific learning rates, as in the well-known Jacobi preconditioner or RMSProp. This is a nice framework within which to consider different schemes for adaptive learning rates.
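To make the diagonal-preconditioning view concrete, here is a minimal sketch in Python with NumPy. It is not the authors' code or their proposed method. The toy quadratic, the matrix `H`, the function names, and all hyperparameters are assumptions chosen for illustration. The Jacobi preconditioner is taken in its common form, scaling each coordinate by 1 / |diag(H)|, and RMSProp is written in its usual running-average form.

```python
import numpy as np

# Hypothetical toy objective f(x) = 0.5 * x^T H x with an ill-conditioned H.
# H and every hyperparameter below are illustrative assumptions.
H = np.array([[100.0, 2.0],
              [2.0, 1.0]])


def loss(x):
    return 0.5 * x @ H @ x


def grad(x):
    return H @ x


def preconditioned_gd(x0, d, lr, steps):
    """Gradient descent with a diagonal preconditioner stored as a vector d:
    x <- x - lr * d * grad(x), i.e. one learning rate per parameter."""
    x = x0.copy()
    for _ in range(steps):
        x = x - lr * d * grad(x)
    return x


def rmsprop(x0, lr, steps, beta=0.9, eps=1e-8):
    """RMSProp in its usual form: the diagonal scaling is 1 / sqrt(v),
    where v is a running average of squared gradients."""
    x = x0.copy()
    v = np.zeros_like(x)
    for _ in range(steps):
        g = grad(x)
        v = beta * v + (1.0 - beta) * g ** 2
        x = x - lr * g / (np.sqrt(v) + eps)
    return x


x0 = np.array([1.0, 1.0])

# No preconditioning: d is all ones, so the step size is limited by the
# largest curvature direction.
x_plain = preconditioned_gd(x0, np.ones(2), lr=0.01, steps=50)

# Jacobi preconditioner (common form): scale each coordinate by 1 / |H_ii|.
jacobi = 1.0 / np.abs(np.diag(H))
x_jacobi = preconditioned_gd(x0, jacobi, lr=1.0, steps=50)

x_rms = rmsprop(x0, lr=0.01, steps=200)

print("plain GD loss :", loss(x_plain))
print("Jacobi loss   :", loss(x_jacobi))
print("RMSProp loss  :", loss(x_rms))
```

On this toy problem the Jacobi-scaled run should end far closer to the minimum than plain gradient descent in the same number of steps, because the rescaling evens out the curvature across coordinates. A quadratic bowl has no saddle points, so the sketch only illustrates the conditioning argument, not the saddle-point behaviour the paper is concerned with.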



