Reviews: Step Size Matters in Deep Learning
Neural Information Processing Systems
This paper proves several results about the step size (learning rate) of gradient descent (GD) in deep learning. First, it analyzes linear feedforward neural networks with a quadratic loss, trained on whitened data whose outputs are a positive semidefinite (PSD) matrix times the input. It shows that GD can converge to a critical point only if the learning rate is below a certain threshold. It then proves that the weights converge at a linear rate to the global minimum if they are initialized to the identity and the step size is below another threshold. The latter result is also extended to target matrices with negative eigenvalues.
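To make the setting concrete, here is a minimal sketch (not the authors' code) of GD on a deep linear network with identity initialization. With whitened inputs (E[xx^T] = I) and targets Rx, the population loss reduces to 0.5 * ||W_L ... W_1 - R||_F^2, so GD can be run directly on the factor matrices. The depth, dimension, target matrix, step size, and iteration count below are illustrative choices, and the paper's exact step-size thresholds are not reproduced here.

```python
import numpy as np

# Sketch of the paper's setting: a deep linear network
# f(x) = W_L ... W_1 x trained by GD on whitened data with
# targets R x, R a PSD matrix.  With E[x x^T] = I the loss is
#   L(W_1, ..., W_L) = 0.5 * ||W_L ... W_1 - R||_F^2.
# Depth, dimension, step size, and iterations are illustrative.

rng = np.random.default_rng(0)
d, depth = 4, 3
A = rng.standard_normal((d, d))
R = A @ A.T / d + 0.5 * np.eye(d)       # a well-conditioned PSD target

Ws = [np.eye(d) for _ in range(depth)]  # identity init, as in the paper
lr = 0.03                               # small step size; too large diverges

for step in range(2000):
    # before[i] = W_{i-1} ... W_1,  after[i] = W_L ... W_{i+1}
    before, acc = [], np.eye(d)
    for W in Ws:
        before.append(acc)
        acc = W @ acc
    E = acc - R                         # residual of the end-to-end map
    after, acc = [None] * depth, np.eye(d)
    for i in range(depth - 1, -1, -1):
        after[i] = acc
        acc = Ws[i] @ acc
    # dL/dW_i = after[i]^T (W_L ... W_1 - R) before[i]^T
    grads = [after[i].T @ E @ before[i].T for i in range(depth)]
    Ws = [W - lr * g for W, g in zip(Ws, grads)]

P = np.linalg.multi_dot(Ws[::-1])       # end-to-end map W_L ... W_1
print("final loss:", 0.5 * np.linalg.norm(P - R, "fro") ** 2)
```

Raising `lr` well above the stability threshold makes the same loop diverge, which is the qualitative phenomenon the paper's bounds quantify.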