Deep linear networks for regression are implicitly regularized towards flat minima
Lénaïc Chizat, Institute of Mathematics
Neural Information Processing Systems
The largest eigenvalue of the Hessian of the loss, or sharpness, of neural networks is a key quantity for understanding their optimization dynamics. In this paper, we study the sharpness of deep linear networks for univariate regression. Minimizers can have arbitrarily large sharpness, but not arbitrarily small sharpness: we show a lower bound on the sharpness of minimizers that grows linearly with depth. We then study the properties of the minimizer found by gradient flow, which is the limit of gradient descent with vanishing learning rate.
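A minimal numerical sketch (not the paper's code) of the quantity in question: it evaluates the sharpness, i.e. the largest Hessian eigenvalue of the squared loss, of a scalar deep linear network at a zero-loss minimizer. The toy dataset, the "balanced" choice of minimizer, and the depths swept over are illustrative assumptions.

```python
# Sketch: sharpness of a deep linear network f(x) = (w_1 * ... * w_L) x
# for univariate regression. All specifics below (data, minimizer, depths)
# are illustrative assumptions, not taken from the paper.
import jax
import jax.numpy as jnp

# Toy univariate regression data with optimal linear coefficient c = 2.0.
x = jnp.linspace(-1.0, 1.0, 50)
c = 2.0
y = c * x

def loss(w):
    # Deep linear network: the product of the scalar layer weights times x.
    pred = jnp.prod(w) * x
    return 0.5 * jnp.mean((pred - y) ** 2)

def sharpness(w):
    # Sharpness = largest eigenvalue of the Hessian of the loss.
    H = jax.hessian(loss)(w)
    return jnp.linalg.eigvalsh(H)[-1]  # eigvalsh sorts ascending

for depth in (2, 4, 8, 16):
    # Balanced minimizer: every layer weight equals c**(1/depth),
    # so the product of the weights is exactly c and the loss is zero.
    w_star = jnp.full(depth, c ** (1.0 / depth))
    print(depth, float(sharpness(w_star)))
```

On this toy problem the printed sharpness grows roughly linearly with depth, consistent with the lower bound described in the abstract.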