Deep linear networks for regression are implicitly regularized towards flat minima
Lénaïc Chizat, Institute of Mathematics
Neural Information Processing Systems
The largest eigenvalue of the Hessian of the loss, or sharpness, of neural networks is a key quantity for understanding their optimization dynamics. In this paper, we study the sharpness of deep linear networks for univariate regression. Minimizers can have arbitrarily large sharpness, but not arbitrarily small sharpness: we show a lower bound on the sharpness of minimizers that grows linearly with depth. We then study the properties of the minimizer found by gradient flow, which is the limit of gradient descent with vanishing learning rate.
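A minimal numerical sketch (not the paper's code) of the quantity in question: it evaluates the sharpness, i.e. the largest Hessian eigenvalue of the squared loss, of a scalar deep linear network at a zero-loss minimizer. The toy dataset, the "balanced" choice of minimizer, and the depths swept over are illustrative assumptions.

```python
# Sketch: sharpness of a deep linear network f(x) = (w_1 * ... * w_L) x
# for univariate regression. All specifics below (data, minimizer, depths)
# are illustrative assumptions, not taken from the paper.
import jax
import jax.numpy as jnp

# Toy univariate regression data with optimal linear coefficient c = 2.0.
x = jnp.linspace(-1.0, 1.0, 50)
c = 2.0
y = c * x

def loss(w):
    # Deep linear network: the product of the scalar layer weights times x.
    pred = jnp.prod(w) * x
    return 0.5 * jnp.mean((pred - y) ** 2)

def sharpness(w):
    # Sharpness = largest eigenvalue of the Hessian of the loss.
    H = jax.hessian(loss)(w)
    return jnp.linalg.eigvalsh(H)[-1]  # eigvalsh sorts ascending

for depth in (2, 4, 8, 16):
    # Balanced minimizer: every layer weight equals c**(1/depth),
    # so the product of the weights is exactly c and the loss is zero.
    w_star = jnp.full(depth, c ** (1.0 / depth))
    print(depth, float(sharpness(w_star)))
```

On this toy problem the printed sharpness grows roughly linearly with depth, consistent with the lower bound described in the abstract.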