Saddle-to-Saddle Dynamics in Diagonal Linear Networks

Neural Information Processing Systems 

In this paper, we fully describe the trajectory of gradient flow over 2-layer diagonal linear networks in the regression setting, in the limit of vanishing initialisation. We show that the limiting flow successively jumps from one saddle of the training loss to another until it reaches the minimum $\ell_1$-norm solution. We explicitly characterise the visited saddles as well as the jump times through a recursive algorithm reminiscent of the LARS algorithm used for computing the Lasso path. Starting from the zero vector, coordinates are successively activated until the minimum $\ell_1$-norm solution is recovered, revealing an incremental learning behaviour. Our proof leverages a convenient arc-length time-reparametrisation which makes it possible to keep track of the transitions between the jumps.
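To make the incremental-activation phenomenon concrete, the following is a minimal numerical sketch: it discretises gradient flow with small-step gradient descent on a 2-layer diagonal linear network $\beta = u \odot v$ started from a tiny initialisation. This is an illustration, not the paper's recursive algorithm; the problem data (X, y), the scale alpha, the step size lr, and the activation threshold are all assumptions chosen for the example.

```python
# Illustrative sketch (assumed setup, not the paper's algorithm): simulate
# gradient flow on a diagonal linear network beta = u * v with tiny
# initialisation alpha, and watch coordinates activate one group at a time.
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 30                               # under-determined regression
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[[0, 5, 12]] = [3.0, -2.0, 1.0]    # sparse ground truth
y = X @ beta_star

alpha = 1e-6                                # vanishing-initialisation scale
u = np.full(d, alpha)
v = np.full(d, alpha)
lr = 1e-3                                   # small step size ~ gradient flow

for step in range(400_001):
    r = X @ (u * v) - y                     # residual X beta - y
    g = X.T @ r / n                         # gradient of the loss w.r.t. beta
    # chain rule through beta = u * v; both factors updated simultaneously
    u, v = u - lr * g * v, v - lr * g * u
    if step % 40_000 == 0:
        beta = u * v
        active = np.flatnonzero(np.abs(beta) > 1e-2)  # assumed threshold
        print(f"t={step * lr:7.1f}  active={active}  loss={0.5 * r @ r / n:.2e}")
```

Run as-is, the printed trace should show long plateaus (the saddles, where the active set is fixed) separated by comparatively fast transitions during which a new coordinate activates; shrinking alpha further lengthens the plateaus, consistent with the vanishing-initialisation limit described above.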