Stochastic Gradient Descent in the Saddle-to-Saddle Regime of Deep Linear Networks
