Leveraging Continuous Time to Understand Momentum When Training Diagonal Linear Networks
