Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
