Gradient descent aligns the layers of deep linear networks

Open in new window