On regularization of gradient descent, layer imbalance and flat minima
