Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks
