Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
