On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
