On the Heavy-Tailed Theory of Stochastic Gradient Descent for Deep Neural Networks
