Fast Mixing of Stochastic Gradient Descent with Normalization and Weight Decay
