Appendix: On the Overlooked Structure of Stochastic Gradients

Neural Information Processing Systems 

Avila is a non-image dataset.

A.3 Image classification on MNIST

As data preprocessing for MNIST, we apply the common per-pixel normalization to zero mean and unit variance.

Pretraining Hyperparameter Settings: To obtain pretrained models, we train neural networks on MNIST for 50 epochs. Unless otherwise specified, the batch size is set to 1 and no weight decay is used. All other optimizer hyperparameters are left at their default settings.
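"Per-pixel" normalization means that each of the 28×28 pixel positions gets its own mean and standard deviation, computed across the training images, rather than one global mean and standard deviation for the whole dataset. The following is a minimal NumPy sketch of that idea, not the authors' code. It assumes the images are already loaded as an array of shape (N, 28, 28) and that the statistics are fitted on the training split only, which the text does not state. The function names `fit_per_pixel_stats` and `normalize` are hypothetical. One detail the text leaves open is how to handle pixels that never vary: some MNIST border pixels are always 0 and therefore have zero standard deviation. This sketch assumes such pixels are only mean-centered, by substituting a divisor of 1.

```python
import numpy as np


def fit_per_pixel_stats(train_images: np.ndarray):
    """Compute one mean and one std per pixel position over the training set.

    train_images: array of shape (N, H, W), e.g. (60000, 28, 28) for MNIST.
    Returns (mean, std), each of shape (H, W).
    """
    x = train_images.astype(np.float64)
    mean = x.mean(axis=0)  # per-pixel mean across the N images
    std = x.std(axis=0)    # per-pixel population std (ddof=0)
    # Assumption (not from the source): pixels that are constant in the
    # training set (std == 0) are only centered, by dividing by 1 instead of 0.
    std = np.where(std > 0, std, 1.0)
    return mean, std


def normalize(images: np.ndarray, mean: np.ndarray, std: np.ndarray) -> np.ndarray:
    """Apply per-pixel zero-mean, unit-variance normalization."""
    return (images.astype(np.float64) - mean) / std


if __name__ == "__main__":
    # Synthetic stand-in for MNIST: random uint8 images with one constant row.
    rng = np.random.default_rng(0)
    train = rng.integers(0, 256, size=(200, 28, 28)).astype(np.uint8)
    train[:, 0, :] = 0  # mimic always-zero border pixels
    mean, std = fit_per_pixel_stats(train)
    z = normalize(train, mean, std)
    print(z.shape)
```

The same `mean` and `std` fitted on the training images would then be reused for the test images, so that both splits go through an identical transformation.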
