A Model Training Details

Neural Information Processing Systems 

The base learning rate for SGDM and IA is 0.01 at a batch size of 256 and is scaled linearly with the batch size for all other batch sizes. For FA and Adam, the base learning rate is 0.001 across all models.
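
As a rough illustration of the linear rescaling described above, the sketch below computes a learning rate from a base value and a batch size. It assumes that "linearly rescaled" means the rate is multiplied by the ratio of the batch size to the reference size of 256, and that the same rule applies to the FA and Adam base rate; neither detail is spelled out here. The names `BASE_LR`, `REFERENCE_BATCH_SIZE`, and `scaled_lr` are hypothetical and chosen only for this example.

```python
# Reference batch size at which the base learning rates are defined.
REFERENCE_BATCH_SIZE = 256

# Base learning rates per optimizer, taken from the text above.
BASE_LR = {
    "SGDM": 0.01,
    "IA": 0.01,
    "FA": 0.001,
    "Adam": 0.001,
}


def scaled_lr(optimizer: str, batch_size: int) -> float:
    """Return the base learning rate scaled linearly with batch size.

    Assumption: the rate is proportional to batch_size / 256, and the
    same scaling is applied to every optimizer listed in BASE_LR.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return BASE_LR[optimizer] * batch_size / REFERENCE_BATCH_SIZE


if __name__ == "__main__":
    # Print the resulting rates for a few illustrative batch sizes.
    for name in BASE_LR:
        for bs in (128, 256, 512):
            print(f"{name:>4} batch={bs:<4} lr={scaled_lr(name, bs):.6f}")
```

Under these assumptions, doubling the batch size from 256 to 512 doubles the SGDM learning rate from 0.01 to 0.02, and halving it to 128 gives 0.005.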