Control Batch Size and Learning Rate to Generalize Well: Theoretical and Empirical Evidence

He, Fengxiang, Liu, Tongliang, Tao, Dacheng

Mar-18-2020, 20:47:07 GMT–Neural Information Processing Systems

Deep neural networks have achieved dramatic success using stochastic gradient descent (SGD) as the optimization method. However, it is still not clear how to tune hyper-parameters, especially batch size and learning rate, to ensure good generalization. This paper reports both theoretical and empirical evidence for a training strategy: keep the ratio of batch size to learning rate from becoming too large in order to achieve good generalization. Specifically, we prove a PAC-Bayes generalization bound for neural networks trained by SGD, and this bound is positively correlated with the ratio of batch size to learning rate. This correlation provides the theoretical foundation for the training strategy. Furthermore, we conduct a large-scale experiment to verify the correlation and the training strategy.
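As a rough illustration of the quantity the abstract refers to, the sketch below computes the ratio of batch size to learning rate for a few candidate configurations and orders them from smallest to largest. The `Config` container, the function names, and the example values are hypothetical and do not come from the paper. The ordering only reflects the ratio itself; it does not compute the PAC-Bayes bound and is not a prediction of test accuracy.

```python
from typing import Iterable, List, NamedTuple


class Config(NamedTuple):
    # Hypothetical container for one SGD training configuration.
    batch_size: int
    learning_rate: float


def batch_to_lr_ratio(cfg: Config) -> float:
    """Return batch_size / learning_rate for one configuration."""
    if cfg.batch_size <= 0 or cfg.learning_rate <= 0:
        raise ValueError("batch_size and learning_rate must be positive")
    return cfg.batch_size / cfg.learning_rate


def rank_by_ratio(configs: Iterable[Config]) -> List[Config]:
    """Sort configurations from the smallest ratio to the largest."""
    return sorted(configs, key=batch_to_lr_ratio)


# Illustrative values only (assumed, not taken from the paper's experiments).
configs = [Config(256, 0.01), Config(32, 0.1), Config(128, 0.1)]
ranked = rank_by_ratio(configs)

for cfg in ranked:
    print(f"batch_size={cfg.batch_size:4d}  lr={cfg.learning_rate:<5}  "
          f"ratio={batch_to_lr_ratio(cfg):.1f}")
```

Under the strategy described in the abstract, configurations earlier in this ordering would be the ones to try first, although the ratio is only one factor and the actual effect still has to be checked empirically.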

Keywords: batch size and learning rate, theoretical and empirical evidence, training strategy

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Neural Networks (0.88)
  - Statistical Learning > Gradient Descent (0.63)