Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer, Itay Hubara, Daniel Soudry
–Neural Information Processing Systems
Background: Deep learning models are typically trained using stochastic gradient descent or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been observed that when using large batch sizes there is a persistent degradation in generalization performance - known as the "generalization gap" phenomenon. Identifying the origin of this gap and closing it had remained an open problem. Contributions: We examine the initial high learning rate training phase.
Neural Information Processing Systems
Oct-6-2024, 23:45:02 GMT
- Country:
- North America > United States
- California > Los Angeles County > Long Beach (0.04)
- Asia > Middle East
- Israel > Haifa District > Haifa (0.04)
- North America > United States
- Genre:
- Research Report > New Finding (0.94)
- Technology: