Train longer, generalize better: closing the generalization gap in large batch training of neural networks

Elad Hoffer, Itay Hubara, Daniel Soudry

Neural Information Processing Systems (NIPS) 2017

Background: Deep learning models are typically trained using stochastic gradient descent (SGD) or one of its variants. These methods update the weights using their gradient, estimated from a small fraction of the training data. It has been observed that when using large batch sizes there is a persistent degradation in generalization performance, known as the "generalization gap" phenomenon. Identifying the origin of this gap and closing it has remained an open problem.

Contributions: We examine the initial high-learning-rate training phase. We find that the generalization gap stems from the relatively small number of weight updates performed in large-batch training rather than from the batch size itself, and that it can be closed by adapting the training regime, i.e., by training longer.
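As a concrete illustration of the update rule described in the background (a minimal sketch in standard notation, not reproduced from the paper), a single SGD step on a mini-batch B_t drawn from the training set can be written as

    w_{t+1} = w_t - \eta \cdot \frac{1}{|B_t|} \sum_{i \in B_t} \nabla_w \ell(w_t; x_i, y_i)

where \eta is the learning rate and \ell is the per-example loss. The mini-batch gradient is a noisy estimate of the full-training-set gradient, and the batch size |B_t| controls the magnitude of that noise, which is why changing the batch size changes the training dynamics.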
