Reviews: Train longer, generalize better: closing the generalization gap in large batch training of neural networks

Oct-8-2024, 05:41:51 GMT–Neural Information Processing Systems

I think the paper provides some clarity on a topic that has seen a bit of attention lately, namely the role of noise in optimization and in particular the sharp-minima/flat-minima hypothesis. From this perspective, I think this data point is important for our collective understanding of training deep networks. I don't think the observations made by the authors will come as a surprise to anyone with experience with these models; however, the final conclusion might. We know that larger minibatches give lower-variance gradient estimates, and hence that we should use a larger learning rate, etc. I think one practical issue people have gotten stuck on in the past is that with larger minibatches the computational cost of each gradient increases.
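The point about minibatch size and variance can be checked numerically. The sketch below is not taken from the paper or from this review: it assumes synthetic scalar "per-example gradients" drawn from a normal distribution, and the helper name `minibatch_mean_variance` is hypothetical. It estimates the variance of the minibatch-mean gradient for a few batch sizes; under i.i.d. sampling this variance should shrink roughly as 1/B.

```python
import numpy as np


def minibatch_mean_variance(per_example_grads, batch_size, n_trials, rng):
    """Empirical variance of the minibatch-mean gradient for one batch size."""
    n = per_example_grads.shape[0]
    # Draw n_trials minibatches (sampling with replacement) in one shot.
    idx = rng.integers(0, n, size=(n_trials, batch_size))
    batch_means = per_example_grads[idx].mean(axis=1)
    return batch_means.var()


rng = np.random.default_rng(0)

# Assumption: synthetic scalar per-example gradients, mean 1.0, std 2.0.
per_example_grads = rng.normal(loc=1.0, scale=2.0, size=100_000)

batch_sizes = (16, 64, 256)
variances = {
    b: minibatch_mean_variance(per_example_grads, b, n_trials=2000, rng=rng)
    for b in batch_sizes
}

for b in batch_sizes:
    # variance * B should stay roughly constant if variance scales as 1/B.
    print(f"B={b:4d}  var={variances[b]:.5f}  var*B={variances[b] * b:.3f}")
```

In this toy setting, multiplying the batch size by four cuts the variance of the gradient estimate by about four, which is the kind of noise reduction behind the usual advice to raise the learning rate with the batch size. How far the learning rate should be raised is a separate question that this sketch does not address.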



