Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a novel, effective procedure for simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets.

In deep learning, we use optimization algorithms such as SGD or Adam to train a model until it converges to a minimum, i.e., a point where the loss on the training dataset is low. These optimizers do not guarantee a global minimum, and a low training loss alone says little about how well the model generalizes: several studies, such as Zhang et al., have shown that many networks can easily memorize the training data and readily overfit. To address this problem and improve generalization, researchers at Google have published a paper introducing Sharpness-Aware Minimization (SAM), which achieves state-of-the-art results on CIFAR-10 and other datasets. In this article, we will look at why SAM can achieve better generalization and how we can implement SAM in PyTorch.
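To make the min-max formulation concrete, here is a minimal, dependency-free sketch of one SAM update. It targets the objective min over w of max over ||eps||_2 <= rho of L(w + eps). The inner maximization is replaced by its first-order estimate eps = rho * g / ||g||_2 (the L2-ball case), and the outer step descends from the original weights using the gradient taken at w + eps. The article itself targets PyTorch; this version uses plain Python lists so the logic is visible without a framework. The names `sam_step`, `grad_fn`, `quadratic_grad`, `lr`, and `rho` are illustrative assumptions, not taken from the paper or any library, and the quadratic loss is only a stand-in for a network's training loss.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float,
    rho: float,
) -> Vector:
    """One Sharpness-Aware Minimization step using an L2 ball of radius rho.

    grad_fn(w) must return the gradient of the training loss at w.
    (Hypothetical helper written for illustration.)
    """
    g = grad_fn(w)
    g_norm = math.sqrt(sum(gi * gi for gi in g))
    if g_norm == 0.0:
        # No ascent direction: the perturbation is undefined and a plain
        # gradient step would not move w, so return the weights unchanged.
        return list(w)

    # Step 1 (inner max): first-order estimate of the worst-case
    # perturbation inside the rho-ball, eps = rho * g / ||g||_2.
    eps = [rho * gi / g_norm for gi in g]

    # Step 2: gradient of the loss at the perturbed point w + eps.
    g_sam = grad_fn([wi + ei for wi, ei in zip(w, eps)])

    # Step 3 (outer min): descend from the ORIGINAL weights w using g_sam.
    return [wi - lr * gi for wi, gi in zip(w, g_sam)]


def quadratic_grad(w: Vector) -> Vector:
    """Gradient of the toy loss L(w) = 0.5 * sum(w_i^2), which is w itself."""
    return list(w)


if __name__ == "__main__":
    w = [3.0, 4.0]
    for step in range(5):
        w = sam_step(w, quadratic_grad, lr=0.1, rho=0.05)
        loss = 0.5 * sum(wi * wi for wi in w)
        print(f"step {step}: loss = {loss:.4f}")
```

Each SAM update needs two gradient evaluations (at w and at w + eps), so it costs roughly twice as much as a plain SGD step. In a PyTorch training loop, one common way to express the same three steps is: run a forward/backward pass, add eps to each parameter in place inside `torch.no_grad()`, run a second forward/backward pass at the perturbed weights, subtract eps to restore the original parameters, and only then call the base optimizer's `step()`.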
Mar-3-2021, 17:55:24 GMT