Normalization Layers Are All That Sharpness-Aware Minimization Needs

Neural Information Processing Systems 

Sharpness-aware minimization (SAM) was proposed to reduce the sharpness of minima and has been shown to improve generalization in various settings. In this work, we show that perturbing only the affine normalization parameters (typically comprising 0.1% of the total parameters) in the adversarial step of SAM can outperform perturbing all of the parameters.
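
As a rough illustration of the idea, the sketch below shows one SAM update in which the adversarial (ascent) step touches only a chosen subset of parameters. It is a minimal sketch on a toy scalar model, not the authors' implementation: the names `perturb`, `sam_step`, `toy_loss`, `toy_grad`, the keys `w`, `gamma`, `beta`, and the values of `rho` and `lr` are all hypothetical. Normalizing the perturbation over the selected subset only is likewise an assumption; the abstract does not specify it.

```python
import math


def toy_loss(p, x=1.0, y=2.0):
    # Toy model (assumption): prediction = gamma * (w * x) + beta,
    # where gamma/beta stand in for affine normalization parameters.
    r = p["gamma"] * p["w"] * x + p["beta"] - y
    return 0.5 * r * r


def toy_grad(p, x=1.0, y=2.0):
    # Analytic gradient of toy_loss with respect to each parameter.
    r = p["gamma"] * p["w"] * x + p["beta"] - y
    return {"w": r * p["gamma"] * x, "gamma": r * p["w"] * x, "beta": r}


def perturb(params, grads, keys, rho):
    # Ascent step restricted to `keys`: eps = rho * g / ||g||, with the
    # L2 norm taken over the selected parameters only (assumption).
    norm = math.sqrt(sum(grads[k] ** 2 for k in keys))
    scale = rho / (norm + 1e-12)
    return {k: v + (scale * grads[k] if k in keys else 0.0)
            for k, v in params.items()}


def sam_step(params, grad_fn, keys, rho=0.05, lr=0.1):
    # 1) Gradient at the current point.
    g = grad_fn(params)
    # 2) Move only the selected parameters to the nearby "sharp" point.
    adv = perturb(params, g, keys, rho)
    # 3) Gradient at the perturbed point, applied to the ORIGINAL
    #    parameters; every parameter is still updated.
    g_adv = grad_fn(adv)
    return {k: v - lr * g_adv[k] for k, v in params.items()}


if __name__ == "__main__":
    params = {"w": 1.0, "gamma": 1.0, "beta": 0.0}
    # Perturb only the normalization-style parameters.
    print(sam_step(params, toy_grad, keys={"gamma", "beta"}))
    # Standard SAM for comparison: perturb everything.
    print(sam_step(params, toy_grad, keys=set(params)))
```

Passing all parameter names as `keys` recovers the usual SAM step in this sketch, and passing an empty set reduces it to plain gradient descent, so the two variants can be compared by changing a single argument.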
