Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima

Neural Information Processing Systems 

To address this gap, we study deterministic and stochastic versions of SAM with practical configurations (i.e., constant