Training Diagonal Linear Networks with Stochastic Sharpness-Aware Minimization
