Parameter Averaging for SGD Stabilizes the Implicit Bias towards Flat Regions
