Averaging Weights Leads to Wider Optima and Better Generalization
