Averaging Weights Leads to Wider Optima and Better Generalization