Exploring Landscapes for Better Minima along Valleys

Neural Information Processing Systems 

Most existing optimizers stop searching the parameter space once they reach a local minimum. Given the complex geometric properties of the loss landscape, it is difficult to guarantee that such a point is the lowest or provides the best generalization. To address this, we propose an adaptor "E" for gradient-based optimizers. The adapted optimizer tends to continue exploring along landscape valleys (areas with low and nearly identical losses) in order to search for potentially better minima.
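To make the notion of a valley concrete, the following toy sketch may help. It is not the proposed adaptor "E", whose update rule is not given in this section. The quadratic loss, the start point, the learning rate, and the step count are all assumptions chosen purely for illustration. The loss is steep across the valley (in y) and almost flat along it (in x), so plain gradient descent reaches the valley floor quickly and then barely moves, even though a lower point exists further along the floor.

```python
# Toy illustration only: a hand-made 2-D loss with a valley along the x axis.
# None of these names or constants come from the paper.

def loss(x, y):
    # Steep across the valley (y), nearly flat along it (x); minimum at (3, 0).
    return y ** 2 + 1e-3 * (x - 3.0) ** 2

def grad(x, y):
    # Analytic gradient of the toy loss above.
    return 2e-3 * (x - 3.0), 2.0 * y

def gradient_descent(x, y, lr=0.1, steps=100):
    # Plain gradient descent with no momentum and no exploration term.
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return x, y

x_end, y_end = gradient_descent(0.0, 1.0)
gx, gy = grad(x_end, y_end)
grad_norm = (gx ** 2 + gy ** 2) ** 0.5

print("end point:", (x_end, y_end))
print("loss at end point:", loss(x_end, y_end))
print("gradient norm at end point:", grad_norm)
print("loss at valley minimum (3, 0):", loss(3.0, 0.0))
```

In this sketch the iterate ends up on the valley floor with a small gradient, so a gradient-norm stopping rule would treat it as converged, yet the loss further along the floor is still lower. This is the kind of region in which continued exploration along the valley could pay off.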