Reviews: Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Neural Information Processing Systems 

The paper combines ES and SGD in a complementary way rather than treating them as alternatives to each other. Intuitively, over the course of optimisation, the authors propose to use an evolution strategy to adapt the optimiser to different regions (geometries) of the optimisation path. Since the geometry of the loss surface can differ drastically across locations, it is reasonable to use different gradient-based optimisers, with different hyperparameters, in different regions. The main question then becomes how to choose a proper optimiser at each location, and the paper proposes ES for that.
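To make the reviewed idea concrete, here is a minimal, hypothetical sketch of such an ES/SGD interleaving on a toy 1-D loss (this is an illustration of the general scheme, not the paper's actual ESGD algorithm): a population of SGD workers, each with its own learning rate, alternates gradient-descent phases with an evolutionary phase that selects the fittest individuals and mutates their hyperparameters.

```python
import random

def loss(w):   # toy 1-D quadratic; the paper uses a network's training loss
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

def esgd_sketch(pop_size=8, generations=5, sgd_steps=10, seed=0):
    """Hypothetical sketch of interleaved SGD and ES phases.

    Each individual is a (parameter, learning-rate) pair; the ES phase
    keeps the fitter half and refills the population by mutating the
    survivors' learning rates.
    """
    rng = random.Random(seed)
    # initial population: random parameters, log-uniform learning rates
    pop = [(rng.uniform(-5, 5), 10 ** rng.uniform(-3, -0.5))
           for _ in range(pop_size)]
    for _ in range(generations):
        # SGD phase: every individual descends with its own hyperparameters
        stepped = []
        for w, lr in pop:
            for _ in range(sgd_steps):
                w -= lr * grad(w)
            stepped.append((w, lr))
        # ES phase: rank by fitness, keep the best half, mutate their rates
        stepped.sort(key=lambda ind: loss(ind[0]))
        survivors = stepped[: pop_size // 2]
        children = [(w, lr * 10 ** rng.uniform(-0.3, 0.3))
                    for w, lr in survivors]
        pop = survivors + children
    return min(pop, key=lambda ind: loss(ind[0]))

best_w, best_lr = esgd_sketch()
print(best_w)  # typically close to the minimiser at w = 3.0
```

Because selection discards individuals whose hyperparameters diverge on the current region of the loss surface, the surviving learning rates implicitly track the local geometry, which is the intuition the review describes.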