Reviews: Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks

Neural Information Processing Systems 

The paper combines ES and SGD in a complementary way rather than treating them as alternatives to each other. Intuitively, over the course of optimisation, the authors propose to use an evolution strategy to adapt the optimiser to different regions (geometries) of the optimisation path. Since the geometry of the loss surface can differ drastically across locations, it is reasonable to use different gradient-based optimisers, with different hyperparameters, in different regions. The main question then becomes how to choose a proper optimiser at each location, and the paper proposes ES for that.
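To make the reviewed idea concrete, here is a minimal, hypothetical sketch of such an ES/SGD interleaving on a toy 1-D loss (this is an illustration of the general scheme, not the paper's actual ESGD algorithm): a population of SGD workers, each with its own learning rate, alternates gradient-descent phases with an evolutionary phase that selects the fittest individuals and mutates their hyperparameters.

```python
import random

def loss(w):   # toy 1-D quadratic; the paper uses a network's training loss
    return (w - 3.0) ** 2

def grad(w):
    return 2.0 * (w - 3.0)

def esgd_sketch(pop_size=8, generations=5, sgd_steps=10, seed=0):
    """Hypothetical sketch of interleaved SGD and ES phases.

    Each individual is a (parameter, learning-rate) pair; the ES phase
    keeps the fitter half and refills the population by mutating the
    survivors' learning rates.
    """
    rng = random.Random(seed)
    # initial population: random parameters, log-uniform learning rates
    pop = [(rng.uniform(-5, 5), 10 ** rng.uniform(-3, -0.5))
           for _ in range(pop_size)]
    for _ in range(generations):
        # SGD phase: every individual descends with its own hyperparameters
        stepped = []
        for w, lr in pop:
            for _ in range(sgd_steps):
                w -= lr * grad(w)
            stepped.append((w, lr))
        # ES phase: rank by fitness, keep the best half, mutate their rates
        stepped.sort(key=lambda ind: loss(ind[0]))
        survivors = stepped[: pop_size // 2]
        children = [(w, lr * 10 ** rng.uniform(-0.3, 0.3))
                    for w, lr in survivors]
        pop = survivors + children
    return min(pop, key=lambda ind: loss(ind[0]))

best_w, best_lr = esgd_sketch()
print(best_w)  # typically close to the minimiser at w = 3.0
```

Because selection discards individuals whose hyperparameters diverge on the current region of the loss surface, the surviving learning rates implicitly track the local geometry, which is the intuition the review describes.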