A (Quick) Guide to Neural Network Optimizers with Applications in Keras
Stochastic gradient descent (SGD) performs frequent, high-variance updates, which cause the objective function to fluctuate heavily. This fluctuation lets SGD jump from one local minimum to a potentially better one, but it also complicates convergence to an exact minimum. Momentum is an SGD parameter that helps in ravines: areas, common around local optima, where the surface curves much more steeply in one dimension than in another. By adding a fraction of the previous update to the current one, momentum accelerates SGD along the consistent downhill direction and dampens the oscillations across the ravine, as seen in image 2. Nesterov momentum improves on standard momentum. A ball that blindly follows the slope is unsatisfactory; ideally, the ball would know where it is going, so that it can slow down before the hill slopes up again.
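To make the ravine picture concrete, here is a minimal sketch in plain Python rather than Keras. Everything in it is an illustrative assumption, not something taken from the article: the quadratic "ravine" with curvatures 1 and 40, the learning rate of 0.03, the momentum of 0.9, the starting point, and the helper names `grad`, `loss` and `run`. It also uses exact gradients instead of noisy mini-batch gradients, so it isolates the effect of momentum only. The Nesterov branch uses the look-ahead form, which evaluates the gradient at the point the velocity is about to carry the weights to.

```python
# Hypothetical toy problem: f(w) = 0.5 * (1 * w0**2 + 40 * w1**2),
# a bowl that is 40x steeper along w1 than along w0 (a "ravine").
CURVATURES = (1.0, 40.0)


def grad(w):
    """Exact gradient of the toy objective at w."""
    return [c * wi for c, wi in zip(CURVATURES, w)]


def loss(w):
    """Value of the toy objective at w."""
    return 0.5 * sum(c * wi * wi for c, wi in zip(CURVATURES, w))


def run(steps=100, lr=0.03, momentum=0.0, nesterov=False, start=(10.0, 1.0)):
    """Run gradient descent with optional (Nesterov) momentum; return final loss."""
    w = list(start)
    v = [0.0] * len(w)  # velocity: decaying sum of past updates
    for _ in range(steps):
        if nesterov:
            # Look ahead to where the current velocity would carry us,
            # and measure the slope there instead of at w.
            lookahead = [wi + momentum * vi for wi, vi in zip(w, v)]
            g = grad(lookahead)
        else:
            g = grad(w)
        # Keep a fraction of the old velocity, then add the new gradient step.
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        w = [wi + vi for wi, vi in zip(w, v)]
    return loss(w)


plain_loss = run(momentum=0.0)  # momentum=0 reduces to plain gradient descent
momentum_loss = run(momentum=0.9)
nesterov_loss = run(momentum=0.9, nesterov=True)
print(plain_loss, momentum_loss, nesterov_loss)
```

On this toy problem, both momentum variants should end with a lower loss than the plain run after the same number of steps, because the velocity builds up along the shallow `w0` direction. In Keras, the corresponding settings are the `momentum` and `nesterov` arguments of `keras.optimizers.SGD`; its internal update rule may differ in detail from the look-ahead form sketched here.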