Paper Summary: On the importance of initialization and momentum in deep learning
Classical momentum (CM) updates the parameters θ by accumulating a velocity vector v:

v_{t+1} = μ v_t − ε ∇f(θ_t)
θ_{t+1} = θ_t + v_{t+1}

where ε > 0 is the learning rate and μ ∈ [0, 1] is the momentum coefficient. The basic idea behind CM is that it accumulates velocity in directions of persistent reduction in the objective across iterations. Directions of low curvature, along which the objective changes slowly, tend to point consistently the same way across iterations, and so the velocity along them is amplified by CM. The authors then describe Nesterov's Accelerated Gradient (NAG):

v_{t+1} = μ v_t − ε ∇f(θ_t + μ v_t)
θ_{t+1} = θ_t + v_{t+1}

While CM computes the gradient at the current position θ_t, NAG first performs a partial update, computing θ_t + μ v_t, which is similar to θ_{t+1} but missing the as-yet-unknown gradient correction.
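The following is a minimal sketch of the two update rules in NumPy, not code from the paper; the function names cm_step and nag_step, the toy quadratic objective, and the hyperparameter values are all illustrative choices. The only difference between the two rules is the point at which the gradient is evaluated.

```python
import numpy as np

def cm_step(theta, v, grad_f, lr=0.01, mu=0.9):
    """One classical momentum (CM) step: the gradient is
    evaluated at the current parameters theta."""
    v_next = mu * v - lr * grad_f(theta)
    return theta + v_next, v_next

def nag_step(theta, v, grad_f, lr=0.01, mu=0.9):
    """One Nesterov accelerated gradient (NAG) step: the gradient
    is evaluated at the partially updated point theta + mu * v."""
    v_next = mu * v - lr * grad_f(theta + mu * v)
    return theta + v_next, v_next

# Illustrative usage on a toy quadratic f(theta) = 0.5 * theta @ A @ theta,
# chosen to be ill-conditioned so the low-curvature direction matters.
A = np.diag([1.0, 100.0])
grad_f = lambda th: A @ th
theta, v = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    theta, v = nag_step(theta, v, grad_f, lr=0.005, mu=0.9)
print(theta)  # approaches the minimum at the origin
```

Because NAG's gradient is taken at the lookahead point θ_t + μ v_t, it can correct the velocity more quickly when the momentum step overshoots, which is the intuition the paper gives for its improved stability over CM.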