Paper Summary: On the importance of initialization and momentum in deep learning


The classical momentum (CM) update is

v_{t+1} = μ v_t − ε ∇f(θ_t)
θ_{t+1} = θ_t + v_{t+1}

where ε > 0 is the learning rate, μ ∈ [0, 1] is the momentum coefficient, and ∇f(θ_t) is the gradient at θ_t. The basic idea behind CM is that it accumulates a velocity vector in directions of persistent reduction in the objective across iterations. Directions of low curvature, whose gradients change slowly, tend to persist across iterations and are therefore amplified by CM. The authors then describe Nesterov's Accelerated Gradient (NAG), whose update is

v_{t+1} = μ v_t − ε ∇f(θ_t + μ v_t)
θ_{t+1} = θ_t + v_{t+1}

While CM computes the gradient at the current position θ_t, NAG first performs a partial update to θ_t, computing θ_t + μ v_t, which is similar to θ_{t+1} but missing the as-yet-unknown correction −ε ∇f(·).
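The difference between the two updates can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's experimental setup: the quadratic objective, its curvatures, and the hyperparameter values are my own hypothetical choices. The only substantive difference between the two step functions is where the gradient is evaluated.

```python
import numpy as np

def grad(theta):
    # Gradient of a toy quadratic f(theta) = 0.5 * theta @ A @ theta
    # (hypothetical objective chosen for illustration, not from the paper;
    # one high-curvature and one low-curvature direction)
    A = np.diag([100.0, 1.0])
    return A @ theta

def cm_step(theta, v, lr=0.005, mu=0.9):
    # Classical momentum: gradient taken at the current position theta_t
    v_new = mu * v - lr * grad(theta)
    return theta + v_new, v_new

def nag_step(theta, v, lr=0.005, mu=0.9):
    # NAG: gradient taken at the partially updated point theta_t + mu * v_t
    v_new = mu * v - lr * grad(theta + mu * v)
    return theta + v_new, v_new

theta_cm = theta_nag = np.array([1.0, 1.0])
v_cm = v_nag = np.zeros(2)
for _ in range(200):
    theta_cm, v_cm = cm_step(theta_cm, v_cm)
    theta_nag, v_nag = nag_step(theta_nag, v_nag)
```

Because NAG's lookahead gradient reacts to the velocity before it is applied, it can correct an overshooting velocity one step earlier than CM, which is the source of its improved stability at large μ discussed in the paper.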
