Optimization algorithms
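Gradient descent is a first-order optimization algorithm: it uses only the gradient of the objective, not its second derivatives. To find a local minimum of a function, one repeatedly takes a step proportional to the negative of the gradient of the function at the current point. If the step size is too small, gradient descent can be slow to converge; if it is too large, the iterates can overshoot the minimum or diverge.

The speed of convergence also depends on how well-conditioned the problem is. For example, when minimizing a quadratic $f(x) = \tfrac{1}{2} x^\top A x - b^\top x$ with a symmetric positive-definite matrix $A$, the rate depends on the condition number $\kappa(A) = \sigma_{\max}(A) / \sigma_{\min}(A)$, the ratio of the largest to the smallest singular value of $A$. The larger $\kappa(A)$ is, the slower gradient descent converges.

Gradient descent with momentum (Rumelhart et al., 1986) adds a term that remembers what happened in the previous iteration: each new step combines the current negative gradient with a fraction of the step taken in the previous iteration. In the analogy of a ball rolling down the surface of the function, the momentum term emulates a heavy ball that is reluctant to change direction.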
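As a rough illustration, the following Python sketch implements gradient descent with an optional momentum term and runs it on a two-dimensional quadratic $f(x) = \tfrac{1}{2} x^\top A x$ with $A = \operatorname{diag}(1, \kappa)$, so that $\kappa$ is the condition number of $A$. It uses the heavy-ball form of momentum, $v \leftarrow \beta v - \eta \nabla f(x)$ followed by $x \leftarrow x + v$, where $\beta = 0$ recovers plain gradient descent. The function names, the test problem, the tolerance and the momentum tuning (a standard choice for quadratics whose extreme eigenvalues are known) are assumptions of this sketch, not details given in the text.

```python
import math


def gradient_descent(grad, x0, lr, momentum=0.0, tol=1e-8, max_iter=100_000):
    """Minimize a function given its gradient, using heavy-ball gradient descent.

    Update rule: v <- momentum * v - lr * grad(x);  x <- x + v.
    With momentum=0.0 this is plain gradient descent.
    Returns (x, number_of_iterations_used).
    """
    x = list(x0)
    v = [0.0] * len(x)
    for k in range(max_iter):
        g = grad(x)
        # Stop once the Euclidean norm of the gradient is below the tolerance.
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            return x, k
        # Momentum keeps a fraction of the previous step and adds the new gradient step.
        v = [momentum * vi - lr * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
    return x, max_iter


def quadratic_grad(eigenvalues):
    """Gradient of the hypothetical test problem f(x) = 0.5 * sum(lam_i * x_i**2),
    i.e. f(x) = 0.5 x^T A x with A = diag(eigenvalues).
    Its condition number is max(eigenvalues) / min(eigenvalues)."""
    return lambda x: [lam * xi for lam, xi in zip(eigenvalues, x)]


if __name__ == "__main__":
    for kappa in (10.0, 100.0):
        grad = quadratic_grad([1.0, kappa])
        L, mu = kappa, 1.0  # largest and smallest eigenvalue of A
        # Plain gradient descent with step size 1/L.
        _, plain_iters = gradient_descent(grad, [1.0, 1.0], lr=1.0 / L)
        # Heavy-ball momentum with the classical step size and momentum for quadratics.
        lr = 4.0 / (math.sqrt(L) + math.sqrt(mu)) ** 2
        beta = ((math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)) ** 2
        _, hb_iters = gradient_descent(grad, [1.0, 1.0], lr=lr, momentum=beta)
        print(f"kappa={kappa:>5}: plain GD {plain_iters} iters, momentum {hb_iters} iters")
```

Running it should show plain gradient descent needing many more iterations at $\kappa = 100$ than at $\kappa = 10$, while the momentum variant needs far fewer iterations than plain gradient descent on the same problem. Calling `gradient_descent` with a much smaller `lr` likewise increases the iteration count, matching the remark about small step sizes.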
Aug-8-2021, 16:55:38 GMT