Optimizers -- Momentum and Nesterov momentum algorithms (Part 2)
Welcome to the second part of this series on optimizers, where we will discuss momentum and Nesterov accelerated gradient. If you want a quick review of the vanilla gradient descent algorithm and its variants, please read Part 1. In Part 3 of this series, I will explain RMSprop and Adam in detail. Gradient descent uses gradients to update the weights, and these gradients can sometimes be noisy. In mini-batch gradient descent, because each update is based only on the data in the current batch, there is some variance in the direction of the update from batch to batch.
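To make that variance concrete, here is a minimal NumPy sketch (the toy linear-regression data, the `batch_gradient` helper, and the hyperparameters are my own illustrative assumptions, not from this post). It prints several mini-batch gradients alongside the full-batch gradient so you can see the per-batch estimates scatter around the true direction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = 3*x + noise (assumed example for illustration).
X = rng.normal(size=(1000, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.5, size=1000)

w = 0.0          # single weight to learn
lr = 0.1         # learning rate
batch_size = 32

def batch_gradient(w, xb, yb):
    """Gradient of the mean squared error 0.5*(w*x - y)^2 with respect to w."""
    return np.mean((w * xb - yb) * xb)

full_grad = batch_gradient(w, X[:, 0], y)

# Gradients computed on individual mini-batches scatter around the
# full-batch gradient -- this is the "variance in the direction of the
# update" mentioned above.
for start in range(0, 5 * batch_size, batch_size):
    xb, yb = X[start:start + batch_size, 0], y[start:start + batch_size]
    g = batch_gradient(w, xb, yb)
    print(f"mini-batch grad = {g:+.3f}   full-batch grad = {full_grad:+.3f}")
    w -= lr * g  # vanilla mini-batch SGD update
```

Momentum, discussed next, smooths exactly this kind of batch-to-batch scatter by averaging updates over time.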