Demystifying Different Variants of Gradient Descent Optimization Algorithm
In this post, we have looked at the batch gradient descent, the need to develop new optimization techniques, and then we briefly discussed how to interpret contour plots. After that, we have looked at behind six different optimization techniques and three different data strategies (batch, mini-batch & stochastic) with an intuitive understanding which helps to know where to use any of these algorithms. In Practice Adam optimizer with mini-batch of sizes 32, 64 and 128 is the default choice, at least for all the image classification tasks which deal with CNN and large sequence to sequence models.
Oct-17-2019, 12:58:50 GMT
- Technology: