What, Why, and How of SGD Momentum Optimizer in Deep Learning
In deep learning, we use stochastic gradient descent (SGD) as an optimizer because the goal is to find the weights and biases at which the model's loss is lowest. Plain SGD has some issues, though. In deep learning the cost function is non-convex, so simple SGD can perform poorly. Because we start from a random point, SGD may end up in a local minimum and never reach the global minimum. It can also get slowed down near saddle points. A saddle point is a point where the surface curves upward in one direction and downward in another. Around such a point the surface is nearly flat, so the gradient is close to zero, each update moves the parameters only a tiny amount, and as a result training slows down.
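The slowdown near a saddle point can be seen on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the article's own code: it assumes a two-parameter loss f(x, y) = x^2 - y^2, which has a saddle point at the origin, and uses exact gradients in place of SGD's noisy mini-batch gradients. The helper names `grad` and `steps_to_escape`, the learning rate of 0.01, the starting point and beta = 0.9 are all hypothetical choices. Momentum is written in one common ("heavy-ball") form, v = beta * v - lr * grad and w = w + v, and the function counts how many steps it takes to move away from the flat region around the saddle.

```python
# Toy loss f(x, y) = x**2 - y**2 (assumed for illustration only).
# It has a saddle point at (0, 0): it curves up along x and down along y.
def grad(x, y):
    """Gradient of the toy loss f(x, y) = x**2 - y**2."""
    return 2.0 * x, -2.0 * y


def steps_to_escape(lr=0.01, beta=0.0, start=(1.0, 1e-3), max_steps=10_000):
    """Hypothetical helper: run gradient descent from `start` and return the
    number of steps until |y| > 1, i.e. until the parameters leave the flat
    region around the saddle. beta=0.0 gives plain gradient-descent updates;
    beta > 0 adds heavy-ball momentum."""
    x, y = start
    vx, vy = 0.0, 0.0  # velocity: accumulated history of past update steps
    for step in range(1, max_steps + 1):
        gx, gy = grad(x, y)
        # Keep a fraction `beta` of the previous velocity and add the new
        # negative-gradient step; with beta=0.0 this is a plain update.
        vx = beta * vx - lr * gx
        vy = beta * vy - lr * gy
        x += vx
        y += vy
        if abs(y) > 1.0:
            return step
    return max_steps


plain = steps_to_escape(beta=0.0)
momentum = steps_to_escape(beta=0.9)
print(f"plain updates: {plain} steps, with momentum (beta=0.9): {momentum} steps")
```

Near the saddle the gradient along y is tiny, so each plain update moves y only slightly. With momentum, those small steps keep adding up in the same direction, so the parameters leave the flat region in noticeably fewer steps.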
Sep-26-2022, 09:34:13 GMT