Gradient descent is one of the most popular algorithms to perform optimization and by far the most common way to optimize neural networks. At the same time, every state-of-the-art Deep Learning library contains implementations of various algorithms to optimize gradient descent. This blog post aims to provide you with intuitions about the behaviour of the different algorithms for optimizing gradient descent, which will help you put them to use.

Gradient descent is a way to minimize an objective function J(θ) parameterized by a model's parameters θ by updating the parameters in the opposite direction of the gradient of the objective function ∇θJ(θ) w.r.t. the parameters. The learning rate η determines the size of the steps we take to reach a (local) minimum.
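
For concreteness, here is a minimal sketch of this update rule, θ ← θ − η ∇θJ(θ), in Python. The `grad` callback and the quadratic example objective are illustrative assumptions, not something taken from a particular library:

```python
import numpy as np

def gradient_descent(grad, theta0, eta=0.1, n_steps=100):
    """Vanilla gradient descent: repeatedly step opposite the gradient of J(θ)."""
    theta = theta0
    for _ in range(n_steps):
        theta = theta - eta * grad(theta)  # θ ← θ − η ∇θJ(θ)
    return theta

# Example: minimize J(θ) = ||θ||², whose gradient is 2θ.
theta_min = gradient_descent(grad=lambda th: 2 * th,
                             theta0=np.array([3.0, -4.0]))
print(theta_min)  # approaches the minimum at [0, 0]
```

With a small enough η, each step decreases J(θ); too large a learning rate can overshoot the minimum and diverge, which is part of what the optimization algorithms discussed later try to address.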