Transfer Learning: How to pick the optimal learning rate?
Before training the model, let's dive into an important (hyper)parameter called learning rate that we will be optimizing in this workflow. Neural networks are trained using an optimization algorithm called stochastic gradient descent. In stochastic gradient descent, the error gradient for the current state of the model are estimated using back propagation, which mathematically means that we now have an intuitive understanding about the influence of changing weights on model performance. Using the error gradient, weights of the model are updated using a pre-determined step-size known as learning rate. In other words, learning rate is a variable that controls how much to change the model in response to the estimated error each time the model weights are updated.
Oct-3-2021, 09:30:23 GMT
- Technology: