Training your Neural Network with Cyclical Learning Rates – MachineCurve

#artificialintelligence 

At a high level, training supervised machine learning models involves a few easy steps: feeding data to your model, computing loss based on the differences between predictions and ground truth, and using loss to improve the model with an optimizer. For example, it's possible to choose multiple optimizers – ranging from traditional Stochastic Gradient Descent to adaptive optimizers, which are also very common today. Say that you settle for the first – Stochastic Gradient Descent (SGD). Likely, in your deep learning framework, you'll see that the learning rate is a parameter that can be configured, with a default value that is preconfigured most of the times. Now, what is this learning rate? Why do we need them?