Estimating an Optimal Learning Rate For a Deep Neural Network


The learning rate is one of the most important hyper-parameters to tune when training deep neural networks. In this post, I describe a simple and powerful way to find a reasonable learning rate that I learned from the fast.ai Deep Learning course. The course is not available to the general public yet, but will be at the end of the year. There are many variations of stochastic gradient descent: Adam, RMSProp, Adagrad, etc. All of them let you set the learning rate.
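The method this post refers to is commonly known as the learning-rate range test: train for a short while as the learning rate grows exponentially, record the loss at each step, and look for the region where the loss falls fastest before it blows up. A toy sketch on a one-parameter quadratic objective (the function names and objective are illustrative, not from the post):

```python
def lr_range_test(grad_fn, loss_fn, w0, lr_min=1e-5, lr_max=2.0, steps=50):
    """Sweep the learning rate exponentially from lr_min to lr_max,
    taking one gradient-descent step per setting, and record the loss."""
    w = w0
    lrs, losses = [], []
    for i in range(steps):
        lr = lr_min * (lr_max / lr_min) ** (i / (steps - 1))
        w = w - lr * grad_fn(w)
        lrs.append(lr)
        losses.append(loss_fn(w))
    return lrs, losses

# Toy objective: loss(w) = (w - 3)^2, so grad(w) = 2 * (w - 3).
lrs, losses = lr_range_test(lambda w: 2 * (w - 3),
                            lambda w: (w - 3) ** 2, w0=0.0)
```

A reasonable learning rate lies where the recorded loss is still decreasing steeply, somewhat below the value at which it starts to diverge.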

Understanding Learning Rates and How It Improves Performance in Deep Learning


Much of this post is based on material written by past students. This is a concise version of it, arranged so that you can quickly get to the meat of the material. Do go over the references for more details. First off, what is a learning rate? The learning rate is a hyper-parameter that controls how much we adjust the weights of our network with respect to the loss gradient.

Ten Techniques Learned From


Right now, Jeremy Howard, the co-founder of fast.ai, is sliding down the Kaggle leaderboards. Why? His own students are beating him, and their names can now be found across the tops of leaderboards all over Kaggle. So what are these secrets that allow novices to implement world-class algorithms in mere weeks, leaving experienced deep learning practitioners behind in their GPU-powered wake? Allow me to tell you in ten simple steps.

Hyper-parameter Tuning Techniques in Deep Learning


Setting hyper-parameters requires expertise and extensive trial and error: there are no simple, reliable recipes for choosing the learning rate, batch size, momentum, and weight decay. Before discussing ways to find optimal values, let us first understand these hyper-parameters. They act as knobs that can be tweaked during the training of the model, and for the model to produce its best results we need to find their optimal values.
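To see where each knob enters, here is a minimal sketch of a single SGD-with-momentum step; the function name and default values are illustrative, not from the article:

```python
def sgd_update(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=1e-4):
    """One SGD step showing where each hyper-parameter acts."""
    grad = grad + weight_decay * w              # weight decay: shrink weights toward zero
    velocity = momentum * velocity - lr * grad  # momentum: smooth successive gradients
    return w + velocity, velocity               # learning rate scaled the step above

# Batch size enters earlier in the pipeline: `grad` is the gradient
# averaged over a mini-batch of that many training examples.
w, v = 1.0, 0.0
w, v = sgd_update(w, grad=0.5, velocity=v)
```

Each knob trades off against the others: for example, larger batch sizes often tolerate larger learning rates, and stronger momentum effectively amplifies the step size.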