Ten Techniques Learned From fast.ai


Right now, Jeremy Howard – the co-founder of fast.ai – is being overtaken on Kaggle. Why? His own students are beating him, and their names can now be found across the tops of leaderboards all over Kaggle. So what are these secrets that allow novices to implement world-class algorithms in mere weeks, leaving experienced deep learning practitioners in their GPU-powered wake? Allow me to tell you in ten simple steps.

Why I use Fastai and you should too.


The general consensus on finding the best LR used to be to train the model fully, until the desired metric was achieved, with different optimizers at different LRs, and then pick whichever combination of optimizer and LR worked best. This works, but it is computationally expensive. Note: as I was introduced to fastai early in my deep learning career, I do not know much about how things are done without/before fastai, so please let me know if this is inaccurate, and take this section with a grain of salt. The fastai approach to LRs is influenced by Leslie Smith's paper [1].

Keras Learning Rate Finder - PyImageSearch


In this tutorial, you will learn how to automatically find learning rates using Keras. Last week we discussed Cyclical Learning Rates (CLRs) and how they can be used to train high-accuracy models with fewer experiments and limited hyperparameter tuning. The CLR method lets the learning rate cyclically oscillate between a lower and an upper bound; the question remains, however: how do we know what good choices for our learning rates are? Today I'll be answering that question. By the time you have completed this tutorial, you will understand how to automatically find optimal learning rates for your neural network, saving you tens, hundreds, or even thousands of hours of compute time otherwise spent tuning hyperparameters.
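The cyclical oscillation the tutorial refers to can be stated precisely. Below is a plain-Python sketch of Smith's triangular CLR policy (the bounds and step size are illustrative defaults, not PyImageSearch's code); in Keras, a callback would evaluate a function like this every batch and set the optimizer's learning rate accordingly:

```python
import math

def triangular_clr(step, base_lr=1e-4, max_lr=1e-2, step_size=2000):
    """Triangular cyclical learning rate (Smith, 2017): the LR ramps
    linearly from base_lr up to max_lr over `step_size` batches, then
    back down, repeating every 2 * step_size batches."""
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)
```

The lower and upper bounds themselves are exactly what the LR range test is meant to find: the lower bound where the loss first starts dropping, the upper bound just before it diverges.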

Choosing the Ideal Learning Rate


The learning rate is often considered the most important hyperparameter when training a model. Choosing it well can greatly improve the training of a neural network and prevent the erratic behavior that may occur during stochastic gradient descent. Stochastic gradient descent (SGD) is an optimization algorithm that drives the loss toward a minimum – ideally the global minimum, where the loss is at its lowest point. It behaves just like gradient descent, but operates on mini-batches to increase computational efficiency: the gradient step is computed on each of these smaller batches instead of on the entire training set.
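Mini-batch SGD as just described fits in a few lines. A minimal sketch on a hypothetical 1-D least-squares problem (fitting w in y = w*x, not an example from the article): each update uses the gradient averaged over one small shuffled batch rather than over the full training set:

```python
import random

random.seed(0)
# Synthetic data: y = 2*x, with x in (0, 1].
data = [(i / 100, 2.0 * (i / 100)) for i in range(1, 101)]

def sgd(data, lr=0.3, batch_size=10, epochs=15):
    """Mini-batch SGD for the squared error (w*x - y)^2."""
    w = 0.0
    for _ in range(epochs):
        random.shuffle(data)                      # new batch order each epoch
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over this mini-batch only.
            g = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= lr * g
    return w

w = sgd(data)  # should approach the true slope of 2.0
```

With too large an `lr` the updates overshoot and the loss oscillates or diverges; with too small an `lr` convergence is needlessly slow, which is exactly why the learning rate matters so much.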

Improving image classifiers for small datasets by learning rate adaptations

Our paper introduces an efficient combination of established techniques to improve classifier performance in terms of accuracy and training time. By dynamically tuning the learning rate, we achieve a two-fold to ten-fold speedup in approaching state-of-the-art accuracy across different model architectures. We find the approach especially beneficial for small datasets, where the reliability of machine reasoning is lower. We validate our approach by comparing our method against vanilla training on CIFAR-10, and demonstrate its practical viability by applying it to an unbalanced corpus of diagnostic images.
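The abstract does not spell out its adaptation rule here, but one common form of dynamically tuning the learning rate is to reduce it when the validation loss stops improving. A generic reduce-on-plateau sketch, purely illustrative and not the paper's method:

```python
def adapt_lr(lr, val_losses, patience=3, factor=0.5, min_lr=1e-6):
    """Halve the learning rate (down to min_lr) when the validation
    loss has not improved for `patience` consecutive epochs."""
    if len(val_losses) > patience and \
            min(val_losses[-patience:]) >= min(val_losses[:-patience]):
        return max(lr * factor, min_lr)
    return lr
```

Called once per epoch with the history of validation losses, this keeps the LR high while progress is being made and shrinks it as training stalls, which is one way such schemes cut the time needed to near a target accuracy.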