AITopics | cyclical learning rate

cyclical learning rate

On the optimization and pruning for Bayesian deep learning

Ke, Xiongwen, Fan, Yanan

arXiv.org Artificial Intelligence, Oct-24-2022

The goal of Bayesian deep learning is to provide uncertainty quantification via the posterior distribution. However, exact inference over the weight space is computationally intractable due to the ultra-high dimensions of the neural network. Variational inference (VI) is a promising approach, but naive application on weight space does not scale well and often underperforms on predictive accuracy. In this paper, we propose a new adaptive variational Bayesian algorithm to train neural networks on weight space that achieves high predictive accuracy. By showing that there is an equivalence to Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with a preconditioning matrix, we then propose an MCMC within EM algorithm, which incorporates the spike-and-slab prior to capture the sparsity of the neural network. The EM-MCMC algorithm allows us to perform optimization and model pruning in one shot. We evaluate our methods on CIFAR-10, CIFAR-100 and ImageNet datasets, and demonstrate that our dense model can reach state-of-the-art performance and our sparse model performs very well compared to previously proposed pruning schemes.

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2210.12957

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Training your Neural Network with Cyclical Learning Rates – MachineCurve

#artificialintelligence, Dec-3-2020, 04:50:30 GMT

At a high level, training supervised machine learning models involves a few easy steps: feeding data to your model, computing loss based on the differences between predictions and ground truth, and using loss to improve the model with an optimizer. You can choose among several optimizers, ranging from traditional Stochastic Gradient Descent to adaptive optimizers, which are also very common today. Say that you settle for the first, Stochastic Gradient Descent (SGD). In your deep learning framework, you will likely see that the learning rate is a configurable parameter, usually with a preconfigured default value. Now, what is this learning rate? Why do we need it?
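
To make the role of the learning rate concrete, here is a minimal sketch of plain gradient descent on a toy one-parameter loss. The loss function, the starting point, and the names `gradient_descent`, `toy_grad`, and `learning_rate` are assumptions chosen for illustration and do not come from the MachineCurve article; real SGD would estimate the gradient from mini-batches of data rather than from a closed-form expression.

```python
def gradient_descent(grad_fn, w0, learning_rate, num_steps):
    """Repeatedly step against the gradient; the learning rate scales each step."""
    w = w0
    for _ in range(num_steps):
        w = w - learning_rate * grad_fn(w)
    return w


# Toy loss L(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
def toy_grad(w):
    return 2.0 * (w - 3.0)


w_final = gradient_descent(toy_grad, w0=0.0, learning_rate=0.1, num_steps=100)
print(w_final)
```

On this toy loss a learning rate of 0.1 converges toward the minimum at 3, while a much larger value such as 1.5 makes every step overshoot further and the loop diverges, which is why the setting deserves attention.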

cyclical learning rate, saddle point, smith, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Deep Reinforcement Learning using Cyclical Learning Rates

Gulde, Ralf, Tuscher, Marc, Csiszar, Akos, Riedel, Oliver, Verl, Alexander

arXiv.org Machine Learning, Jul-31-2020

Deep Reinforcement Learning (DRL) methods often rely on the meticulous tuning of hyperparameters to successfully resolve problems. One of the most influential parameters in optimization procedures based on stochastic gradient descent (SGD) is the learning rate. We investigate cyclical learning and propose a method for defining a general cyclical learning rate for various DRL problems. In this paper we present a method for cyclical learning applied to complex DRL problems. Our experiments show that cyclical learning achieves similar or even better results than highly tuned fixed learning rates. This paper presents the first application of cyclical learning rates in DRL settings and is a step towards overcoming manual hyperparameter tuning.

learning rate, machine learning, reinforcement learning, (12 more...)

arXiv.org Machine Learning

2008.01171

Country: Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Applying Cyclical Learning Rate to Neural Machine Translation

Lee, Choon Meng, Liu, Jianfeng, Peng, Wei

arXiv.org Machine Learning, Apr-6-2020

In training deep learning networks, the optimizer and related learning rate are often used without much thought or with minimal tuning, even though they are crucial in ensuring fast convergence to a good-quality minimum of the loss function that also generalizes well on the test dataset. Drawing inspiration from the successful application of cyclical learning rate policy for computer vision related convolutional networks and datasets, we explore how cyclical learning rate can be applied to train transformer-based neural networks for neural machine translation. From our carefully designed experiments, we show that the choice of optimizers and the associated cyclical learning rate policy can have a significant impact on the performance. In addition, we establish guidelines when applying cyclical learning rates to neural machine translation tasks. With our work, we hope to raise awareness of the importance of selecting the right optimizers and the accompanying learning rate policy and, at the same time, to encourage further research into easy-to-use learning rate policies.

batch size, clr, learning rate, (14 more...)

arXiv.org Machine Learning

2004.02401

Country:

Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
Europe > Germany > Berlin (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Keras Learning Rate Finder - PyImageSearch

#artificialintelligence, Nov-6-2019, 07:11:26 GMT

In this tutorial, you will learn how to automatically find learning rates using Keras. Last week we discussed Cyclical Learning Rates (CLRs) and how they can be used to obtain high accuracy models with fewer experiments and limited hyperparameter tuning. The CLR method allows our learning rate to cyclically oscillate between a lower and upper bound; however, the question remains: how do we know what good choices for our learning rates are? Today I'll be answering that question. By the time you have completed this tutorial, you will understand how to automatically find optimal learning rates for your neural network, saving you tens, hundreds, or even thousands of hours of compute time spent running experiments to tune your hyperparameters.
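
A learning rate finder is commonly run as a sweep: train briefly while the learning rate grows from a very small to a very large value, record the loss at each step, and read candidate bounds off the resulting curve. The sketch below assumes an exponential sweep and uses made-up loss values in place of real training runs; the function names and the "lowest recorded loss" heuristic are illustrative assumptions, not the tutorial's code.

```python
def exponential_lr_sweep(start_lr, end_lr, num_steps):
    """Return num_steps learning rates spaced evenly on a log scale."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    ratio = end_lr / start_lr
    return [start_lr * ratio ** (i / (num_steps - 1)) for i in range(num_steps)]


def lr_at_lowest_loss(lrs, losses):
    """Hypothetical heuristic: the learning rate at which the recorded loss was lowest."""
    best_index = min(range(len(losses)), key=lambda i: losses[i])
    return lrs[best_index]


lrs = exponential_lr_sweep(1e-5, 1.0, 6)
# Made-up losses standing in for a short training run at each learning rate.
fake_losses = [2.30, 2.25, 1.90, 1.20, 1.60, 9.00]
print(lr_at_lowest_loss(lrs, fake_losses))
```

In practice the loss curve is usually smoothed and inspected by eye, and the upper bound is often chosen somewhat below the point where the loss starts to blow up.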

keras, optimal, tutorial, (12 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.87)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Cyclical Learning Rates with Keras and Deep Learning - PyImageSearch

#artificialintelligence, Oct-17-2019, 12:59:07 GMT

In this tutorial, you will learn how to use Cyclical Learning Rates (CLR) and Keras to train your own neural networks. Using Cyclical Learning Rates you can dramatically reduce the number of experiments required to tune and find an optimal learning rate for your model. Last week we discussed the concept of learning rate schedules and how we can decay our learning rate over time according to a set function (e.g., linear, polynomial, or step decrease). Cyclical Learning Rates take a different approach. In practice, using Cyclical Learning Rates leads to faster convergence with fewer experiments and hyperparameter updates.
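
As a rough illustration of the idea, the sketch below implements the "triangular" policy, in which the learning rate rises linearly from a lower bound to an upper bound and then falls back again. The formula follows the triangular policy from Smith's cyclical learning rate paper; the function name, parameter names, and numeric bounds are assumptions for this example and are not code from the tutorial.

```python
import math


def triangular_clr(iteration, base_lr, max_lr, step_size):
    """Triangular cyclical learning rate for a given training iteration.

    step_size is the number of iterations in half a cycle.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


for it in (0, 1000, 2000, 3000, 4000):
    print(it, triangular_clr(it, base_lr=0.001, max_lr=0.006, step_size=2000))
```

With these assumed values the rate starts at the lower bound, peaks at the upper bound after 2000 iterations, and returns to the lower bound after 4000, at which point the next cycle begins.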

cyclical learning rate, learning rate, triangular, (11 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Collaborative Deep Learning Across Multiple Data Centers

Xu, Kele, Mi, Haibo, Feng, Dawei, Wang, Huaimin, Chen, Chuan, Zheng, Zibin, Lan, Xu

arXiv.org Machine Learning, Oct-16-2018

Valuable training data is often owned by independent organizations and located in multiple data centers. Most deep learning approaches require centralizing the multi-datacenter data for performance reasons. In practice, however, it is often infeasible to transfer all data to a centralized data center due not only to bandwidth limitations but also to the constraints of privacy regulations. Model averaging is a conventional choice for data-parallel training, but previous studies claim it is ineffective because deep neural networks are often non-convex. In this paper, we argue that model averaging can be effective in the decentralized environment by using two strategies, namely, the cyclical learning rate and the increased number of epochs for local model training. With the two strategies, we show that model averaging can provide competitive performance in the decentralized mode compared to the data-centralized one. In a practical environment with multiple data centers, we conduct extensive experiments using state-of-the-art deep network architectures on different types of data. Results demonstrate the effectiveness and robustness of the proposed method.
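
The model-averaging step mentioned in the abstract can be pictured with a small sketch: each participant trains locally, and the parameters are then averaged element by element. The code below is an assumption-laden illustration, not the paper's implementation; it assumes equal weighting of participants, parameters stored as flat lists of floats, and hypothetical names such as `average_models`, `site_a`, and `site_b`.

```python
def average_models(models):
    """Element-wise average of parameter dicts that share the same keys and shapes."""
    if not models:
        raise ValueError("need at least one model")
    averaged = {}
    for name in models[0]:
        columns = zip(*(model[name] for model in models))
        averaged[name] = [sum(values) / len(models) for values in columns]
    return averaged


# Two hypothetical participants, each holding a locally trained copy of the model.
site_a = {"weights": [1.0, 2.0], "bias": [0.5]}
site_b = {"weights": [3.0, 4.0], "bias": [1.5]}
print(average_models([site_a, site_b]))
```

Only the parameters travel between sites in this picture; the raw training data stays where it is, which is the property the abstract cares about.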

artificial intelligence, machine learning, participant, (15 more...)

arXiv.org Machine Learning

1810.06877

Country:

Asia > China (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.48)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Understanding Learning Rates and How It Improves Performance in Deep Learning

@machinelearnbot, Feb-1-2018, 18:35:50 GMT

One only needs to type in the following command to start finding the optimal learning rate to use before training a neural network. At this juncture we've covered what the learning rate is all about, its importance, and how we can systematically arrive at an optimal value to use when we start training our model. Next, we will go through how learning rates can still be used to improve our model's performance.

artificial intelligence, learning rate, machine learning, (15 more...)

@machinelearnbot

Genre: Instructional Material (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.65)

[R] Cyclical Learning Rates for Training Neural Networks • r/MachineLearning

@machinelearnbot, Sep-24-2017, 02:35:09 GMT

Submission statement: Finding the correct learning rate is a pain. But this paper shows how to find reasonable learning rate bounds. Then you can cyclically vary your learning rate to get better accuracy and, often, a shorter training time. P.S. There is a PR for this in keras-contrib.

cyclical learning rate, machinelearning

@machinelearnbot

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
