AITopics | learning rate scheme

Collaborating Authors

learning rate scheme

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Direction Matters: On the Implicit Regularization Effect of Stochastic Gradient Descent with Moderate Learning Rate

Wu, Jingfeng, Zou, Difan, Braverman, Vladimir, Gu, Quanquan

arXiv.org Machine LearningNov-4-2020

Understanding the algorithmic regularization effect of stochastic gradient descent (SGD) is one of the key challenges in modern machine learning and deep learning theory. Most of the existing works, however, focus on very small or even infinitesimal learning rate regime, and fail to cover practical scenarios where the learning rate is moderate and annealing. In this paper, we make an initial attempt to characterize the particular regularization effect of SGD in the moderate learning rate regime by studying its behavior for optimizing an overparameterized linear regression problem. In this case, SGD and GD are known to converge to the unique minimum-norm solution; however, with the moderate and annealing learning rate, we show that they exhibit different directional bias: SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions. Furthermore, we show that such directional bias does matter when early stopping is adopted, where the SGD output is nearly optimal but the GD output is suboptimal. Finally, our theory explains several folk arts in practice used for SGD hyperparameter tuning, such as (1) linearly scaling the initial learning rate with batch size; and (2) overrunning SGD with high learning rate even when the loss stops decreasing.

artificial intelligence, learning rate, machine learning, (13 more...)

arXiv.org Machine Learning

2011.02538

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Maryland > Baltimore (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

AdaOja: Adaptive Learning Rates for Streaming PCA

Henriksen, Amelia, Ward, Rachel

arXiv.org Machine LearningMay-28-2019

Oja's algorithm has been the cornerstone of streaming methods in Principal Component Analysis (PCA) since it was first proposed in 1982. However, Oja's algorithm does not have a standardized choice of learning rate (step size) that both performs well in practice and truly conforms to the online streaming setting. In this paper, we propose a new learning rate scheme for Oja's method called AdaOja. This new algorithm requires only a single pass over the data and does not depend on knowing properties of the data set a priori. AdaOja is a novel variation of the Adagrad algorithm to Oja's algorithm in the single eigenvector case and extended to the multiple eigenvector case. We demonstrate for dense synthetic data, sparse real-world data and dense real-world data that AdaOja outperforms common learning rate choices for Oja's method. We also show that AdaOja performs comparably to state-of-the-art algorithms (History PCA and Streaming Power Method) in the same streaming PCA setting.

artificial intelligence, machine learning, variance, (16 more...)

arXiv.org Machine Learning

1905.12115

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
Europe > Spain > Canary Islands (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.91)

Add feedback

DTN: A Learning Rate Scheme with Convergence Rate of $\mathcal{O}(1/t)$ for SGD

Nguyen, Lam M., Nguyen, Phuong Ha, Phan, Dzung T., Kalagnanam, Jayant R., van Dijk, Marten

arXiv.org Machine LearningJan-28-2019

We propose a novel diminishing learning rate scheme, coined Decreasing-Trend-Nature (DTN), which allows us to prove fast convergence of the Stochastic Gradient Descent (SGD) algorithm to a first-order stationary point for smooth general convex and some class of nonconvex including neural network applications for classification problems. We are the first to prove that SGD with diminishing learning rate achieves a convergence rate of $\mathcal{O}(1/t)$ for these problems. Our theory applies to neural network applications for classification problems in a straightforward way.

convergence rate, learning rate scheme, sgd, (13 more...)

arXiv.org Machine Learning

1901.07634

Country:

North America > United States > New York (0.04)
North America > United States > Connecticut (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.83)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Add feedback