On the equivalence of different adaptive batch size selection strategies for stochastic gradient descent methods

Espath, Luis, Krumscheid, Sebastian, Tempone, Raúl, Vilanova, Pedro

Sep-22-2021–arXiv.org Machine Learning

In this study, we demonstrate that the norm test and the inner product/orthogonality test presented in \cite{Bol18} are equivalent in terms of the convergence rates associated with stochastic gradient descent (SGD) methods if $\epsilon^2=\theta^2+\nu^2$ with specific choices of $\theta$ and $\nu$. Here, $\epsilon$ controls the relative statistical error of the norm of the gradient, while $\theta$ and $\nu$ control the relative statistical error of the gradient in the direction of the gradient and in the direction orthogonal to the gradient, respectively. Furthermore, we demonstrate that the inner product/orthogonality test can be as inexpensive as the norm test in the best-case scenario if $\theta$ and $\nu$ are optimally selected, but the inner product/orthogonality test will never be more computationally affordable than the norm test if $\epsilon^2=\theta^2+\nu^2$. Finally, we present two stochastic optimization problems to illustrate our results.

inner product orthogonality test, norm test, orthogonality test

Country:
- Asia > Middle East
  - Saudi Arabia (0.04)
- Europe > Germany
  - North Rhine-Westphalia > Cologne Region > Aachen (0.04)
- North America > United States
  - New Jersey > Hudson County > Hoboken (0.04)

Genre:
- Research Report > New Finding (0.69)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Statistical Learning
    - Gradient Descent (1.00)
  - Representation & Reasoning (1.00)