New logarithmic step size for stochastic gradient descent
Shamaee, M. Soheil, Hafshejani, S. Fathi, Saeidian, Z.
Stochastic gradient descent (SGD), which dates back to the work of Robbins and Monro [1951a], is widely used to train modern Deep Neural Networks (DNNs), which achieve state-of-the-art results in many problem domains, such as image classification Krizhevsky et al. [2017, 2009], object detection Redmon and Farhadi [2017], and automatic machine translation Zhang et al. [2015]. The value of the step size (or learning rate) is crucial to the convergence rate of SGD. Choosing an appropriate step size at each iteration ensures that the SGD iterates converge to an optimal solution: a step size that is too large may prevent the iterates from reaching the optimal point, while an excessively small step size can lead to slow convergence or cause a local minimum to be mistaken for the optimal solution Mishra and Sarawadekar [2019]. To address these challenges, various step-size schemes have been proposed. One popular approach is the Armijo line search method, first adapted to SGD by Vaswani et al. [2019], which provides theoretical guarantees for strongly convex, convex, and non-convex objective functions.
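Since this is only the abstract, the exact logarithmic schedule proposed in the paper is not shown. The sketch below is illustrative only: it assumes a simple eta0 / ln(t + 2) decay as a stand-in for the paper's step size, and it includes a backtracking check in the spirit of the stochastic Armijo line search of Vaswani et al. [2019]. All names and constants (log_step_size, armijo_step_size, eta0, c, beta) are placeholders, not the authors' notation.

```python
import numpy as np

def log_step_size(t, eta0=0.1):
    """Hypothetical logarithmically decaying step size: eta0 / ln(t + 2).

    The paper's actual schedule may differ; this is an assumed form.
    """
    return eta0 / np.log(t + 2)

def armijo_step_size(w, grad, loss_fn, eta_max=1.0, c=0.5, beta=0.7):
    """Backtracking (Armijo) line search on the current mini-batch loss.

    Shrinks eta until loss(w - eta*grad) <= loss(w) - c*eta*||grad||^2,
    in the spirit of the stochastic line search of Vaswani et al. [2019].
    """
    eta = eta_max
    f_w = loss_fn(w)
    g_norm_sq = float(grad @ grad)
    while loss_fn(w - eta * grad) > f_w - c * eta * g_norm_sq and eta > 1e-8:
        eta *= beta
    return eta

# Toy example: least-squares regression trained with mini-batch SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=1000)

w = np.zeros(10)
batch = 32
for t in range(500):
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch                 # mini-batch gradient
    loss = lambda v: 0.5 * np.mean((Xb @ v - yb) ** 2)  # mini-batch loss

    eta = log_step_size(t)       # or: armijo_step_size(w, grad, loss)
    w = w - eta * grad
```

Either choice of `eta` above yields a decreasing (or adaptively chosen) step size; the decaying schedule avoids the extra function evaluations of backtracking, while the Armijo check adapts to local curvature at the cost of additional mini-batch loss evaluations.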
arXiv.org Artificial Intelligence
Apr-1-2024