New logarithmic step size for stochastic gradient descent
Shamaee, M. Soheil, Hafshejani, S. Fathi, Saeidian, Z.
Stochastic gradient descent (SGD), which dates back to the work of Robbins and Monro [1951a], is widely used to train modern Deep Neural Networks (DNNs), which achieve state-of-the-art results in many problem domains, such as image classification Krizhevsky et al. [2017, 2009], object detection Redmon and Farhadi [2017], and automatic machine translation Zhang et al. [2015]. The value of the step size (or learning rate) is crucial to the convergence rate of SGD. Choosing an appropriate step size at each iteration ensures that the SGD iterates converge to an optimal solution: a step size that is too large may prevent the iterates from reaching the optimal point, while an excessively small step size can lead to slow convergence or cause a local minimum to be mistaken for the optimal solution Mishra and Sarawadekar [2019]. To address these challenges, various step-size schemes have been proposed. One popular approach is the Armijo line search method, first adapted to SGD by Vaswani et al. [2019], which provides theoretical guarantees for strongly convex, convex, and non-convex objective functions.
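Since this is only the abstract, the exact logarithmic schedule proposed in the paper is not shown. The sketch below is illustrative only: it assumes a simple eta0 / ln(t + 2) decay as a stand-in for the paper's step size, and it includes a backtracking check in the spirit of the stochastic Armijo line search of Vaswani et al. [2019]. All names and constants (log_step_size, armijo_step_size, eta0, c, beta) are placeholders, not the authors' notation.

```python
import numpy as np

def log_step_size(t, eta0=0.1):
    """Hypothetical logarithmically decaying step size: eta0 / ln(t + 2).

    The paper's actual schedule may differ; this is an assumed form.
    """
    return eta0 / np.log(t + 2)

def armijo_step_size(w, grad, loss_fn, eta_max=1.0, c=0.5, beta=0.7):
    """Backtracking (Armijo) line search on the current mini-batch loss.

    Shrinks eta until loss(w - eta*grad) <= loss(w) - c*eta*||grad||^2,
    in the spirit of the stochastic line search of Vaswani et al. [2019].
    """
    eta = eta_max
    f_w = loss_fn(w)
    g_norm_sq = float(grad @ grad)
    while loss_fn(w - eta * grad) > f_w - c * eta * g_norm_sq and eta > 1e-8:
        eta *= beta
    return eta

# Toy example: least-squares regression trained with mini-batch SGD.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.01 * rng.normal(size=1000)

w = np.zeros(10)
batch = 32
for t in range(500):
    idx = rng.choice(len(X), size=batch, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w - yb) / batch                 # mini-batch gradient
    loss = lambda v: 0.5 * np.mean((Xb @ v - yb) ** 2)  # mini-batch loss

    eta = log_step_size(t)       # or: armijo_step_size(w, grad, loss)
    w = w - eta * grad
```

Either choice of `eta` above yields a decreasing (or adaptively chosen) step size; the decaying schedule avoids the extra function evaluations of backtracking, while the Armijo check adapts to local curvature at the cost of additional mini-batch loss evaluations.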
arXiv.org Artificial Intelligence
Apr-1-2024