Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments

Shamaee, M. Soheil, Hafshejani, S. Fathi

Sep-3-2023–arXiv.org Artificial Intelligence

Stochastic gradient descent (SGD) has a long history, originating in the influential work of Robbins and Monro [11]. In modern machine learning, SGD has become a fundamental optimization algorithm for training deep neural networks (DNNs), which have achieved remarkable performance across diverse domains such as image classification [6, 7], object detection [10], and machine translation [14]. The choice of step size, often referred to as the learning rate, plays a pivotal role in the convergence behavior of SGD. If the step size is too large, the SGD iterates can overshoot the optimum, causing instability and divergence. Conversely, an excessively small step size leads to slow convergence and can hinder the algorithm's ability to escape suboptimal local minima [9].
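The effect of the step size can be illustrated with a minimal Python sketch. It is not the method proposed in the paper. It assumes a toy one-dimensional objective f(x) = 0.5 * x**2 with Gaussian noise added to the gradient, and the function name `sgd_on_quadratic` and all parameter values are hypothetical choices made for illustration only.

```python
import random


def sgd_on_quadratic(step_size, steps=50, x0=5.0, noise_std=0.1, seed=0):
    """Run SGD with a constant step size on the toy objective f(x) = 0.5 * x**2.

    The exact gradient is x; Gaussian noise is added to mimic a stochastic
    gradient estimate. All defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    x = x0
    for _ in range(steps):
        grad = x + rng.gauss(0.0, noise_std)  # noisy gradient estimate
        x -= step_size * grad                 # SGD update
    return x


# Compare a too-large, a moderate, and a very small constant step size.
for step_size in (2.5, 0.5, 0.001):
    final_x = sgd_on_quadratic(step_size)
    print(f"step size {step_size}: final |x| = {abs(final_x):.3g}")
```

On this toy problem each update multiplies the iterate by roughly (1 - step_size). A step size of 2.5 therefore makes the iterates grow in magnitude, 0.5 brings them close to the minimizer at 0 within a few steps, and 0.001 leaves them near the starting point after 50 steps.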



