Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments
Shamaee, M. Soheil, Hafshejani, S. Fathi
arXiv.org Artificial Intelligence
Stochastic gradient descent (SGD) dates back to the influential work of Robbins and Monro [11]. In modern machine learning, SGD is a fundamental optimization algorithm for training deep neural networks (DNNs), which have achieved remarkable performance in domains such as image classification [6, 7], object detection [10], and machine translation [14]. The choice of step size, often called the learning rate, plays a pivotal role in the convergence behavior of SGD. If the step size is too large, the iterates can fail to reach the optimal point, leading to instability and divergence. If it is too small, convergence is slow and the algorithm may struggle to escape suboptimal local minima [9].
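The effect of the step size can be illustrated with a minimal sketch. The toy objective f(x) = 0.5 * x^2, the Gaussian gradient noise, and the function name `sgd_quadratic` are assumptions chosen for illustration; they are not taken from the paper and do not reproduce its proposed step size.

```python
import random


def sgd_quadratic(step_size, steps=50, x0=1.0, noise=0.01, seed=0):
    """Run SGD on the toy objective f(x) = 0.5 * x**2 and return the final |x|.

    The exact gradient is x; Gaussian noise is added to mimic a stochastic
    gradient estimate. The minimizer is x = 0.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same result
    x = x0
    for _ in range(steps):
        grad = x + rng.gauss(0.0, noise)  # noisy gradient estimate
        x -= step_size * grad             # plain SGD update
    return abs(x)


too_large = sgd_quadratic(2.5)    # |1 - 2.5| > 1, so the iterates blow up
too_small = sgd_quadratic(0.001)  # contracts by 0.999 per step: barely moves
moderate = sgd_quadratic(0.5)     # contracts by 0.5 per step: near the minimizer
```

On this toy problem a constant step size above 2 makes each update overshoot by more than it corrects, while a very small step size leaves the iterate close to its starting point after 50 steps. Real DNN losses are not quadratic, so the thresholds here should be read only as an illustration of the trade-off.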
Sep-3-2023
- Country:
- Asia > Middle East > Iran > Fars Province > Shiraz (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Technology: