Modified Step Size for Enhanced Stochastic Gradient Descent: Convergence and Experiments

Shamaee, M. Soheil, Hafshejani, S. Fathi

arXiv.org Artificial Intelligence 

Stochastic gradient descent (SGD) has a long history, originating in the influential work of Robbins and Monro [11]. In modern machine learning, SGD has become a fundamental optimization algorithm for training deep neural networks (DNNs), which have achieved remarkable performance across diverse domains such as image classification [6, 7], object detection [10], and machine translation [14]. The choice of step size, often called the learning rate, plays a pivotal role in the convergence behavior of SGD. If the step size is too large, the SGD iterates can overshoot the optimal point, leading to instability and divergence. If it is too small, convergence is slow and the algorithm is less able to escape suboptimal local minima [9].
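The effect of the step size can be illustrated on a toy problem. The sketch below is not taken from the paper: it assumes the one-dimensional objective f(x) = 0.5 * x^2, a constant step size, and a hypothetical helper named `sgd_quadratic`. With the gradient noise set to zero, each update multiplies x by (1 - step_size), so the three regimes are easy to see.

```python
import random


def sgd_quadratic(step_size, num_steps=50, x0=1.0, noise_std=0.0, seed=0):
    """Run SGD on f(x) = 0.5 * x**2 and return the final distance |x| from the minimizer.

    All names and default values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(num_steps):
        # Stochastic gradient: the exact gradient x plus zero-mean Gaussian noise.
        grad = x + rng.gauss(0.0, noise_std)
        x -= step_size * grad
    return abs(x)


# Noise-free runs, so each step scales x by (1 - step_size).
too_large = sgd_quadratic(step_size=2.5)    # |1 - 2.5| > 1: iterates grow without bound
too_small = sgd_quadratic(step_size=0.001)  # factor 0.999: barely moves in 50 steps
moderate = sgd_quadratic(step_size=0.5)     # factor 0.5: rapid convergence

print(f"too large: {too_large:.3e}")
print(f"too small: {too_small:.3e}")
print(f"moderate:  {moderate:.3e}")
```

Passing a positive `noise_std` adds gradient noise, in which case a constant step size typically leaves the iterates fluctuating around the minimizer rather than converging exactly, which is one motivation for decaying step-size schedules.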
