Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks

Thang Do, Arnulf Jentzen, Adrian Riekert

arXiv.org Artificial Intelligence 

Stochastic gradient descent (SGD) optimization methods are the methods of choice for training deep artificial neural networks (ANNs) in data-driven learning problems (see, for instance, [1, 7, 33, 36, 38, 41] and the references therein) as well as in scientific computing problems (see, for example, [3, 13, 19, 22, 26, 40] and the references therein). In practically relevant deep ANN training problems, however, the optimizer employed is often not the plain vanilla SGD method but a more sophisticated accelerated and adaptive variant of it, such as the adaptive moment estimation (Adam) optimizer (see [32]). We also refer, for instance, to [2, 20, 23, 29, 35, 39] for monographs and surveys treating SGD optimization methods for the training of ANNs. The SGD optimization method under consideration is used with the aim of minimizing the true risk function (the objective function) of the ANN learning problem so that, roughly speaking, the realization function of the deep ANN minimizing the true risk function approximates the output data given the input data as well as possible. Despite the omnipresent use of SGD optimization methods in the training of ANNs, it remains, in basically all practically relevant scenarios, a fundamental open problem to provide a rigorous theoretical description and explanation of the convergence (and non-convergence) properties of SGD optimization methods in deep learning. In particular, it remains an open question to prove or disprove that the true risk of SGD optimization methods converges to the optimal/infimal true risk value in the training of deep ANNs (cf., for example, [11, 18, 31, 34] and the literature review in Subsection 1.4 below). In this work we contribute to this open problem in two respects.
