Convergence of gradient descent for deep neural networks

Chatterjee, Sourav

arXiv.org Artificial Intelligence 

The main difference from prior work is that the width of the network can be a fixed number instead of growing as some multiple or power of the number of data points. The convergence properties of gradient descent are well understood when the objective function f is convex [14, 43], and it is known that finding local minima of nonconvex functions by gradient descent is an NP-complete problem [42]. In spite of this, gradient descent is widely used in practice to find local and global minima in highly nonconvex problems, especially in high dimensions. For example, it has been observed that gradient descent can often find global minima of training loss in deep learning [27, 50], which is one of the reasons behind the great success of the 'deep learning revolution' [12, 36]. This article presents a novel criterion for convergence of gradient descent to a global minimum.
