Convergence of gradient descent for deep neural networks

Dec-17-2022–arXiv.org Artificial Intelligence

The main difference from prior work is that the width of the network can be a fixed number instead of growing as some multiple or power of the number of data points. The convergence properties of gradient descent are well understood when the objective function f is convex [14, 43], and it is known that finding local minima of nonconvex functions by gradient descent is an NP-complete problem [42]. In spite of this, gradient descent is widely used in practice to find local and global minima in highly nonconvex problems, especially in high dimensions. For example, it has been observed that gradient descent can often find global minima of the training loss in deep learning [27, 50], which is one of the reasons behind the great success of the 'deep learning revolution' [12, 36]. This article presents a novel criterion for convergence of gradient descent to a global minimum.
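
As a rough illustration of gradient descent reaching a global minimum of a nonconvex objective, the sketch below runs plain gradient descent on a one-dimensional toy function. It does not implement the article's convergence criterion or any neural network; the objective f(x) = (x^2 - 1)^2, the step size, the number of steps, and the starting points are all assumptions chosen for this example.

```python
# Toy illustration only: the objective, step size, and starting points are
# assumptions made for this sketch, not taken from the article.

def f(x):
    """Nonconvex objective with global minima at x = -1 and x = 1 (value 0)."""
    return (x * x - 1.0) ** 2


def grad_f(x):
    """Derivative of f: d/dx (x^2 - 1)^2 = 4x(x^2 - 1)."""
    return 4.0 * x * (x * x - 1.0)


def gradient_descent(x0, step_size=0.02, num_steps=200):
    """Run fixed-step gradient descent from x0 and return the final iterate."""
    x = x0
    for _ in range(num_steps):
        x -= step_size * grad_f(x)
    return x


x_final = gradient_descent(2.0)
print(x_final, f(x_final))
```

In this toy setting the outcome depends on the starting point: started at 2.0 the iterates approach the global minimum at x = 1, while started exactly at the critical point x = 0 the gradient vanishes and the iterates never move.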

artificial intelligence, gradient descent, machine learning



