Stronger Convergence Results for Deep Residual Networks: Network Width Scales Linearly with Training Data Size

Nov-11-2019–arXiv.org Machine Learning

Deep neural networks have achieved remarkable success across a wide variety of applications, including computer vision [1], natural language processing [2], speech recognition [3] and the game of Go [4]. However, why deep networks perform so well across such varied tasks is still not fully understood. Their optimization behavior in particular calls for careful theoretical study: the loss of a deep network is highly non-convex, yet gradient descent can drive the training loss to zero even when the labels are random [5]. Several lines of work study the optimization of deep networks from different perspectives. For example, a large number of works analyze the optimization landscape for different activation functions [6-11], whereas others [12-15] guarantee global convergence by imposing restrictions on the input distribution. In recent years, many papers have provided convergence guarantees for over-parameterized two-layer and deep networks. It is shown in [16] that gradient descent can find near-global minima of a single-hidden-layer network in time polynomial in the accuracy and the sample size.
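
The observation that gradient descent can fit even random labels can be reproduced in a toy setting. The following is a minimal sketch under our own assumptions, not the method of this paper or of [5]: a two-layer ReLU network f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x), with output weights a_r fixed to random +/-1 and only the hidden weights w_r trained by full-batch gradient descent on the squared loss. The function names (make_data, forward, train_two_layer_relu), the sizes (n = 6 samples in d = 10 dimensions, width m = 100), the learning rate 0.5 and the seeds are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical toy setup (not from the paper): a two-layer ReLU network
#   f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x)
# with output weights a_r fixed to random +/-1 and only w_r trained.


def make_data(n, d, rng):
    """Random unit-norm inputs paired with random +/-1 labels (no signal)."""
    xs = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        xs.append([t / norm for t in v])
    ys = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return xs, ys


def forward(w, a, xs):
    """Return pre-activations z[r][i] = w_r . x_i and predictions f(x_i)."""
    scale = 1.0 / math.sqrt(len(w))
    z = [[sum(wj * xj for wj, xj in zip(w_r, x)) for x in xs] for w_r in w]
    preds = [scale * sum(a_r * max(z_r[i], 0.0) for a_r, z_r in zip(a, z))
             for i in range(len(xs))]
    return z, preds


def squared_loss(preds, ys):
    return 0.5 * sum((p - y) ** 2 for p, y in zip(preds, ys))


def train_two_layer_relu(xs, ys, width, lr, steps, seed=0):
    """Full-batch gradient descent on the hidden weights; returns the loss history."""
    rng = random.Random(seed)
    d = len(xs[0])
    scale = 1.0 / math.sqrt(width)
    w = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(width)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(width)]
    losses = []
    for _ in range(steps):
        z, preds = forward(w, a, xs)
        err = [p - y for p, y in zip(preds, ys)]
        losses.append(squared_loss(preds, ys))
        # dL/dw_r = (a_r / sqrt(m)) * sum_i err_i * 1[w_r . x_i > 0] * x_i
        for r in range(width):
            coeffs = [e if z[r][i] > 0.0 else 0.0 for i, e in enumerate(err)]
            for j in range(d):
                grad = scale * a[r] * sum(c * x[j] for c, x in zip(coeffs, xs))
                w[r][j] -= lr * grad
    # Loss after the final update.
    losses.append(squared_loss(forward(w, a, xs)[1], ys))
    return losses


if __name__ == "__main__":
    xs, ys = make_data(n=6, d=10, rng=random.Random(1))
    losses = train_two_layer_relu(xs, ys, width=100, lr=0.5, steps=400)
    print(f"initial loss {losses[0]:.4f}, final loss {losses[-1]:.3e}")
```

Because the labels are drawn independently of the inputs, any drop in training loss here reflects memorization rather than learning; with the width much larger than the number of samples, the loss typically falls by several orders of magnitude. The toy run says nothing about the specific width requirement for deep residual networks stated in the title.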
