Training Over-parameterized Deep ResNet Is almost as Easy as Training a Two-layer Network

Zhang, Huishuai, Yu, Da, Chen, Wei, Liu, Tie-Yan

Mar-17-2019–arXiv.org Machine Learning

Although deep neural networks have achieved revolutionary success on various tasks, e.g., computer vision [He et al., 2016] and natural language understanding [Hochreiter and Schmidhuber, 1997], their optimization and generalization properties still lack a rigorous theoretical study. For optimization specifically, because the loss of a deep neural network is highly nonconvex, local search algorithms like gradient descent are hard to analyze with performance guarantees. Many recent works [Choromanska et al., 2015, Kawaguchi, 2016, Nguyen and Hein, 2017, Soudry and Hoffer, 2017] have studied the loss surface of neural networks, and a common claim is that (deep) neural networks have ...

H. Zhang, W. Chen and T.-Y. Liu are with Microsoft Research Asia, Beijing 100080, China (email: {huzhang, wche, tyliu}@microsoft.com); D. Yu is with the School of Data and Computer Science at Sun Yat-sen University, Guangzhou 510275, China (email: yuda3@mail2.sysu.edu.cn).

artificial intelligence, machine learning, resnet
