Training Over-parameterized Deep ResNet Is almost as Easy as Training a Two-layer Network

Zhang, Huishuai, Yu, Da, Chen, Wei, Liu, Tie-Yan

arXiv.org Machine Learning 

Although deep neural networks have achieved revolutionary success on various tasks, e.g., computer vision [He et al., 2016] and natural language understanding [Hochreiter and Schmidhuber, 1997], they still lack a rigorous theoretical study of their optimization and generalization properties. For optimization specifically, because the loss of a deep neural network is highly nonconvex, local search algorithms like gradient descent are hard to analyze with performance guarantees. Many recent works [Choromanska et al., 2015, Kawaguchi, 2016, Nguyen and Hein, 2017, Soudry and Hoffer, 2017] have studied the loss surface of neural networks, and a common claim is that (deep) neural networks have

H. Zhang, W. Chen and TY Liu are with Microsoft Research Asia, Beijing, 100080, China (email: {huzhang, wche, tyliu}@microsoft.com); D. Yu is with the School of Data and Computer Science at Sun Yat-sen University, Guangzhou, 510275, China (email: yuda3@mail2.sysu.edu.cn).
