Training Over-parameterized Deep ResNet Is almost as Easy as Training a Two-layer Network
Zhang, Huishuai, Yu, Da, Chen, Wei, Liu, Tie-Yan
Although deep neural networks have achieved revolutionary success across various tasks, e.g., computer vision [He et al., 2016] and natural language understanding [Hochreiter and Schmidhuber, 1997], they still lack a rigorous theoretical study of their optimization and generalization properties. For optimization specifically, because the loss of a deep neural network is highly nonconvex, local search algorithms like gradient descent are hard to analyze with performance guarantees. Many recent works [Choromanska et al., 2015, Kawaguchi, 2016, Nguyen and Hein, 2017, Soudry and Hoffer, 2017] have studied the loss surface of neural networks, and a common claim is that (deep) neural networks have
H. Zhang, W. Chen and TY Liu are with Microsoft Research Asia, Beijing, 100080 China (email: {huzhang, wche, tyliu}@microsoft.com); D. Yu is with the School of Data and Computer Science at Sun Yat-sen University, Guangzhou, 510275, China (email: yuda3@mail2.sysu.edu.cn).
Mar-17-2019
- Country:
- Asia > China
- Guangdong Province > Guangzhou (0.24)
- Beijing > Beijing (0.24)
- Africa > Middle East
- Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Technology: