Embedding Principle of Loss Landscape of Deep Neural Networks
Yaoyu Zhang, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu
Understanding the structure of the loss landscape of deep neural networks (DNNs) is clearly important. In this work, we prove an embedding principle: the loss landscape of a DNN "contains" all the critical points of all narrower DNNs. More precisely, we propose a critical embedding such that any critical point, e.g., a local or global minimum, of a narrower DNN can be embedded into a critical point/hyperplane of the target DNN with higher degeneracy, while preserving the DNN output function. This embedding structure of critical points is independent of the loss function and the training data, in stark contrast to other nonconvex problems such as protein folding. Empirically, we find that a wide DNN is often attracted to highly degenerate critical points that are embedded from narrow DNNs. The embedding principle thus offers an explanation for the generally easy optimization of wide DNNs and reveals a potential implicit low-complexity regularization during training. Overall, our work provides a skeleton for the study of the loss landscape of DNNs and its implications, from which a more exact and comprehensive understanding can be anticipated in the near future.
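To make the embedding concrete, below is a minimal NumPy sketch of a one-step "neuron splitting" construction of the kind the abstract describes, for a two-layer tanh network. The function and parameter names (`narrow_net`, `split_embed`, `alpha`) are hypothetical and chosen for exposition; this is our illustration under those assumptions, not the authors' code, and it only demonstrates output-function preservation, whereas the paper additionally proves that such embeddings map critical points to (more degenerate) critical points.

```python
import numpy as np

rng = np.random.default_rng(0)

def narrow_net(x, W, a):
    # Two-layer network f(x) = sum_k a_k * tanh(w_k . x)
    # W: (m, d) hidden-layer weights, a: (m,) output weights
    return np.tanh(x @ W.T) @ a

def split_embed(W, a, k, alpha=0.5):
    """Embed a width-m network into a width-(m+1) network by splitting neuron k.

    The new neuron copies neuron k's input weights, and the output weight
    a_k is shared as (alpha * a_k, (1 - alpha) * a_k), so the output
    function is unchanged for every input x.
    """
    W_wide = np.vstack([W, W[k]])           # duplicate input weights of neuron k
    a_wide = np.append(a, (1 - alpha) * a[k])
    a_wide[k] = alpha * a[k]                # split the output weight
    return W_wide, a_wide

d, m = 3, 4
W = rng.normal(size=(m, d))
a = rng.normal(size=m)
W_wide, a_wide = split_embed(W, a, k=1, alpha=0.3)

x = rng.normal(size=(10, d))
# The embedded wider network computes exactly the same function.
assert np.allclose(narrow_net(x, W, a), narrow_net(x, W_wide, a_wide))
print("output preserved under splitting embedding")
```

Because the output function is preserved for every input, the loss value is preserved for any loss and any data; this is the sense in which the embedded critical structure is independent of the loss function and training data.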
May 30, 2021