Towards Understanding the Optimization Mechanisms in Deep Learning
Qi, Binchuan, Gong, Wei, Li, Li
–arXiv.org Artificial Intelligence
Key insights from the studies of Arjevani and Field (2022), Chizat, Oyallon, and Bach (2018), Du, Zhai, Póczos, and Singh (2018), and Yun, Sra, and Jadbabaie (2018) emphasize the pivotal role of over-parameterization in finding the global optimum and in enhancing the generalization ability of deep neural networks (DNNs). Recent work has shown that the evolution of the trainable parameters of infinite-width DNNs during training can be captured by the neural tangent kernel (NTK) (Arora, Du, Hu, Li, & Wang, 2019; Du, Lee, Li, Wang, & Zhai, 2018; Jacot, Gabriel, & Hongler, 2018; Mohamadi, Bae, & Sutherland, 2023; Wang, Li, & Sun, 2023; Zou, Cao, Zhou, & Gu, 2018). An alternative line of research examines infinite-width neural networks from a mean-field perspective (Chizat & Bach, 2018; Mei, Montanari, & Nguyen, 2018; Nguyen & Pham, 2023; Sirignano & Spiliopoulos, 2018). In practical applications, however, neural networks have finite width, and under this condition it remains unclear whether NTK theory and mean-field theory can adequately characterize their convergence properties (Seleznova & Kutyniok, 2021). The mechanisms of non-convex optimization in deep learning, and the impact of over-parameterization on model training, therefore remain incompletely resolved.
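To make the NTK claim concrete, the sketch below computes the *empirical* neural tangent kernel of a tiny one-hidden-layer ReLU network: the kernel entry for inputs x and x' is the inner product of the parameter gradients of the network output at those inputs. This is a generic illustration of the NTK definition, not code from the paper; the network, width, and inputs are all hypothetical choices.

```python
import numpy as np

# Toy network f(x) = a . relu(W x) / sqrt(m) with width m.
# Empirical NTK entry: Theta(x, x') = <grad_params f(x), grad_params f(x')>.
rng = np.random.default_rng(0)
m, d = 512, 3                        # hidden width, input dimension (arbitrary)
W = rng.standard_normal((m, d))      # hidden-layer weights
a = rng.standard_normal(m)           # output-layer weights

def param_jacobian(x):
    """Gradient of f(x) with respect to all parameters (W and a), flattened."""
    pre = W @ x                      # pre-activations, shape (m,)
    act = np.maximum(pre, 0.0)       # ReLU
    mask = (pre > 0).astype(float)   # ReLU derivative
    dW = np.outer(a * mask, x) / np.sqrt(m)   # d f / d W
    da = act / np.sqrt(m)                     # d f / d a
    return np.concatenate([dW.ravel(), da])

X = rng.standard_normal((5, d))      # five sample inputs
J = np.stack([param_jacobian(x) for x in X])
ntk = J @ J.T                        # empirical NTK Gram matrix

# By construction the Gram matrix is symmetric and positive semi-definite.
print(np.allclose(ntk, ntk.T))
print(np.all(np.linalg.eigvalsh(ntk) >= -1e-9))
```

In the infinite-width limit studied by the works cited above, this random-initialization kernel concentrates around a deterministic kernel that stays essentially fixed during training; at finite width it fluctuates and evolves, which is precisely the gap the passage highlights.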
Mar-29-2025