Mean-Field Analysis of Two-Layer Neural Networks: Global Optimality with Linear Convergence Rates
Jingwei Zhang, Xunpeng Huang, Jincheng Yu
arXiv.org Artificial Intelligence
Gradient-based optimization is a fundamental tool in machine learning and has witnessed great empirical success for training neural networks, despite the highly non-convex landscape of the objective. However, the theoretical understanding of non-convex optimization in neural networks is still quite limited. Recently, there has been much work explaining the success of gradient-based optimization in overparametrized neural networks, that is, neural networks with a massive number of hidden units. Under the overparametrization condition, the learning problem can be translated into minimizing a convex functional, hence circumventing the difficulties of analyzing non-convex objectives. It is worth mentioning that there has been much broader interest in analyzing the convergence of machine learning algorithms by formulating them as the minimization of some (usually convex) functional of a measure, such as variational inference (Liu and Wang (2016); Liu (2017); Chewi et al. (2020)), generative adversarial networks (Johnson and Zhang (2019); Nitanda and Suzuki (2020)), and learning infinite-width neural networks (Chizat and Bach (2018); Mei et al. (2018); Nguyen and Pham (2020); Fang et al. (2021)). The key idea is to approximate the learning dynamics of the model parameters by an optimization over the space of probability measures on the model parameters under the overparametrization condition.
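To make the mean-field reformulation concrete, the following is a minimal sketch in generic notation (the symbols and regularization choices here are illustrative and may differ from those used in the paper). A two-layer network with m hidden units,

f_m(x) = \frac{1}{m}\sum_{i=1}^{m} a_i\,\sigma(\langle w_i, x\rangle),

is viewed in the infinite-width limit as an integral against a probability measure \mu over the parameters \theta = (a, w),

f_\mu(x) = \int a\,\sigma(\langle w, x\rangle)\, d\mu(a, w), \qquad
R(\mu) = \mathbb{E}_{(x,y)}\big[\ell(f_\mu(x), y)\big].

Since f_\mu is linear in \mu, the risk R(\mu) is a convex functional of the measure whenever the loss \ell is convex in its first argument, and gradient descent on the finite collection of particles \{(a_i, w_i)\}_{i=1}^m can be interpreted as an approximation of a (Wasserstein) gradient flow on R(\mu).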
Oct-17-2022