Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks
Lei, Yunwen, Jin, Rong, Ying, Yiming
arXiv.org Artificial Intelligence
Neural networks have achieved remarkable success in solving large-scale machine learning problems in various application domains such as computer vision and natural language processing [33]. First-order methods such as gradient descent (GD) and stochastic gradient descent (SGD) are the mainstream optimization algorithms for training neural networks due to their simplicity and efficiency [11, 33, 50]. Although the associated optimization problems are nonconvex and nonsmooth, GD/SGD can still find a model with a very small or even zero training error [16, 20, 34, 39, 64, 69]. At the same time, the models found by such first-order methods have demonstrated good generalization performance on test data, even though neural networks are often highly overparameterized in the sense that the number of parameters is much larger than the number of training examples [1, 2, 5]. These surprising phenomena have triggered a surge of research activity aimed at understanding the generalization ability of neural networks. Generalization analysis typically uses complexity measures such as VC dimension, covering numbers or Rademacher complexities to develop capacity-dependent bounds [8, 9, 25, 42, 48], which, however, may not adequately explain the generalization of overparameterized neural networks. Impressive alternatives have been proposed, including the compression approach [4], norm-based analysis [8, 25], PAC-Bayes analysis [21] and the neural tangent kernel (NTK) approach [5, 28]. In particular, the NTK approach shows that overparameterization pulls the dynamics of GD on neural networks close to those of its counterpart on a kernel machine with the least-squares loss [5, 20], which illustrates how overparameterization can help both optimization and generalization. However, this approach often requires a very high degree of overparameterization to yield useful results [6, 55, 60].
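To make the GD/SGD setting concrete, here is a minimal pure-Python sketch of both methods training a shallow network on the least-squares loss. Everything in it is an assumption chosen for illustration, not the paper's exact setup: a one-hidden-layer network f_W(x) = m^(-1/2) * sum_k a_k * tanh(<w_k, x>) with fixed output signs a_k and trainable hidden weights W, a toy dataset on the unit circle, and hypothetical names (make_data, init_network, gd, sgd) and step sizes. It only shows the update rules: GD uses the full empirical gradient, while SGD uses the gradient on one example drawn uniformly at random.

```python
import math
import random

# Illustrative sketch under assumed settings (not the paper's exact model):
# f_W(x) = m^(-1/2) * sum_k a_k * tanh(<w_k, x>), output signs a_k fixed, W trained.


def make_data(n, seed=0):
    """Hypothetical toy data: inputs on the unit circle, smooth targets in [-1, 1]."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        x = (math.cos(theta), math.sin(theta))  # ||x|| = 1
        xs.append(x)
        ys.append(math.sin(3.0 * x[0]))
    return xs, ys


def init_network(m, d, seed=1):
    """Hidden weights w_k ~ N(0, I); output weights a_k in {-1, +1} stay fixed."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(m)]
    return W, a


def predict(W, a, x):
    """Network output f_W(x) with 1/sqrt(m) scaling."""
    s = sum(a_k * math.tanh(sum(wj * xj for wj, xj in zip(w_k, x)))
            for w_k, a_k in zip(W, a))
    return s / math.sqrt(len(W))


def loss(W, a, xs, ys):
    """Empirical least-squares risk (1 / (2n)) * sum_i (f_W(x_i) - y_i)^2."""
    return sum((predict(W, a, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * len(xs))


def grad(W, a, batch):
    """Gradient w.r.t. W of the average squared loss over `batch` (list of (x, y))."""
    m = len(W)
    g = [[0.0] * len(W[0]) for _ in range(m)]
    for x, y in batch:
        r = predict(W, a, x) - y  # residual f_W(x) - y
        for k in range(m):
            u = sum(wj * xj for wj, xj in zip(W[k], x))
            coef = r * a[k] * (1.0 - math.tanh(u) ** 2) / math.sqrt(m)
            for j, xj in enumerate(x):
                g[k][j] += coef * xj / len(batch)
    return g


def step(W, g, eta):
    """One first-order update W <- W - eta * g (returns new weights)."""
    return [[wj - eta * gj for wj, gj in zip(w_k, g_k)] for w_k, g_k in zip(W, g)]


def gd(W, a, xs, ys, eta, T):
    """Full-batch GD for T iterations; returns final W and loss after each iteration."""
    data = list(zip(xs, ys))
    history = [loss(W, a, xs, ys)]
    for _ in range(T):
        W = step(W, grad(W, a, data), eta)
        history.append(loss(W, a, xs, ys))
    return W, history


def sgd(W, a, xs, ys, eta, T, seed=2):
    """SGD: each iteration uses one example sampled uniformly with replacement."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    for _ in range(T):
        W = step(W, grad(W, a, [rng.choice(data)]), eta)
    return W


xs, ys = make_data(n=10)
W0, a = init_network(m=50, d=2)
_, hist = gd(W0, a, xs, ys, eta=0.5, T=200)
W_sgd = sgd(W0, a, xs, ys, eta=0.1, T=2000)
print("GD:", hist[0], "->", hist[-1], " SGD final:", loss(W_sgd, a, xs, ys))
```

With the 1/sqrt(m) scaling and unit-norm inputs, the least-squares loss is smooth enough here that the assumed step size 0.5 keeps the GD loss non-increasing; the SGD loss, by contrast, fluctuates from step to step while trending downward.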
Sep-19-2022