Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks
Wind, Johan S., Antun, Vegard, Hansen, Anders C.
–arXiv.org Artificial Intelligence
During the past decade, deep learning has transformed a number of historically challenging problems in computer vision, natural language processing, game intelligence, etc. In many of these applications, the trained neural networks used to solve these problems are over-parameterized. That is, the neural networks have far more parameters than the number of data points used for training. In this setting, a neural network can typically fit any training data - including random labels [95] - making it hard to explain why deep learning methods generalize so well [36]. Moreover, the practical performance of neural networks often improves as the number of parameters grow [55,84]. These observations have led to the study of the potential implicit regularization (sometimes called implicit bias) imposed by the gradient based methods and different network architectures [8, 68, 69]. It may seem surprising that there is a link to generalized hardness of approximation (GHA), as this phenomenon - at a first glance - may seem disconnected from implicit regularization. However, the GHA phenomenon (see 1.2), which first appeared in [13] (see also [2] Chapter 8) and analyzed [13, 34, 41] in connection with robust and convex optimization [20, 21, 63, 64], typically stem from regularization problems (e.g.
arXiv.org Artificial Intelligence
Jul-13-2023
- Country:
- Africa > Sudan (0.04)
- North America > United States
- New York > New York County > New York City (0.04)
- Europe
- United Kingdom > England
- Cambridgeshire > Cambridge (0.28)
- Switzerland > Basel-City
- Basel (0.04)
- Norway > Eastern Norway
- Oslo (0.04)
- United Kingdom > England
- Genre:
- Research Report > New Finding (0.45)
- Industry:
- Leisure & Entertainment > Games (0.34)
- Technology: