Implicit regularization in AI meets generalized hardness of approximation in optimization -- Sharp results for diagonal linear networks

Wind, Johan S., Antun, Vegard, Hansen, Anders C.

arXiv.org Artificial Intelligence 

During the past decade, deep learning has transformed a number of historically challenging problems in computer vision, natural language processing, game intelligence, etc. In many of these applications, the trained neural networks used to solve these problems are over-parameterized. That is, the neural networks have far more parameters than the number of data points used for training. In this setting, a neural network can typically fit any training data - including random labels [95] - making it hard to explain why deep learning methods generalize so well [36]. Moreover, the practical performance of neural networks often improves as the number of parameters grow [55,84]. These observations have led to the study of the potential implicit regularization (sometimes called implicit bias) imposed by the gradient based methods and different network architectures [8, 68, 69]. It may seem surprising that there is a link to generalized hardness of approximation (GHA), as this phenomenon - at a first glance - may seem disconnected from implicit regularization. However, the GHA phenomenon (see 1.2), which first appeared in [13] (see also [2] Chapter 8) and analyzed [13, 34, 41] in connection with robust and convex optimization [20, 21, 63, 64], typically stem from regularization problems (e.g.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found