A Universal Law of Robustness via Isoperimetry
Sébastien Bubeck, Mark Sellke
We propose an explanation for the enigmatic phenomenon of overparametrization in deep learning, showing in great generality that finding a smooth function to fit n points of d-dimensional data requires at least nd parameters. In other words, overparametrization by a factor of d is necessary for smooth interpolation, suggesting that perhaps the large size of the models used in deep learning is a necessity rather than a weakness of the framework. Another way to phrase the result is as a tradeoff between the size of a model (as measured by its number of parameters) and its "robustness" (as measured by its Lipschitz constant): either one has a small model (with n parameters), which must then be non-robust, or one has a robust model (with constant Lipschitz norm), but then it must be very large (with nd parameters). Such a tradeoff was conjectured in [BLN21] for the specific case of two-layer neural networks and Gaussian data. Our result shows that it is in fact a universal phenomenon, applying to essentially any parametrized function class (including, in particular, deep neural networks) as well as to a much broader class of data distributions.
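The tradeoff above can be summarized in a single inequality; the following is a sketch, writing p for the number of parameters (a symbol not used in the abstract itself) and suppressing logarithmic factors and the paper's regularity assumptions (isoperimetric data distribution, polynomially bounded weights):

% Sketch of the size--robustness tradeoff as one inequality. Here p
% denotes the number of parameters (our notation), n the number of
% datapoints, and d the data dimension. Any model f from the
% parametrized class that fits the n noisy d-dimensional datapoints
% below the noise level must satisfy
\[
  \mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}} .
\]
% The two regimes described in the abstract are the endpoints:
% p \approx n forces \mathrm{Lip}(f) \gtrsim \sqrt{d} (a non-robust
% model), while demanding \mathrm{Lip}(f) = O(1) forces p \gtrsim nd
% (overparametrization by a factor of d).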
June 7, 2021