A Universal Law of Robustness via Isoperimetry
Sébastien Bubeck, Mark Sellke
We propose an explanation for the enigmatic phenomenon of overparametrization in deep learning, showing in great generality that finding a smooth function to fit n points of d-dimensional data requires at least nd parameters. In other words, overparametrization by a factor of d is necessary for smooth interpolation, suggesting that perhaps the large size of the models used in deep learning is a necessity rather than a weakness of the framework. Another way to phrase the result is as a tradeoff between the size of a model (as measured by its number of parameters) and its "robustness" (as measured by its Lipschitz constant): either one has a small model (with n parameters), which must then be non-robust, or one has a robust model (with constant Lipschitz norm), but then it must be very large (with nd parameters). Such a tradeoff was conjectured in [BLN21] for the specific case of two-layer neural networks and Gaussian data. Our result shows that it is in fact a universal phenomenon, applying to essentially any parametrized function class (including, in particular, deep neural networks) as well as to a much broader class of data distributions.
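The tradeoff above can be summarized in a single inequality; the following is a sketch, writing p for the number of parameters (a symbol not used in the abstract itself) and suppressing logarithmic factors and the paper's regularity assumptions (isoperimetric data distribution, polynomially bounded weights):

% Sketch of the size--robustness tradeoff as one inequality. Here p
% denotes the number of parameters (our notation), n the number of
% datapoints, and d the data dimension. Any model f from the
% parametrized class that fits the n noisy d-dimensional datapoints
% below the noise level must satisfy
\[
  \mathrm{Lip}(f) \;\gtrsim\; \sqrt{\frac{nd}{p}} .
\]
% The two regimes described in the abstract are the endpoints:
% p \approx n forces \mathrm{Lip}(f) \gtrsim \sqrt{d} (a non-robust
% model), while demanding \mathrm{Lip}(f) = O(1) forces p \gtrsim nd
% (overparametrization by a factor of d).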
June 7, 2021