Generalization Bounds for Neural Networks via Approximate Description Length
Neural Information Processing Systems
We investigate the sample complexity of networks with bounds on the magnitude of their weights. We show that, for inputs in $[-1,1]^d$, the sample complexity of such networks is $\tilde O\left(\frac{d R^2}{\epsilon^2}\right)$, where the spectral norm of each weight matrix $W_i$ is bounded by $O(1)$ and its Frobenius norm by $R$. This bound is optimal up to log-factors, and substantially improves over the previous state of the art of $\tilde O\left(\frac{d^2 R^2}{\epsilon^2}\right)$, which was established in a recent line of work. We furthermore show that this bound remains valid if, instead of considering the magnitude of the $W_i$'s, we consider the magnitude of $W_i - W_i^0$, where the $W_i^0$ are some reference matrices with spectral norm of $O(1)$. By taking the $W_i^0$ to be the matrices at the onset of the training process, we get sample complexity bounds that are sub-linear in the number of parameters, in many {\em typical} regimes of parameters. To establish our results we develop a new technique to analyze the sample complexity of families $\mathcal{H}$ of predictors.
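As a rough, illustrative reading of the sub-linearity claim (not a statement from the paper; here $t$ denotes the network depth and the hidden layers are assumed to be $d \times d$): such a network has on the order of $t d^2$ parameters, so a bound of the form $\tilde O\left(\frac{d R^2}{\epsilon^2}\right)$ is sub-linear in the parameter count whenever
\[
\frac{d R^2}{\epsilon^2} \ll t d^2, \qquad\text{i.e., whenever}\qquad R^2 \ll \epsilon^2 t d ,
\]
which, with $R$ measuring $\|W_i - W_i^0\|_F$, corresponds to training that does not move the weights too far, in Frobenius norm, from their initialization $W_i^0$.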