Generalization Bounds for Neural Networks via Approximate Description Length
Neural Information Processing Systems
We investigate the sample complexity of networks with bounds on the magnitude of their weights. We show that, for inputs in $[-1,1]^d$, the sample complexity of such networks is $\tilde O\left(\frac{d R^2}{\epsilon^2}\right)$, where the spectral norm of each weight matrix $W_i$ is bounded by $O(1)$ and its Frobenius norm by $R$. This bound is optimal up to log-factors, and substantially improves over the previous state of the art of $\tilde O\left(\frac{d^2 R^2}{\epsilon^2}\right)$, which was established in a recent line of work. We furthermore show that this bound remains valid if, instead of considering the magnitude of the $W_i$'s, we consider the magnitude of $W_i - W_i^0$, where the $W_i^0$ are some reference matrices with spectral norm of $O(1)$. By taking the $W_i^0$ to be the matrices at the onset of the training process, we get sample complexity bounds that are sub-linear in the number of parameters, in many {\em typical} regimes of parameters. To establish our results we develop a new technique to analyze the sample complexity of families $\mathcal{H}$ of predictors.
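As a rough, illustrative reading of the sub-linearity claim (not a statement from the paper; here $t$ denotes the network depth and the hidden layers are assumed to be $d \times d$): such a network has on the order of $t d^2$ parameters, so a bound of the form $\tilde O\left(\frac{d R^2}{\epsilon^2}\right)$ is sub-linear in the parameter count whenever
\[
\frac{d R^2}{\epsilon^2} \ll t d^2, \qquad\text{i.e., whenever}\qquad R^2 \ll \epsilon^2 t d ,
\]
which, with $R$ measuring $\|W_i - W_i^0\|_F$, corresponds to training that does not move the weights too far, in Frobenius norm, from their initialization $W_i^0$.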