Sparsity-aware generalization theory for deep neural networks

Muthukumar, Ramchandran; Sulam, Jeremias

arXiv.org Artificial Intelligence 

Statistical learning theory seeks to characterize the generalization ability of machine learning models, obtained from finite training data, to unseen test data. The field is by now relatively mature, and several tools exist to provide upper bounds on the generalization error, R(h). Often the upper bounds depend on the empirical risk, R̂(h), and different characterizations of complexity of the hypothesis class as well as potentially specific data-dependent properties. The renewed interest in deep artificial neural network models has demonstrated important limitations of existing tools. For example, VC dimension often simply relates to the number of model parameters and is hence insufficient to explain generalization of overparameterized models (Bartlett et al., 2019). Traditional measures based on Rademacher complexity are also often vacuous, as these networks can indeed be trained to fit random noise (Zhang et al., 2017). Margin bounds have been adapted to deep non-linear networks (Bartlett et al., 2017; Golowich et al., 2018; Neyshabur et al., 2015, 2018), albeit still unable to provide practically informative results. An increasing number of studies advocate for non-uniform data-dependent measures to explain generalization in deep learning (Nagarajan and Kolter, 2019a; Pérez and Louis, 2020; Wei and Ma, 2019).
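
For context, the bounds referred to above typically take the shape "population risk ≤ empirical risk + complexity term + confidence term". The following is a standard textbook statement of a Rademacher-complexity bound (e.g., Bartlett and Mendelson, 2002), included here only as an illustrative sketch of that structure; it is not the sparsity-aware bound developed in this paper.

% Standard uniform-convergence bound for a loss class bounded in [0,1].
% With probability at least 1 - \delta over an i.i.d. sample of size n,
% simultaneously for all hypotheses h in the class \mathcal{H}:
\[
  R(h) \;\le\; \hat{R}(h) \;+\; 2\,\mathfrak{R}_n\!\left(\ell \circ \mathcal{H}\right)
        \;+\; \sqrt{\frac{\log(1/\delta)}{2n}},
\]
% where \mathfrak{R}_n(\ell \circ \mathcal{H}) is the empirical Rademacher
% complexity of the loss class. Such bounds become vacuous when the
% complexity term is large, which is precisely the failure mode for
% overparameterized networks that can fit random noise.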
