Learning Non-Vacuous Generalization Bounds from Optimization
Tan, Chengli, Zhang, Jiangshe, Liu, Junmin
–arXiv.org Artificial Intelligence
Deep neural networks (DNNs) have shown remarkable performance across a wide range of tasks over the past decade (Bengio et al. 2021). A mystery is that they generalize surprisingly well on unseen data despite having far more trainable parameters than training examples (Belkin et al. 2019, Li et al. 2023). This phenomenon of benign overfitting casts doubt on the classical theory of statistical learning, which posits that models with high complexity tend to overfit the training data, whereas models with low complexity tend to underfit it. To reconcile this conflict, some researchers argue that generalization stems from the regularization incurred during training, imposed either implicitly through the use of stochastic gradient descent (SGD) (Advani et al. 2020, Barrett & Dherin 2021, Smith et al. 2021, Sclocchi & Wyart 2024) or explicitly via batch normalization (Ioffe & Szegedy 2015), weight decay (Krogh & Hertz 1992), dropout (Srivastava et al. 2014), etc. However, Zhang et al. (2017) questioned this widely accepted wisdom: they found that DNNs can still achieve zero training error on randomly labeled examples, which obviously cannot generalize.

Prior to our work, extensive studies have tried to explain the generalization behavior of DNNs; they can be roughly categorized into the following classes. The first class comprises the so-called norm-based bounds (Neyshabur et al. 2015, Bartlett et al. 2017, Neyshabur et al. 2018, Golowich et al. 2018), which are composed of the operator norms of the layerwise weight matrices. However, recent studies suggest that these norm-based bounds may be problematic, as they abnormally increase with the number of training examples (Nagarajan & Kolter 2019). Moreover, norm-based bounds are numerically vacuous: they can be several orders of magnitude larger than the number of network parameters.
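As a minimal sketch of what a norm-based capacity measure looks like in practice, the snippet below computes the product of layerwise spectral norms (operator norms) of a hypothetical small network's weight matrices, the core quantity appearing in bounds such as Bartlett et al. (2017). The layer shapes and the `spectral_norm_product` helper are illustrative assumptions, not part of the cited works; real bounds also include margin and dimension-dependent terms omitted here.

```python
import numpy as np

def spectral_norm_product(weights):
    """Product of the largest singular value (spectral norm) of each layer.

    This is the leading factor in many norm-based generalization bounds;
    it grows multiplicatively with depth, which is one reason such bounds
    are often numerically vacuous for deep networks.
    """
    return float(np.prod([np.linalg.svd(W, compute_uv=False)[0] for W in weights]))

rng = np.random.default_rng(0)
# Three hypothetical dense layers of a toy network (shapes are assumptions).
layers = [
    rng.standard_normal((64, 32)),
    rng.standard_normal((32, 32)),
    rng.standard_normal((32, 10)),
]
bound_proxy = spectral_norm_product(layers)
```

Note that the proxy scales multiplicatively: doubling every weight matrix in a three-layer network multiplies the measure by 8, illustrating how quickly such bounds blow up relative to the actual generalization gap.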
Jul-22-2024