Weight Expansion: A New Perspective on Dropout and Generalization

Jin, Gaojie, Yi, Xinping, Yang, Pengfei, Zhang, Lijun, Schewe, Sven, Huang, Xiaowei

Jan-23-2022–arXiv.org Machine Learning

While dropout is known to be a successful regularization technique, insights into the mechanisms that lead to this success are still lacking. We introduce the concept of weight expansion, an increase in the signed volume of a parallelotope spanned by the column or row vectors of the weight covariance matrix, and show that weight expansion is an effective means of increasing the generalization in a PAC-Bayesian setting. We provide a theoretical argument that dropout leads to weight expansion and extensive empirical support for the correlation between dropout and weight expansion. To support our hypothesis that weight expansion can be regarded as an indicator of the enhanced generalization capability endowed by dropout, and not just as a mere by-product, we have studied other methods that achieve weight expansion (resp. This suggests that dropout is an attractive regularizer, because it is a computationally cheap method for obtaining weight expansion. This insight justifies the role of dropout as a regularizer, while paving the way for identifying regularizers that promise improved generalization through weight expansion. Research on why dropout is so effective in improving the generalization ability of neural networks has been intensive. Many intriguing phenomena induced by dropout have also been studied in this research (Gao et al., 2019; Lengerich et al., 2020; Wei et al., 2020).

artificial intelligence, dropout, machine learning, (18 more...)

arXiv.org Machine Learning

Jan-23-2022

arXiv.org PDF

Add feedback

Country:
- Asia > China (0.14)

Genre:
- Research Report (0.63)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.46)
    - Neural Networks > Deep Learning (0.68)
    - Statistical Learning (1.00)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.93)