Implicit Regularization Towards Rank Minimization in ReLU Networks

Nadav Timor, Gal Vardi, Ohad Shamir

arXiv.org Machine Learning 

A central puzzle in the theory of deep learning is how neural networks generalize even when trained without any explicit regularization, and when there are far more learnable parameters than training examples. In such an underdetermined optimization problem, there are many global minima with zero training loss, and gradient descent seems to prefer solutions that generalize well (see Zhang et al., 2017). Hence, it is believed that gradient descent induces an implicit regularization (or implicit bias) (Neyshabur et al., 2015, 2017), and characterizing this regularization/bias has been a subject of extensive research. Several works in recent years have studied the relationship between implicit regularization in linear neural networks and rank minimization. A main focus is the matrix factorization problem, which corresponds to training a depth-2 linear neural network with multiple outputs w.r.t. the square loss, and which serves as a well-studied test-bed for implicit regularization in deep learning.
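
As a concrete illustration of that correspondence (not a result of this paper, which studies ReLU networks), the sketch below runs gradient descent on the square loss of a depth-2 linear factorization W2 W1 over a subset of observed matrix entries, the standard matrix-factorization test-bed. The dimensions, seed, and hyperparameters are illustrative assumptions, and the low-rank tendency it prints reflects behavior commonly reported in the matrix-factorization literature rather than anything established here.

```python
# Minimal sketch (not from the paper) of the matrix-factorization test-bed:
# fit observed entries of a matrix by the product W2 @ W1 with gradient descent
# on the square loss, i.e. train a depth-2 linear network with multiple outputs.
# All sizes, seeds, and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 20, 20, 3                       # matrix size, hidden width, true rank
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d)) / d
mask = rng.random((d, d)) < 0.5           # observe roughly half of the entries

# Small (near-zero) initialization, as commonly used in this test-bed.
scale = 1e-3
W1 = scale * rng.standard_normal((k, d))
W2 = scale * rng.standard_normal((d, k))

lr = 0.1
for _ in range(50_000):
    R = mask * (W2 @ W1 - M)              # residual on observed entries only
    gW2 = R @ W1.T                        # gradients of 0.5 * ||mask * (W2 W1 - M)||_F^2
    gW1 = W2.T @ R
    W2 -= lr * gW2
    W1 -= lr * gW1

P = W2 @ W1
print("observed-entry error:", np.linalg.norm(mask * (P - M)))
print("top singular values:", np.round(np.linalg.svd(P, compute_uv=False)[:6], 3))
# Although rank is never penalized explicitly, gradient descent from small
# initialization tends to return a product whose trailing singular values are
# close to zero, i.e. an approximately low-rank solution -- the kind of implicit
# regularization discussed above.
```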