Implicit Regularization Towards Rank Minimization in ReLU Networks

Nadav Timor, Gal Vardi, Ohad Shamir

arXiv.org Machine Learning 

A central puzzle in the theory of deep learning is how neural networks generalize even when trained without any explicit regularization, and when there are far more learnable parameters than training examples. In such an underdetermined optimization problem, there are many global minima with zero training loss, and gradient descent seems to prefer solutions that generalize well (see Zhang et al., 2017). Hence, it is believed that gradient descent induces an implicit regularization (or implicit bias) (Neyshabur et al., 2015, 2017), and characterizing this regularization/bias has been a subject of extensive research. Several works in recent years have studied the relationship between implicit regularization in linear neural networks and rank minimization. A main focus is the matrix factorization problem, which corresponds to training a depth-2 linear neural network with multiple outputs w.r.t. the square loss, and which serves as a well-studied test-bed for implicit regularization in deep learning.
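
As a concrete illustration of that correspondence (not a result of this paper, which studies ReLU networks), the sketch below runs gradient descent on the square loss of a depth-2 linear factorization W2 W1 over a subset of observed matrix entries, the standard matrix-factorization test-bed. The dimensions, seed, and hyperparameters are illustrative assumptions, and the low-rank tendency it prints reflects behavior commonly reported in the matrix-factorization literature rather than anything established here.

```python
# Minimal sketch (not from the paper) of the matrix-factorization test-bed:
# fit observed entries of a matrix by the product W2 @ W1 with gradient descent
# on the square loss, i.e. train a depth-2 linear network with multiple outputs.
# All sizes, seeds, and hyperparameters below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 20, 20, 3                       # matrix size, hidden width, true rank
M = rng.standard_normal((d, r)) @ rng.standard_normal((r, d)) / d
mask = rng.random((d, d)) < 0.5           # observe roughly half of the entries

# Small (near-zero) initialization, as commonly used in this test-bed.
scale = 1e-3
W1 = scale * rng.standard_normal((k, d))
W2 = scale * rng.standard_normal((d, k))

lr = 0.1
for _ in range(50_000):
    R = mask * (W2 @ W1 - M)              # residual on observed entries only
    gW2 = R @ W1.T                        # gradients of 0.5 * ||mask * (W2 W1 - M)||_F^2
    gW1 = W2.T @ R
    W2 -= lr * gW2
    W1 -= lr * gW1

P = W2 @ W1
print("observed-entry error:", np.linalg.norm(mask * (P - M)))
print("top singular values:", np.round(np.linalg.svd(P, compute_uv=False)[:6], 3))
# Although rank is never penalized explicitly, gradient descent from small
# initialization tends to return a product whose trailing singular values are
# close to zero, i.e. an approximately low-rank solution -- the kind of implicit
# regularization discussed above.
```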