Reviews: Implicit Regularization in Deep Matrix Factorization

Neural Information Processing Systems 

This paper studies the implicit regularization of gradient descent in deep matrix factorization models, i.e., deep linear neural networks trained for matrix completion. It begins by reviewing prior work showing that gradient descent on a shallow (depth-2) matrix factorization, with a small learning rate and initialization close to zero, tends to converge to solutions of minimum nuclear norm [20] (Conjecture 1). The discussion is then extended to deep matrix factorization, where predictive performance is shown to improve with depth when the number of observed entries is small. Next, experimental results (Figure 2) are presented that challenge Conjecture 1: when few entries are observed, the implicit regularization of both shallow and deep matrix factorization drives solutions toward low rank rather than toward minimum nuclear norm. Finally, a theoretical and experimental analysis of the gradient-flow dynamics of deep matrix factorization shows how the singular values and singular vectors of the product matrix evolve during training, and how these dynamics give rise to an implicit regularization toward low-rank solutions.
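To make the setting concrete, the following is a minimal NumPy sketch of the kind of experiment the paper describes: a depth-3 factorization trained by plain gradient descent on a partially observed rank-1 matrix. All specifics here (matrix size, depth, learning rate, initialization scale, step count) are illustrative choices, not the paper's actual experimental configuration; the point is only that the learned product matrix tends toward low effective rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy matrix-completion problem: rank-1 ground truth, ~half the entries observed.
# These hyperparameters are illustrative, not taken from the paper.
n, depth, lr, steps = 10, 3, 0.03, 20000
M = np.outer(rng.standard_normal(n), rng.standard_normal(n))
mask = rng.random((n, n)) < 0.5

def chain(mats):
    """Product of a list of matrices (identity for the empty list)."""
    out = np.eye(n)
    for A in mats:
        out = out @ A
    return out

# Depth-3 factorization W = W_3 W_2 W_1 with a small (not vanishing) init.
Ws = [0.2 * rng.standard_normal((n, n)) for _ in range(depth)]

for _ in range(steps):
    E = mask * (chain(Ws) - M)  # residual on observed entries only
    # Gradient of 0.5 * ||E||_F^2 w.r.t. each factor, other factors held fixed.
    grads = [chain(Ws[:j]).T @ E @ chain(Ws[j + 1:]).T for j in range(depth)]
    for W_j, g in zip(Ws, grads):
        W_j -= lr * g

W = chain(Ws)
s = np.linalg.svd(W, compute_uv=False)
print("top singular values:", s[:3])
print("observed-entry error:", np.linalg.norm(mask * (W - M)))
```

Running this, the observed entries are fit essentially exactly while the singular values of the product beyond the first remain small, consistent with the low-rank bias the paper reports (rather than a generic minimum-Frobenius-norm fit, which would spread energy across many singular values).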