Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions

Jacot, Arthur

arXiv.org Artificial Intelligence 

We show that the representation cost of fully connected neural networks with homogeneous nonlinearities, which describes the implicit bias in function space of networks with L2-regularized parameters, converges as the depth of the network goes to infinity to a notion of rank over nonlinear functions. We then inquire under which conditions the global minima of the loss recover the 'true' rank of the data: we show that for too large depths the global minimum will be approximately rank 1 (underestimating the rank); we then argue that there is a range of depths, growing with the number of datapoints, over which the true rank is recovered. Finally, we discuss the effect of the rank of a classifier on the topology of the resulting class boundaries, and show that autoencoders with optimal nonlinear rank are naturally denoising.

There has been a lot of recent interest in the so-called implicit bias of DNNs, which describes what functions are favored by a network when fitting the training data. Different network architectures (choice of nonlinearity, depth, width, and more) and training procedures (initialization, optimization algorithm, loss) can lead to widely different biases. In contrast to the so-called kernel regime, where the implicit bias is described by the Neural Tangent Kernel (Jacot et al., 2018), there are several active regimes (also called rich or feature-learning regimes) whose implicit biases often feature a form of sparsity that is absent from the kernel regime.
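The rank at stake can be probed empirically. The following is a minimal sketch (plain NumPy; the toy network, finite-difference step, and singular-value tolerance are illustrative assumptions, not the paper's construction) that estimates the rank of a network's Jacobian at sample inputs, one natural proxy for the rank of the nonlinear function the network computes:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy ReLU network 5 -> 32 -> 32 -> 5 with random Gaussian weights, no biases.
    dims = [5, 32, 32, 5]
    weights = [rng.standard_normal((m, n)) / np.sqrt(n)
               for n, m in zip(dims[:-1], dims[1:])]

    def mlp(x):
        # Forward pass with ReLU on the hidden layers.
        for W in weights[:-1]:
            x = np.maximum(W @ x, 0.0)
        return weights[-1] @ x

    def jacobian(f, x, eps=1e-5):
        # Forward finite-difference Jacobian of f at x
        # (exact almost everywhere for ReLU networks as eps -> 0).
        fx = f(x)
        J = np.zeros((fx.size, x.size))
        for i in range(x.size):
            step = np.zeros_like(x)
            step[i] = eps
            J[:, i] = (f(x + step) - fx) / eps
        return J

    def effective_rank(J, tol=1e-3):
        # Count singular values above tol * largest: a numerical proxy for rank.
        s = np.linalg.svd(J, compute_uv=False)
        return int(np.sum(s > tol * s[0]))

    # Proxy for the Jacobian rank of the function: max local rank over samples.
    xs = rng.standard_normal((10, dims[0]))
    print(max(effective_rank(jacobian(mlp, x)) for x in xs))

Under the implicit bias described above, one would expect such an estimate on a trained network to collapse toward 1 at very large depths, and to match the data's true rank in the intermediate range of depths.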
