Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions
arXiv.org Artificial Intelligence
We then inquire under which conditions the global minima of the loss recover the 'true' rank of the data: we show that when the depth is too large, the global minimum will be approximately rank 1 (underestimating the rank); we then argue that there is a range of depths, growing with the number of datapoints, in which the true rank is recovered. Finally, we discuss the effect of the rank of a classifier on the topology of the resulting class boundaries and show that autoencoders with optimal nonlinear rank are naturally denoising.

There has been a lot of recent interest in the so-called implicit bias of DNNs, which describes which functions are favored by a network when fitting the training data. Different network architectures (choice of nonlinearity, depth, width of the network, and more) and training procedures (initialization, optimization algorithm, loss) can lead to widely different biases. In contrast to the so-called kernel regime, where the implicit bias is described by the Neural Tangent Kernel (Jacot et al., 2018), there are several active regimes (also called rich or feature-learning regimes), whose implicit biases often feature a form of sparsity that is absent from the kernel regime.
Mar-23-2023