On the Implicit Bias Towards Minimal Depth of Deep Neural Networks
Tomer Galanti, Liane Galanti, Ido Ben-Shaul
arXiv.org Artificial Intelligence
Recent results in the literature suggest that the penultimate (second-to-last) layer representations of neural networks trained for classification exhibit a clustering property called neural collapse (NC). We study the implicit bias of stochastic gradient descent (SGD) in favor of low-depth solutions when training deep neural networks. We characterize a notion of effective depth that measures the first layer at which sample embeddings become separable using the nearest-class-center (NCC) classifier. We hypothesize, and empirically verify, that SGD implicitly selects neural networks of small effective depth. Second, while neural collapse emerges even when generalization should be impossible, we argue that the \emph{degree of separability} in the intermediate layers is related to generalization. We derive a generalization bound that compares the effective depth of the network with the minimal depth required to fit the same dataset with partially corrupted labels. Remarkably, this bound provides non-trivial estimates of test performance. Finally, we empirically show that the effective depth of a trained neural network increases monotonically with the number of random labels in the data.
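The notion of effective depth in the abstract can be sketched as follows: for each layer, fit a nearest-class-center classifier on that layer's embeddings and report the first layer whose NCC accuracy reaches a separability threshold. A minimal illustrative sketch (the function names, the per-layer embedding inputs, and the threshold of 1.0 are assumptions for illustration, not the paper's exact protocol):

```python
import numpy as np

def ncc_accuracy(embeddings, labels):
    """Accuracy of the nearest-class-center classifier on one layer's embeddings."""
    classes = np.unique(labels)
    # class centers: mean embedding of each class
    centers = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    # assign each sample to the class with the closest center
    dists = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=2)
    preds = classes[dists.argmin(axis=1)]
    return (preds == labels).mean()

def effective_depth(layer_embeddings, labels, threshold=1.0):
    """First (1-indexed) layer at which NCC separability reaches the threshold;
    falls back to the full depth if no layer is separable."""
    for depth, emb in enumerate(layer_embeddings, start=1):
        if ncc_accuracy(emb, labels) >= threshold:
            return depth
    return len(layer_embeddings)
```

Under this sketch, a network whose early layers already admit an accurate NCC classifier has small effective depth, matching the implicit-bias hypothesis described above.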
Sep-27-2022