
Collaborating Authors: Súkeník, Peter


Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

arXiv.org Machine Learning

Among the many possible interpolators that a deep neural network (DNN) can find, Papyan et al. (2020) showed a strong bias of gradient-based training towards representations with a highly symmetric structure in the penultimate layer, which was dubbed neural collapse (NC). In particular, the feature vectors of the training data in the penultimate layer collapse to a single vector per class (NC1); these vectors form orthogonal or simplex equiangular tight frames (NC2); and they are aligned with the rows of the last layer's weight matrix (NC3). The question of why and how neural collapse emerges has been the subject of a popular line of research, see e.g. Lu & Steinerberger (2022); E & Wojtowytsch (2022) and the discussion in Section 2. Many of these works focus on a simplified mathematical framework: the unconstrained features model (UFM) (Mixon et al., 2020; Han et al., 2022; Zhou et al., 2022a), corresponding to the joint optimization over the last layer's weights and the penultimate layer's feature representations, which are treated as free variables. To account for the existence of the training data and of all the layers before the penultimate one (i.e., the backbone of the network), some form of regularization on the free features is usually added. A number of papers have proved the optimality of NC in this model (Lu & Steinerberger, 2022; E & Wojtowytsch, 2022), its emergence with gradient-based methods (Mixon et al., 2020; Han et al., 2022), and a benign loss landscape (Zhou et al., 2022a; Zhu et al., 2021). However, the major drawback of the UFM lies in its data-agnostic nature: it acknowledges the presence of the training data and the backbone only through a simple form of regularization (e.g., a Frobenius-norm penalty or a sphere constraint), which is far from equivalent to end-to-end training.
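
The three NC properties above translate directly into measurable statistics. Below is a minimal sketch, assuming penultimate-layer features H with integer labels y and a last-layer weight matrix W; all variable names and the specific metric choices are ours, not the paper's.

```python
import numpy as np

def nc_metrics(H, y, W):
    """H: (n, d) penultimate features; y: (n,) labels in {0..K-1}; W: (K, d) last-layer weights."""
    K = W.shape[0]
    mu = np.stack([H[y == k].mean(axis=0) for k in range(K)])  # per-class feature means
    mu_g = mu.mean(axis=0)                                     # global mean

    # NC1: within-class variability relative to between-class spread;
    # this ratio tends to 0 as features collapse to their class means.
    Sw = sum(np.cov(H[y == k].T, bias=True) for k in range(K)) / K
    Sb = np.cov(mu.T, bias=True)
    nc1 = np.trace(Sw) / np.trace(Sb)

    # NC2 (simplex-ETF variant): centered class means have pairwise
    # cosines of -1/(K-1); report the worst deviation.
    M = mu - mu_g
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    cos = Mn @ Mn.T
    nc2 = np.abs(cos[~np.eye(K, dtype=bool)] + 1.0 / (K - 1)).max()

    # NC3: self-duality -- the globally normalized classifier matrix
    # aligns with the globally normalized centered class means.
    nc3 = np.linalg.norm(W / np.linalg.norm(W) - M / np.linalg.norm(M))
    return nc1, nc2, nc3
```

On a fully collapsed network all three quantities approach zero; at random initialization they stay bounded away from it, which is what makes them usable as training-time diagnostics.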


Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal?

arXiv.org Machine Learning

Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC), and a growing body of work has investigated the propagation of neural collapse to earlier layers of DNNs, a phenomenon called deep neural collapse (DNC). However, existing theoretical results are restricted to special cases: linear models, networks with only two layers, or binary classification. In contrast, we focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift. As soon as we go beyond two layers or two classes, DNC stops being optimal for the deep unconstrained features model (DUFM), the standard theoretical framework for the analysis of collapse. The main culprit is a low-rank bias of multi-layer regularization schemes: this bias leads to optimal solutions of even lower rank than those predicted by neural collapse. We support our theoretical findings with experiments on both the DUFM and real data, which show the emergence of the low-rank structure in the solutions found by gradient descent.
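
A rough way to probe this claim empirically is sketched below: train a toy deep unconstrained features model (free features followed by several non-linear layers) with weight decay, then compare the numerical rank of the deepest features to the number of classes K. All dimensions and hyperparameters here are placeholders of ours, not the paper's setup.

```python
import torch

torch.manual_seed(0)
K, n_per_class, d, L = 4, 32, 32, 3
n = K * n_per_class
y = torch.repeat_interleave(torch.arange(K), n_per_class)
Y = torch.nn.functional.one_hot(y, num_classes=K).float()

H = torch.nn.Parameter(0.1 * torch.randn(n, d))  # free features (the "data")
layers = torch.nn.ModuleList(torch.nn.Linear(d, d, bias=False) for _ in range(L))
head = torch.nn.Linear(d, K, bias=False)

# weight_decay regularizes H and every layer alike, mirroring the
# multi-layer regularization scheme whose low-rank bias is discussed above
params = [H, *layers.parameters(), *head.parameters()]
opt = torch.optim.SGD(params, lr=0.05, weight_decay=5e-3)

def features(H):
    Z = H
    for layer in layers:
        Z = torch.relu(layer(Z))
    return Z

for _ in range(3000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(head(features(H)), Y)
    loss.backward()
    opt.step()

with torch.no_grad():
    s = torch.linalg.svdvals(features(H))
    rank = int((s > 1e-3 * s[0]).sum())
    print(f"numerical rank of deepest features: {rank} (K = {K})")
```

Whether the measured rank lands at the DNC value or strictly below it depends on the depth and the regularization strength, which is exactly the regime distinction the abstract describes.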


Average gradient outer product as a mechanism for deep neural collapse

arXiv.org Machine Learning

Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a wide variety of settings, its emergence is only partially understood. In this work, we provide substantial evidence that DNC formation occurs primarily through deep feature learning with the average gradient outer product (AGOP). This goes a step further than efforts that explain neural collapse via feature-agnostic approaches, such as the unconstrained features model. We proceed by providing evidence that the right singular vectors and singular values of the weight matrices are responsible for the majority of within-class variability collapse in DNNs. As shown in recent work, this singular structure is highly correlated with that of the AGOP. We then establish experimentally and theoretically that the AGOP induces neural collapse in a randomly initialized neural network. In particular, we demonstrate that Deep Recursive Feature Machines, a method originally introduced as an abstraction for AGOP feature learning in convolutional neural networks, exhibits DNC.
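
For concreteness, the AGOP of a network f is the input-space matrix (1/n) Σ_x J(x)ᵀ J(x), where J(x) is the Jacobian of the outputs with respect to the input. The sketch below computes it for a stand-in two-layer network; the architecture and sizes are our placeholders, not the paper's setup.

```python
import torch

torch.manual_seed(0)
d, K, n = 10, 3, 64
f = torch.nn.Sequential(
    torch.nn.Linear(d, 32), torch.nn.ReLU(), torch.nn.Linear(32, K)
)
X = torch.randn(n, d)

def agop(f, X):
    """Average over the data of J(x)^T J(x), J the output-input Jacobian."""
    G = torch.zeros(X.shape[1], X.shape[1])
    for x in X:
        J = torch.autograd.functional.jacobian(f, x)  # (K, d) for one sample
        G += J.T @ J
    return G / X.shape[0]

M = agop(f, X)
print(torch.linalg.eigvalsh(M))
```

The correlation discussed above can then be probed by comparing the top eigenvectors of M with the right singular vectors of the first-layer weights, e.g. via torch.linalg.svd(f[0].weight).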


Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model

arXiv.org Artificial Intelligence

Neural collapse (NC) refers to the surprising structure of the last layer of deep neural networks in the terminal phase of gradient descent training. Recently, an increasing amount of experimental evidence has pointed to the propagation of NC to earlier layers of neural networks. However, while the NC in the last layer is well studied theoretically, much less is known about its multi-layered counterpart, deep neural collapse (DNC). In particular, existing work focuses either on linear layers or only on the last two layers, at the price of an extra assumption. Our paper fills this gap by generalizing the established analytical framework for NC, the unconstrained features model, to multiple non-linear layers. Our key technical contribution is to show that, in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of DNC. This explains the existing experimental evidence of DNC. We also empirically show that (i) optimizing deep unconstrained features models via gradient descent yields solutions that agree well with our theory, and (ii) trained networks recover the unconstrained features suitable for the occurrence of DNC, thus supporting the validity of this modeling principle.
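
As a point of reference, the deep unconstrained features model replaces the backbone by freely optimized features and regularizes every block. A schematic form of the objective, in our own notation and for the L2-regularized regression-style loss typical of this line of work (the paper's exact regularization may differ in details), is:

```latex
\min_{H_1,\, W_1, \dots, W_L}\;
\frac{1}{2N}\,\bigl\| W_L\,\sigma\!\bigl(W_{L-1}\cdots\,\sigma(W_1 H_1)\bigr) - Y \bigr\|_F^2
\;+\; \frac{\lambda_H}{2}\,\| H_1 \|_F^2
\;+\; \sum_{l=1}^{L} \frac{\lambda_{W_l}}{2}\,\| W_l \|_F^2
```

Here H_1 collects the free features (one column per training point), Y is the label matrix, σ is a non-linearity such as ReLU, and the λ's are regularization strengths. Setting L = 1 recovers the standard UFM; the result described above concerns the global optima of this multi-layer version for binary classification.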