
Collaborating Authors: Súkeník, Peter


Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse

arXiv.org Machine Learning

Among the many possible interpolators that a deep neural network (DNN) can find, Papyan et al. (2020) showed a strong bias of gradient-based training towards representations with a highly symmetric structure in the penultimate layer, which was dubbed neural collapse (NC). In particular, the feature vectors of the training data in the penultimate layer collapse to a single vector per class (NC1); these vectors form orthogonal or simplex equiangular tight frames (NC2); and they are aligned with the rows of the last layer's weight matrix (NC3). The question of why and how neural collapse emerges has been the subject of a popular line of research, see e.g. Lu & Steinerberger (2022); E & Wojtowytsch (2022) and the discussion in Section 2. Many of these works focus on a simplified mathematical framework: the unconstrained features model (UFM) (Mixon et al., 2020; Han et al., 2022; Zhou et al., 2022a), corresponding to the joint optimization over the last layer's weights and the penultimate layer's feature representations, which are treated as free variables. To account for the existence of the training data and of all the layers before the penultimate one (i.e., the backbone of the network), some form of regularization on the free features is usually added. A number of papers have proved the optimality of NC in this model (Lu & Steinerberger, 2022; E & Wojtowytsch, 2022), its emergence with gradient-based methods (Mixon et al., 2020; Han et al., 2022), and a benign loss landscape (Zhou et al., 2022a; Zhu et al., 2021). However, the major drawback of the UFM lies in its data-agnostic nature: it acknowledges the presence of the training data and the backbone only through a simple form of regularization (e.g., a Frobenius-norm penalty or a sphere constraint), which is far from equivalent to end-to-end training.
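
The three NC properties above translate directly into measurable statistics. Below is a minimal sketch, assuming penultimate-layer features H with integer labels y and a last-layer weight matrix W; all variable names and the specific metric choices are ours, not the paper's.

```python
import numpy as np

def nc_metrics(H, y, W):
    """H: (n, d) penultimate features; y: (n,) labels in {0..K-1}; W: (K, d) last-layer weights."""
    K = W.shape[0]
    mu = np.stack([H[y == k].mean(axis=0) for k in range(K)])  # per-class feature means
    mu_g = mu.mean(axis=0)                                     # global mean

    # NC1: within-class variability relative to between-class spread;
    # this ratio tends to 0 as features collapse to their class means.
    Sw = sum(np.cov(H[y == k].T, bias=True) for k in range(K)) / K
    Sb = np.cov(mu.T, bias=True)
    nc1 = np.trace(Sw) / np.trace(Sb)

    # NC2 (simplex-ETF variant): centered class means have pairwise
    # cosines of -1/(K-1); report the worst deviation.
    M = mu - mu_g
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    cos = Mn @ Mn.T
    nc2 = np.abs(cos[~np.eye(K, dtype=bool)] + 1.0 / (K - 1)).max()

    # NC3: self-duality -- the globally normalized classifier matrix
    # aligns with the globally normalized centered class means.
    nc3 = np.linalg.norm(W / np.linalg.norm(W) - M / np.linalg.norm(M))
    return nc1, nc2, nc3
```

On a fully collapsed network all three quantities approach zero; at random initialization they stay bounded away from it, which is what makes them usable as training-time diagnostics.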


Neural Collapse versus Low-rank Bias: Is Deep Neural Collapse Really Optimal?

arXiv.org Machine Learning

Deep neural networks (DNNs) exhibit a surprising structure in their final layer known as neural collapse (NC), and a growing body of work has investigated the propagation of neural collapse to earlier layers of DNNs, a phenomenon called deep neural collapse (DNC). However, existing theoretical results are restricted to special cases: linear models, networks with only two layers, or binary classification. In contrast, we focus on non-linear models of arbitrary depth in multi-class classification and reveal a surprising qualitative shift. As soon as we go beyond two layers or two classes, DNC stops being optimal for the deep unconstrained features model (DUFM), the standard theoretical framework for the analysis of collapse. The main culprit is a low-rank bias of multi-layer regularization schemes: this bias leads to optimal solutions of even lower rank than those predicted by neural collapse. We support our theoretical findings with experiments on both the DUFM and real data, which show the emergence of the low-rank structure in the solutions found by gradient descent.
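
A rough way to probe this claim empirically is sketched below: train a toy deep unconstrained features model (free features followed by several non-linear layers) with weight decay, then compare the numerical rank of the deepest features to the number of classes K. All dimensions and hyperparameters here are placeholders of ours, not the paper's setup.

```python
import torch

torch.manual_seed(0)
K, n_per_class, d, L = 4, 32, 32, 3
n = K * n_per_class
y = torch.repeat_interleave(torch.arange(K), n_per_class)
Y = torch.nn.functional.one_hot(y, num_classes=K).float()

H = torch.nn.Parameter(0.1 * torch.randn(n, d))  # free features (the "data")
layers = torch.nn.ModuleList(torch.nn.Linear(d, d, bias=False) for _ in range(L))
head = torch.nn.Linear(d, K, bias=False)

# weight_decay regularizes H and every layer alike, mirroring the
# multi-layer regularization scheme whose low-rank bias is discussed above
params = [H, *layers.parameters(), *head.parameters()]
opt = torch.optim.SGD(params, lr=0.05, weight_decay=5e-3)

def features(H):
    Z = H
    for layer in layers:
        Z = torch.relu(layer(Z))
    return Z

for _ in range(3000):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(head(features(H)), Y)
    loss.backward()
    opt.step()

with torch.no_grad():
    s = torch.linalg.svdvals(features(H))
    rank = int((s > 1e-3 * s[0]).sum())
    print(f"numerical rank of deepest features: {rank} (K = {K})")
```

Whether the measured rank lands at the DNC value or strictly below it depends on the depth and the regularization strength, which is exactly the regime distinction the abstract describes.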


Average gradient outer product as a mechanism for deep neural collapse

arXiv.org Machine Learning

Deep Neural Collapse (DNC) refers to the surprisingly rigid structure of the data representations in the final layers of Deep Neural Networks (DNNs). Though the phenomenon has been measured in a wide variety of settings, its emergence is only partially understood. In this work, we provide substantial evidence that DNC formation occurs primarily through deep feature learning with the average gradient outer product (AGOP). This goes a step further than efforts that explain neural collapse via feature-agnostic approaches, such as the unconstrained features model. We proceed by providing evidence that the right singular vectors and singular values of the weight matrices are responsible for the majority of within-class variability collapse in DNNs. As shown in recent work, this singular structure is highly correlated with that of the AGOP. We then establish experimentally and theoretically that the AGOP induces neural collapse in a randomly initialized neural network. In particular, we demonstrate that Deep Recursive Feature Machines, a method originally introduced as an abstraction for AGOP feature learning in convolutional neural networks, exhibits DNC.
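
For concreteness, the AGOP of a network f is the input-space matrix (1/n) Σ_x J(x)ᵀ J(x), where J(x) is the Jacobian of the outputs with respect to the input. The sketch below computes it for a stand-in two-layer network; the architecture and sizes are our placeholders, not the paper's setup.

```python
import torch

torch.manual_seed(0)
d, K, n = 10, 3, 64
f = torch.nn.Sequential(
    torch.nn.Linear(d, 32), torch.nn.ReLU(), torch.nn.Linear(32, K)
)
X = torch.randn(n, d)

def agop(f, X):
    """Average over the data of J(x)^T J(x), J the output-input Jacobian."""
    G = torch.zeros(X.shape[1], X.shape[1])
    for x in X:
        J = torch.autograd.functional.jacobian(f, x)  # (K, d) for one sample
        G += J.T @ J
    return G / X.shape[0]

M = agop(f, X)
print(torch.linalg.eigvalsh(M))
```

The correlation discussed above can then be probed by comparing the top eigenvectors of M with the right singular vectors of the first-layer weights, e.g. via torch.linalg.svd(f[0].weight).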


Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model

arXiv.org Artificial Intelligence

Neural collapse (NC) refers to the surprising structure of the last layer of deep neural networks in the terminal phase of gradient descent training. Recently, an increasing amount of experimental evidence has pointed to the propagation of NC to earlier layers of neural networks. However, while the NC in the last layer is well studied theoretically, much less is known about its multi-layered counterpart, deep neural collapse (DNC). In particular, existing work focuses either on linear layers or only on the last two layers, at the price of an extra assumption. Our paper fills this gap by generalizing the established analytical framework for NC, the unconstrained features model, to multiple non-linear layers. Our key technical contribution is to show that, in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of DNC. This explains the existing experimental evidence of DNC. We also empirically show that (i) optimizing deep unconstrained features models via gradient descent yields solutions that agree well with our theory, and (ii) trained networks recover the unconstrained features suitable for the occurrence of DNC, thus supporting the validity of this modeling principle.
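
As a point of reference, the deep unconstrained features model replaces the backbone by freely optimized features and regularizes every block. A schematic form of the objective, in our own notation and for the L2-regularized regression-style loss typical of this line of work (the paper's exact regularization may differ in details), is:

```latex
\min_{H_1,\, W_1, \dots, W_L}\;
\frac{1}{2N}\,\bigl\| W_L\,\sigma\!\bigl(W_{L-1}\cdots\,\sigma(W_1 H_1)\bigr) - Y \bigr\|_F^2
\;+\; \frac{\lambda_H}{2}\,\| H_1 \|_F^2
\;+\; \sum_{l=1}^{L} \frac{\lambda_{W_l}}{2}\,\| W_l \|_F^2
```

Here H_1 collects the free features (one column per training point), Y is the label matrix, σ is a non-linearity such as ReLU, and the λ's are regularization strengths. Setting L = 1 recovers the standard UFM; the result described above concerns the global optima of this multi-layer version for binary classification.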