Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data
Hien Dang, Tho Tran, Stanley Osher, Hung Tran-The, Nhat Ho, Tan Nguyen
arXiv.org Artificial Intelligence
Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified "unconstrained feature model".

Despite the impressive performance of deep neural networks (DNNs) across areas of machine learning and artificial intelligence (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Goodfellow et al., 2016; He et al., 2016b; Huang et al., 2017; Brown et al., 2020), the highly non-convex nature of these systems, as well as their massive number of parameters, ranging from hundreds of millions to hundreds of billions, imposes a significant barrier to a concrete theoretical understanding of how they work. Additionally, a variety of optimization algorithms have been developed for training DNNs, which makes it more challenging to analyze the resulting trained networks and learned features (Ruder, 2016). In particular, the modern practice of training DNNs includes training the models far beyond zero error to achieve zero loss in the terminal phase of training (TPT) (Ma et al., 2018; Belkin et al., 2019a;b).
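For readers unfamiliar with the simplex Equiangular Tight Frame mentioned above, the short sketch below illustrates the geometry that the class-means converge to under NC: K equal-norm vectors whose pairwise inner products all equal -1/(K-1). The construction through a partial orthogonal matrix follows the standard definition used in the Neural Collapse literature; the function name simplex_etf and the specific dimensions are illustrative choices and not part of the paper.

import numpy as np

# Illustrative sketch (not from the paper): build a K-class simplex ETF in R^d
# and check its defining properties.

def simplex_etf(K, d, seed=0):
    # Standard construction: M = sqrt(K/(K-1)) * P * (I_K - (1/K) 11^T),
    # where P is a d x K matrix with orthonormal columns (needs d >= K).
    assert d >= K, "this construction needs d >= K"
    rng = np.random.default_rng(seed)
    P, _ = np.linalg.qr(rng.standard_normal((d, K)))  # P^T P = I_K
    return np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)

M = simplex_etf(K=4, d=16)                    # columns play the role of class-means
G = M.T @ M                                   # Gram matrix of the class-means
print(np.round(np.diag(G), 3))                # equal norms: all ones
print(np.round(G - np.diag(np.diag(G)), 3))   # pairwise inner products: -1/(K-1)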
Jun-18-2023