Neural Collapse in Deep Linear Networks: From Balanced to Imbalanced Data
Hien Dang, Tho Tran, Stanley Osher, Hung Tran-The, Nhat Ho, Tan Nguyen
arXiv.org Artificial Intelligence
Modern deep neural networks have achieved impressive performance on tasks from image classification to natural language processing. Surprisingly, these complex systems with massive amounts of parameters exhibit the same structural properties in their last-layer features and classifiers across canonical datasets when training until convergence. In particular, it has been observed that the last-layer features collapse to their class-means, and those class-means are the vertices of a simplex Equiangular Tight Frame (ETF). This phenomenon is known as Neural Collapse (NC). Recent papers have theoretically shown that NC emerges in the global minimizers of training problems with the simplified "unconstrained feature model".

Despite the impressive performance of deep neural networks (DNNs) across areas of machine learning and artificial intelligence (Krizhevsky et al., 2012; Simonyan & Zisserman, 2015; Goodfellow et al., 2016; He et al., 2016b; Huang et al., 2017; Brown et al., 2020), the highly non-convex nature of these systems, as well as their massive number of parameters, ranging from hundreds of millions to hundreds of billions, imposes a significant barrier to a concrete theoretical understanding of how they work. Additionally, a variety of optimization algorithms have been developed for training DNNs, which makes it more challenging to analyze the resulting trained networks and learned features (Ruder, 2016). In particular, the modern practice of training DNNs includes training the models far beyond zero error to achieve zero loss in the terminal phase of training (TPT) (Ma et al., 2018; Belkin et al., 2019a;b).
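For readers unfamiliar with the simplex Equiangular Tight Frame mentioned above, the short sketch below illustrates the geometry that the class-means converge to under NC: K equal-norm vectors whose pairwise inner products all equal -1/(K-1). The construction through a partial orthogonal matrix follows the standard definition used in the Neural Collapse literature; the function name simplex_etf and the specific dimensions are illustrative choices and not part of the paper.

import numpy as np

# Illustrative sketch (not from the paper): build a K-class simplex ETF in R^d
# and check its defining properties.

def simplex_etf(K, d, seed=0):
    # Standard construction: M = sqrt(K/(K-1)) * P * (I_K - (1/K) 11^T),
    # where P is a d x K matrix with orthonormal columns (needs d >= K).
    assert d >= K, "this construction needs d >= K"
    rng = np.random.default_rng(seed)
    P, _ = np.linalg.qr(rng.standard_normal((d, K)))  # P^T P = I_K
    return np.sqrt(K / (K - 1)) * P @ (np.eye(K) - np.ones((K, K)) / K)

M = simplex_etf(K=4, d=16)                    # columns play the role of class-means
G = M.T @ M                                   # Gram matrix of the class-means
print(np.round(np.diag(G), 3))                # equal norms: all ones
print(np.round(G - np.diag(np.diag(G)), 3))   # pairwise inner products: -1/(K-1)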
Jun-18-2023