Goto


Layer-Peeled Model: Toward Understanding Well-Trained Deep Neural Networks

arXiv.org Machine Learning

Interestingly, these impressive accomplishments were mostly achieved by heuristics and tricks, though often plausible, without much principled guidance from a theoretical perspective. On the flip side, however, this reality suggests the great potential a theory could have for advancing the development of deep learning methodologies in the coming decade. Unfortunately, it is not easy to develop a theoretical foundation for deep learning. Perhaps the most difficult hurdle lies in the nonconvexity of the optimization problem for training neural networks, which, loosely speaking, stems from the interaction between different layers of neural networks. To be more precise, consider a neural network for $K$-class classification (in logits), which in its simplest form reads

$$f(\mathbf{x}; \mathbf{W}_{\mathrm{full}}) = \mathbf{b}_L + \mathbf{W}_L \sigma\big(\mathbf{b}_{L-1} + \mathbf{W}_{L-1} \sigma(\cdots \sigma(\mathbf{b}_1 + \mathbf{W}_1 \mathbf{x}))\big).$$

Here, $\mathbf{W}_{\mathrm{full}} := \{\mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_L\}$ denotes the weights of the $L$ layers, $\{\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_L\}$ denotes the biases, and $\sigma(\cdot)$ is a nonlinear activation function such as the ReLU. (The softmax step is implicitly included in the loss function, and other operations such as max-pooling are omitted for simplicity; the last-layer weights $\mathbf{W}_L$ consist of $K$ vectors that correspond to the $K$ classes.) Owing to the complex and nonlinear interaction between the $L$ layers, when applying stochastic gradient descent to the optimization problem

$$\min_{\mathbf{W}_{\mathrm{full}}} \; \frac{1}{N} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \mathcal{L}\big(f(\mathbf{x}_{k,i}; \mathbf{W}_{\mathrm{full}}), \mathbf{y}_k\big) + \frac{\lambda}{2} \,\|\mathbf{W}_{\mathrm{full}}\|^2 \tag{1}$$

[Figure 1: Illustration of Layer-Peeled Models. (a) 1-Layer-Peeled Model; (b) 2-Layer-Peeled Model. The right panel represents the 2-Layer-Peeled Model, which is discussed in Section 6. For each panel, the details of the white (top) box are preserved, whereas the gray (bottom) box is modeled by a simple decision variable for every training example.]
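To make the objective concrete, here is a minimal sketch of problem (1) in PyTorch, assuming a small fully connected ReLU network; the dimensions, the weight-decay value, and all names are illustrative choices for this example, not taken from the paper's code.

```python
import torch
import torch.nn as nn

K, d, width, depth = 10, 784, 256, 3  # classes, input dim, hidden width, number of layers L

# f(x; W_full) = b_L + W_L sigma(b_{L-1} + W_{L-1} sigma(... sigma(b_1 + W_1 x)))
dims = [d] + [width] * (depth - 1) + [K]
layers = []
for i in range(depth):
    layers.append(nn.Linear(dims[i], dims[i + 1]))  # x -> W x + b for this layer
    if i < depth - 1:
        layers.append(nn.ReLU())                    # sigma(.)
f = nn.Sequential(*layers)

def objective(x, y, lam=5e-4):
    # (1/N) sum_k sum_i L(f(x_{k,i}; W_full), y_k) + (lam/2) ||W_full||^2
    ce = nn.functional.cross_entropy(f(x), y)       # softmax folded into the loss, as in footnote 1
    l2 = sum(p.pow(2).sum() for p in f.parameters())
    return ce + 0.5 * lam * l2

# One SGD step on a random batch, as in minimizing (1):
x, y = torch.randn(64, d), torch.randint(0, K, (64,))
opt = torch.optim.SGD(f.parameters(), lr=0.1)
opt.zero_grad()
objective(x, y).backward()
opt.step()
```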


Neural Collapse Under MSE Loss: Proximity to and Dynamics on the Central Path

arXiv.org Machine Learning

Recent work [Papyan, Han, and Donoho, 2020] discovered a phenomenon called Neural Collapse (NC) that occurs pervasively in today's deep net training paradigm of driving cross-entropy loss towards zero. In this phenomenon, the last-layer features collapse to their class-means, both the classifiers and class-means collapse to the same Simplex Equiangular Tight Frame (ETF), and the behavior of the last-layer classifier converges to that of the nearest-class-mean decision rule. Since then, follow-up works such as Mixon et al. [2020] and Poggio and Liao [2020a,b] have formally analyzed this inductive bias by replacing the hard-to-study cross-entropy loss with the more tractable mean squared error (MSE) loss. However, these works stopped short of demonstrating the empirical reality of MSE-NC on benchmark datasets and canonical networks, as had been done in Papyan, Han, and Donoho [2020] for the cross-entropy loss. In this work, we establish the empirical reality of MSE-NC by reporting experimental observations for three prototypical networks and five canonical datasets, with code for reproducing NC. Following this, we develop three main contributions inspired by MSE-NC. First, we show a new theoretical decomposition of the MSE loss into (A) a term that assumes the last-layer classifier is exactly the least-squares classifier of Webb and Lowe [1990] and (B) a term capturing the deviation from this least-squares classifier. Second, we exhibit experiments on canonical datasets and networks demonstrating that term (B) is negligible during training. This motivates a new theoretical construct: the central path, where the linear classifier stays MSE-optimal for the given feature activations throughout the dynamics. Finally, through our study of continually renormalized gradient flow along the central path, we produce closed-form dynamics that predict full Neural Collapse in an unconstrained features model.
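As a rough illustration of the decomposition, the sketch below computes the least-squares classifier for a given matrix of last-layer features and measures the deviation of an arbitrary classifier from it; the shapes and names are assumptions made for this example, not the authors' code.

```python
import numpy as np

N, d, K = 512, 64, 10
rng = np.random.default_rng(0)
H = rng.normal(size=(N, d))                    # last-layer feature activations
Y = np.eye(K)[rng.integers(0, K, N)]           # one-hot targets

# Append a bias column so the classifier (W, b) is a single least-squares solve.
H1 = np.hstack([H, np.ones((N, 1))])
W_ls, *_ = np.linalg.lstsq(H1, Y, rcond=None)  # MSE-optimal classifier for these features

def mse(W):
    # (1/2N) ||H1 W - Y||^2, the MSE loss for a fixed set of features
    return 0.5 * np.mean(np.sum((H1 @ W - Y) ** 2, axis=1))

W = 0.1 * rng.normal(size=(d + 1, K))          # some current classifier
term_A = mse(W_ls)                             # (A): loss with the least-squares classifier
term_B = mse(W) - term_A                       # (B): deviation; zero exactly on the central path
print(term_A, term_B)                          # term_B >= 0 since W_ls minimizes the MSE
```

On the central path the classifier is, by construction, the least-squares classifier for the current features, so term (B) vanishes and only term (A) drives the dynamics.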


An Unconstrained Layer-Peeled Perspective on Neural Collapse

arXiv.org Machine Learning

Deep learning has achieved state-of-the-art performance in various applications [22], such as computer vision [18], natural language processing [4], and scientific discovery [26, 48]. Despite this empirical success, how gradient descent and its variants bias deep neural networks toward solutions that generalize well on the test set remains a major open question. To develop a theoretical foundation for deep learning, many studies have investigated the implicit bias of gradient descent in different settings [24, 1, 42, 38, 28, 3]. It is well acknowledged that well-trained end-to-end deep architectures can effectively extract features relevant to a given label. Although theoretical analysis of deep learning has seen notable success in recent years [2, 11], most studies that analyze the properties of the final output function fail to account for the features learned by neural networks. Recently, the authors of [33] observed that during the terminal phase of training, that is, the stage after achieving zero training error, the features within each class collapse to their class mean, and the class means converge to an equiangular tight frame (ETF). This phenomenon, termed neural collapse [33], provides a clear view of how the last-layer features evolve after interpolation and helps explain why training beyond zero training error yields better generalization and robustness. To theoretically analyze the neural collapse phenomenon, [7] proposed the layer-peeled model (LPM) as a simple surrogate for neural networks, in which the last-layer features are modeled as free optimization variables.
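As a hedged sketch of how these observations are typically quantified, the snippet below computes two simple neural-collapse diagnostics from last-layer features: a ratio of within-class variability to between-class spread (which tends to zero under collapse) and the pairwise cosines of the centered class means (which a simplex ETF pins at -1/(K-1)). The function, the synthetic data, and all names are illustrative, not from [33] or [7].

```python
import numpy as np

def nc_diagnostics(H, labels, K):
    """H: (N, d) last-layer features; labels: (N,) integer class labels."""
    mu_G = H.mean(axis=0)                                # global feature mean
    mus = np.stack([H[labels == k].mean(axis=0) for k in range(K)])
    M = mus - mu_G                                       # centered class means

    # Within-class scatter relative to between-class spread: -> 0 under collapse.
    within = np.mean([np.sum((H[labels == k] - mus[k]) ** 2) for k in range(K)])
    between = np.sum(M ** 2) / K
    nc1 = within / between

    # Pairwise cosines of centered class means: -> -1/(K-1) under a simplex ETF.
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    cos = Mn @ Mn.T
    off = cos[~np.eye(K, dtype=bool)]
    return nc1, off.mean(), -1.0 / (K - 1)

# Synthetic check: features collapsed onto a simplex ETF plus tiny within-class noise.
K, d, n = 4, 16, 100
etf = np.sqrt(K / (K - 1)) * (np.eye(K) - 1.0 / K)       # rows form a K-point simplex ETF
means = np.zeros((K, d)); means[:, :K] = etf             # embed the means in d dimensions
labels = np.repeat(np.arange(K), n)
H = means[labels] + 1e-3 * np.random.default_rng(0).normal(size=(K * n, d))
print(nc_diagnostics(H, labels, K))                      # nc1 ~ 0, mean cosine ~ -1/(K-1)
```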


A Geometric Analysis of Neural Collapse with Unconstrained Features

arXiv.org Artificial Intelligence

We provide the first global optimization landscape analysis of Neural Collapse, an intriguing empirical phenomenon that arises in the last-layer classifiers and features of neural networks during the terminal phase of training. As recently reported in [1], this phenomenon implies that (i) the class means and the last-layer classifiers all collapse to the vertices of a Simplex Equiangular Tight Frame (ETF) up to scaling, and (ii) cross-example within-class variability of last-layer activations collapses to zero. We study the problem based on a simplified unconstrained feature model, which isolates the topmost layers from the classifier of the neural network. In this context, we show that the classical cross-entropy loss with weight decay has a benign global landscape, in the sense that the only global minimizers are the Simplex ETFs, while all other critical points are strict saddles whose Hessians exhibit negative curvature directions. In contrast to existing landscape analyses for deep neural networks, which are often disconnected from practice, our analysis of the simplified model not only explains what kind of features are learned in the last layer but also shows why they can be efficiently optimized in this simplified setting, matching the empirical observations in practical deep network architectures. These findings could have profound implications of broad interest for optimization, generalization, and robustness. For example, our experiments demonstrate that one may set the feature dimension equal to the number of classes and fix the last-layer classifier to be a Simplex ETF for network training, which reduces memory cost by over 20% on ResNet18 without sacrificing generalization performance.
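The experiment mentioned at the end can be sketched as follows: construct a K x K simplex ETF and install it as a fixed, non-trainable last-layer classifier on top of a backbone whose feature dimension equals the number of classes. The small backbone here is a stand-in for illustration only; just the ETF construction follows the stated definition.

```python
import torch
import torch.nn as nn

K = 10
# Simplex ETF: sqrt(K/(K-1)) * (I - (1/K) 11^T); rows are unit-norm class vectors
# with pairwise inner products of exactly -1/(K-1).
etf = (K / (K - 1)) ** 0.5 * (torch.eye(K) - 1.0 / K)

backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU(),
                         nn.Linear(256, K))              # feature dimension set to K
classifier = nn.Linear(K, K, bias=False)
classifier.weight.data.copy_(etf)
classifier.weight.requires_grad_(False)                  # fixed: no gradients or optimizer state

model = nn.Sequential(backbone, classifier)
logits = model(torch.randn(32, 784))                     # only the backbone is trained
```

Freezing the classifier is where the memory saving comes from: the last layer needs no gradient buffers or optimizer state, and the feature dimension shrinks to K.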


Galaxy Fold screens breaking after a day: Who is to blame?

ZDNet

It didn't take long for reviewers who got their hands on early samples of Samsung's $2,000 Galaxy Fold smartphone to start reporting that their devices were experiencing catastrophic display damage along the hinge. And since the hinge that allows the display to fold is the flagship feature of the Galaxy Fold, this is a pretty big deal. Headlines such as "My Samsung Galaxy Fold screen broke after just a day" are not what Samsung needs right now. After all, Samsung is still trying to shake off the ghosts of past issues such as the Note 7 battery fires and people jamming the S-Pen into their device the wrong way. But Samsung has confirmed that the Galaxy Fold launch, scheduled for April 26, is still on.