directional convergence
Faster Directional Convergence of Linear Neural Networks under Spherically Symmetric Data
In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss. In particular, we show global directional convergence guarantees from a polynomial rate to a linear rate for (deep) linear networks with spherically symmetric data distribution, which can be viewed as a specific zero-margin dataset. Our results do not require the assumptions in other works such as small initial loss, presumed convergence of weight direction, or overparameterization. We also characterize our findings in experiments.
Directional convergence and alignment in deep learning
The above theories, with finite width networks, usually require the weights to stay close to initialization in certain norms. By contrast, practitioners run their optimization methods as long as their computational budget allows [Shallue et al., 2018], and if the data can be perfectly classified, the
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
In this paper, we prove directional convergence of network parameters of fixed width leaky ReLU two-layer neural networks optimized by gradient descent with exponential loss, which was previously only known for gradient flow. By a careful analysis of the convergent direction, we establish sufficient conditions of benign overfitting and discover a new phase transition in the test error bound. All of these results hold beyond the nearly orthogonal data setting which was studied in prior works. As an application, we demonstrate that benign overfitting occurs with high probability in sub-Gaussian mixture models.
Review for NeurIPS paper: Directional convergence and alignment in deep learning
Weaknesses: I have two main critiques on this work. The first relates to the significance of its results. In the setting studied, directional convergence, alignment and margin maximization have all been treated in several recent works (which the paper refers to). I know that at least in some of these works directional convergence and/or alignments were assumed (not proven), but nonetheless, my feeling is that the paper does not draw a sufficiently clear line separating itself from existing literature. For example, a very relevant existing work --- Lyu and Li 2019 --- is said to have left open the issues of directional convergence and alignment, but to my knowledge, that work does establish directional convergence, at least in some settings.