Directional convergence and alignment in deep learning
The above theories, for finite-width networks, usually require the weights to stay close to initialization in certain norms. By contrast, practitioners run their optimization methods for as long as their computational budget allows [Shallue et al., 2018], and if the data can be perfectly classified, the loss can be made arbitrarily small only by letting the weight norms grow without bound, which takes the weights far from initialization.
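This norm-growth phenomenon is easy to see numerically. Below is a minimal sketch (our illustration, not code from the paper) of gradient descent on the exponential loss over a linearly separable toy dataset: the loss approaches zero only as the weight norm diverges, while the direction w/||w|| stabilizes. All names and constants are made up for the demo.

```python
import numpy as np

# Toy linearly separable data: two points with labels +1 / -1.
X = np.array([[2.0, 1.0], [-1.0, -2.0]])
y = np.array([1.0, -1.0])

def exp_loss_grad(w):
    # Exponential loss L(w) = sum_i exp(-y_i <w, x_i>) and its gradient.
    margins = y * (X @ w)
    coef = np.exp(-margins)
    loss = coef.sum()
    grad = -(coef * y) @ X
    return loss, grad

w = np.array([0.1, 0.0])   # small initialization
lr = 0.1
for t in range(1, 20001):
    loss, grad = exp_loss_grad(w)
    w -= lr * grad
    if t % 5000 == 0:
        print(f"t={t:6d}  loss={loss:.2e}  ||w||={np.linalg.norm(w):7.3f}  "
              f"direction={w / np.linalg.norm(w)}")
# The loss can only approach 0 as ||w|| -> infinity; the printed direction
# converges (for linear predictors, toward the max-margin direction).
```

Running this prints a shrinking loss alongside a steadily growing norm and a stabilizing direction, matching the informal argument above.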
Directional Convergence, Benign Overfitting of Gradient Descent in leaky ReLU two-layer Neural Networks
In this paper, we prove directional convergence of the network parameters of fixed-width two-layer leaky ReLU neural networks optimized by gradient descent with the exponential loss, which was previously known only for gradient flow. Through a careful analysis of the convergent direction, we establish sufficient conditions for benign overfitting and discover a new phase transition in the test error bound. All of these results hold beyond the nearly orthogonal data setting studied in prior works. As an application, we demonstrate that benign overfitting occurs with high probability in sub-Gaussian mixture models.
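To make the setting concrete, here is a hedged numerical sketch of the kind of dynamics the abstract describes, assuming a fixed-width two-layer leaky ReLU network with a fixed random second layer (one common convention; the paper's exact setup may differ) trained by full-batch gradient descent on the exponential loss. The data generator is a stand-in for a sub-Gaussian mixture, and the cosine similarity between successive normalized parameter vectors serves as a crude proxy for directional convergence.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic separable data: a stand-in for the sub-Gaussian mixture setting.
n, d, m, alpha = 20, 10, 16, 0.1          # samples, dim, width, leaky slope
mu = rng.normal(size=d)
mu /= np.linalg.norm(mu)
y = rng.choice([-1.0, 1.0], size=n)
X = 3.0 * y[:, None] * mu + 0.5 * rng.normal(size=(n, d))

W = 0.01 * rng.normal(size=(m, d))                 # trained first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed second layer

def forward(W):
    Z = X @ W.T                            # (n, m) pre-activations
    H = np.where(Z > 0, Z, alpha * Z)      # leaky ReLU
    return H @ a, Z                        # predictions f(x_i)

lr, prev_dir = 0.05, None
for t in range(1, 30001):
    f, Z = forward(W)
    coef = np.exp(-y * f)                  # exponential loss weights
    Sp = np.where(Z > 0, 1.0, alpha)       # sigma'(z_ij), shape (n, m)
    # dL/dW_j = -sum_i coef_i * y_i * a_j * sigma'(z_ij) * x_i
    G = -(Sp * (coef * y)[:, None] * a[None, :]).T @ X
    W -= lr * G
    if t % 10000 == 0:
        cur = W.ravel() / np.linalg.norm(W)
        drift = 1.0 if prev_dir is None else cur @ prev_dir
        prev_dir = cur
        print(f"t={t:6d}  loss={coef.sum():.2e}  ||W||={np.linalg.norm(W):6.2f}  "
              f"cos(prev, cur)={drift:.6f}")
# cos(prev, cur) -> 1 while ||W|| keeps growing: the parameters converge
# in direction even though they never converge in norm.
```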
Review for NeurIPS paper: Directional convergence and alignment in deep learning
Weaknesses: I have two main critiques of this work. The first relates to the significance of its results. In the setting studied, directional convergence, alignment, and margin maximization have all been treated in several recent works (which the paper refers to). I know that in at least some of these works directional convergence and/or alignment was assumed (not proven), but nonetheless, my feeling is that the paper does not draw a sufficiently clear line separating itself from the existing literature. For example, a very relevant existing work --- Lyu and Li 2019 --- is said to have left open the issues of directional convergence and alignment, but to my knowledge, that work does establish directional convergence, at least in some settings.