How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective
The question of which global minima are accessible to a stochastic gradient descent (SGD) algorithm with a specific learning rate and batch size is studied from the perspective of dynamical stability. The concept of non-uniformity is introduced; together with sharpness, it characterizes the stability of a global minimum and hence whether a particular SGD configuration can converge to it. In particular, this analysis shows that the learning rate and the batch size play different roles in minima selection. Extensive empirical results correlate well with the theoretical findings and provide further support for these claims.
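The core idea can be illustrated with the standard linear-stability criterion for (full-batch) gradient descent, a special case of the stability analysis the abstract refers to: near a minimum with Hessian H, GD with learning rate eta is linearly stable only if eta times the largest Hessian eigenvalue (the "sharpness") is at most 2. The sketch below is illustrative only and uses a hypothetical Hessian; it does not reproduce the paper's full SGD condition, which additionally involves non-uniformity and batch size.

```python
import numpy as np

def sharpness(H):
    """Largest eigenvalue of a symmetric Hessian H (the 'sharpness')."""
    return float(np.max(np.linalg.eigvalsh(H)))

def gd_linearly_stable(H, eta):
    """Check the GD linear-stability condition: eta * sharpness <= 2.

    For a quadratic loss L(w) = 0.5 * w^T H w, the GD update map has
    spectral radius max|1 - eta * lambda_i|, which stays <= 1 exactly
    when eta * lambda_max <= 2 (assuming H is positive semi-definite).
    """
    return eta * sharpness(H) <= 2.0

# Hypothetical Hessian at a global minimum (for illustration only).
H = np.diag([10.0, 1.0])

print(gd_linearly_stable(H, 0.1))  # 0.1 * 10 = 1.0 <= 2 -> stable
print(gd_linearly_stable(H, 0.3))  # 0.3 * 10 = 3.0 >  2 -> unstable
```

Under this view, a larger learning rate shrinks the set of minima that are stable, filtering out sharp ones; the paper's contribution is to extend this kind of criterion to SGD, where batch size enters through the non-uniformity of the loss across samples.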