Label Noise SGD Provably Prefers Flat Global Minimizers
–Neural Information Processing Systems
In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.
Neural Information Processing Systems
Aug-18-2025, 07:47:47 GMT
- Country:
- Asia > Middle East
- Jordan (0.04)
- North America
- Canada > Ontario
- Toronto (0.04)
- United States > California
- Santa Clara County > Palo Alto (0.04)
- Canada > Ontario
- Asia > Middle East
- Genre:
- Research Report (0.46)
- Technology: