Label Noise SGD Provably Prefers Flat Global Minimizers
–Neural Information Processing Systems
In overparametrized models, the noise in stochastic gradient descent (SGD) implicitly regularizes the optimization trajectory and determines which local minimum SGD converges to.
Neural Information Processing Systems
Aug-18-2025, 07:47:47 GMT
- Country:
- North America
- United States > California
- Santa Clara County > Palo Alto (0.04)
- Canada > Ontario
- Toronto (0.04)
- United States > California
- Asia > Middle East
- Jordan (0.04)
- North America
- Genre:
- Research Report (0.46)
- Technology: