c86ff2d301940fce9357de92c5222b44-Supplemental-Conference.pdf
–Neural Information Processing Systems
Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. While a general analysis of when SGD works has been elusive, there has been a lot of recent progress in understanding the convergence of Gradient Flow (GF) on the population loss, partly due to the simplicity thatacontinuous-time analysis buysus.
Neural Information Processing Systems
Feb-11-2026, 21:22:48 GMT
- Country:
- Asia
- Middle East > Jordan (0.04)
- Russia (0.04)
- Europe
- France > Île-de-France
- Russia (0.04)
- Asia
- Genre:
- Research Report (0.45)
- Technology: