High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
Neural Information Processing Systems
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in the high-dimensional regime. We prove limit theorems for the trajectories of summary statistics (i.e., finite-dimensional functions) of SGD as the dimension goes to infinity. Our approach allows one to choose the summary statistics that are tracked, the initialization, and the step-size. It yields both ballistic (ODE) and diffusive (SDE) limits, with the limit depending dramatically on these choices. We find a critical scaling regime for the step-size: below it, the effective dynamics matches gradient flow for the population loss, but at it, a new correction term appears that changes the phase diagram.
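As an illustration of the step-size scaling described above, the following is a minimal sketch on a toy problem that is not taken from the paper: online SGD for noisy linear regression in dimension n with step-size delta / n. The model, the function names (`simulate_online_sgd`, `predicted_curves`), and the two tracked statistics (the overlap m with the target and the squared error e2) are all assumptions made for illustration. For this toy model, a second-moment calculation suggests that m follows the population gradient flow, while e2 picks up an extra term proportional to delta squared at this scaling.

```python
import numpy as np


def simulate_online_sgd(n=1000, delta=0.5, T=4.0, sigma=0.5, seed=0):
    """Online SGD on a toy linear-regression problem (illustrative assumption).

    Each step draws a fresh sample x ~ N(0, I_n), y = <x, theta_star> + noise,
    and takes a gradient step of size delta / n on 0.5 * (<x, theta> - y)^2.
    Returns rescaled times t = k / n and two summary statistics.
    """
    rng = np.random.default_rng(seed)
    theta_star = np.zeros(n)
    theta_star[0] = 1.0          # unit-norm target direction
    theta = np.zeros(n)          # start at the origin, so m(0) = 0, e2(0) = 1
    steps = int(T * n)
    m = np.empty(steps + 1)      # overlap <theta, theta_star>
    e2 = np.empty(steps + 1)     # squared error |theta - theta_star|^2
    m[0] = theta @ theta_star
    e2[0] = np.sum((theta - theta_star) ** 2)
    for k in range(steps):
        x = rng.standard_normal(n)
        y = x @ theta_star + sigma * rng.standard_normal()
        grad = (x @ theta - y) * x
        theta -= (delta / n) * grad
        m[k + 1] = theta @ theta_star
        e2[k + 1] = np.sum((theta - theta_star) ** 2)
    t = np.arange(steps + 1) / n
    return t, m, e2


def predicted_curves(t, delta, sigma):
    """Heuristic ODE predictions for the toy model above (zero initialization)."""
    m_pred = 1.0 - np.exp(-delta * t)
    # Gradient flow on the population loss: d(e2)/dt = -2 * delta * e2.
    e2_flow = np.exp(-2.0 * delta * t)
    # With the delta^2 term: d(e2)/dt = -2*delta*e2 + delta^2 * (e2 + sigma^2).
    rate = 2.0 * delta - delta ** 2
    e2_inf = delta ** 2 * sigma ** 2 / rate
    e2_corr = e2_inf + (1.0 - e2_inf) * np.exp(-rate * t)
    return m_pred, e2_flow, e2_corr


if __name__ == "__main__":
    t, m, e2 = simulate_online_sgd()
    m_pred, e2_flow, e2_corr = predicted_curves(t, delta=0.5, sigma=0.5)
    print("final overlap (simulated, predicted):", m[-1], m_pred[-1])
    print("final error (simulated, flow, corrected):", e2[-1], e2_flow[-1], e2_corr[-1])
```

In this sketch, shrinking delta while running for time T / delta makes the delta-squared term negligible relative to the drift, so the squared error approaches the gradient-flow curve; this is only meant to mirror the sub-critical versus critical distinction in the abstract, not to reproduce the paper's theorems.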
Jan-18-2025, 07:37:29 GMT