High-dimensional limit theorems for SGD: Momentum and Adaptive Step-sizes
Jagannath, Aukosh; Jones-McCormick, Taj; Sarangian, Varnan
We develop a high-dimensional scaling limit for Stochastic Gradient Descent with Polyak Momentum (SGD-M) and adaptive step-sizes. This provides a framework to rigorously compare online SGD with some of its popular variants. We show that the scaling limits of SGD-M coincide with those of online SGD after an appropriate time rescaling and a specific choice of step-size. However, if the step-size is kept the same between the two algorithms, SGD-M will amplify high-dimensional effects, potentially degrading performance relative to online SGD. We demonstrate our framework on two popular learning problems: Spiked Tensor PCA and Single Index Models. In both cases, we also examine online SGD with an adaptive step-size based on normalized gradients. In the high-dimensional regime, this algorithm yields multiple benefits: its dynamics admit fixed points closer to the population minimum, and it widens the range of admissible step-sizes for which the iterates converge to such solutions. These examples provide a rigorous account, aligning with empirical motivation, of how early preconditioners can stabilize and improve dynamics in settings where online SGD fails.
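The three update rules compared in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the heavy-ball form of the momentum update, the division by the Euclidean gradient norm, and the `eps` safeguard are all hypothetical choices made here. The helper `limiting_velocity` only illustrates the standard heuristic that, under a constant gradient, momentum behaves like plain SGD with step-size `lr / (1 - beta)`; the abstract does not state the exact time rescaling or step-size choice used in the paper.

```python
import math


def sgd_step(x, grad, lr):
    """One online SGD step: x <- x - lr * grad."""
    return [xi - lr * gi for xi, gi in zip(x, grad)]


def sgd_momentum_step(x, v, grad, lr, beta):
    """One Polyak (heavy-ball) momentum step (assumed form).

    v <- beta * v - lr * grad
    x <- x + v
    """
    v_new = [beta * vi - lr * gi for vi, gi in zip(v, grad)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new


def normalized_sgd_step(x, grad, lr, eps=1e-12):
    """One SGD step with the gradient scaled to unit Euclidean norm.

    eps is a hypothetical safeguard against division by zero.
    """
    norm = math.sqrt(sum(gi * gi for gi in grad))
    return [xi - lr * gi / (norm + eps) for xi, gi in zip(x, grad)]


def limiting_velocity(grad, lr, beta, steps=500):
    """Run momentum with a constant gradient and return the final velocity.

    The velocity is a geometric sum that approaches -lr * grad / (1 - beta).
    """
    x = [0.0] * len(grad)
    v = [0.0] * len(grad)
    for _ in range(steps):
        x, v = sgd_momentum_step(x, v, grad, lr, beta)
    return v
```

With `lr = 0.1` and `beta = 0.9`, the limiting velocity matches a plain SGD step of size `0.1 / (1 - 0.9) = 1.0`. This is one way to see why keeping the step-size fixed when switching from online SGD to SGD-M changes the effective scale of each update.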
Nov-7-2025
- Country:
- North America > Canada
- Ontario > Waterloo Region > Waterloo (0.04)
- Europe
- Russia (0.04)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- Czechia > South Moravian Region
- Brno (0.04)
- Asia
- Russia (0.04)
- Middle East > Jordan (0.04)
- Africa > Middle East
- Tunisia > Ben Arous Governorate > Ben Arous (0.04)
- Genre:
- Research Report (0.50)
- Industry:
- Education (0.48)