Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation

Aaron Mishkin, Mert Pilanci, Mark Schmidt

arXiv.org Artificial Intelligence 

A continuing trend in machine learning is the adoption of powerful prediction models which can exactly fit, or interpolate, their training data (Zhang et al., 2017). Methods such as over-parameterized neural networks (Zhang and Yin, 2013; Belkin et al., 2019a), kernel machines (Belkin et al., 2019b), and boosting (Schapire et al., 1997) have all been shown to achieve zero training loss in practice. This phenomenon is particularly prevalent in modern deep learning, where interpolation is conjectured to be key to both optimization (Liu et al., 2022; Oymak and Soltanolkotabi, 2019) and generalization (Belkin, 2021). Recent experimental and theoretical evidence shows that stochastic gradient descent (SGD) matches the fast convergence rates of deterministic gradient methods, up to problem-dependent constants, when training interpolating models (Arora et al., 2018; Ma et al., 2018; Zou and Gu, 2019). With additional assumptions, interpolation also implies the strong (Polyak, 1987) and weak (Bassily et al., 2018; Vaswani et al., 2019) growth conditions, which bound the second moment of the stochastic gradients. Under strong/weak growth, variance-reduced algorithms typically exhibit slower convergence than stochastic gradient methods despite using more computation or memory (Defazio and Bottou, 2019; Ma et al., 2018), perhaps because these conditions already imply a form of "automatic variance reduction" (Liu et al., 2022). A combination of interpolation and growth conditions has been used to prove fast convergence rates for SGD with line-search (Vaswani et al., 2019), with the stochastic Polyak step-size (Loizou et al., 2020; Berrada et al., 2020), for mirror descent (D'Orazio et al., 2021), and for model-based methods (Asi and Duchi, 2019).
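For reference, the interpolation and growth conditions mentioned above are commonly stated as follows; this is a sketch of the standard formulations (e.g., as in Vaswani et al., 2019), and the exact constants and assumptions used in the paper may differ. For a finite-sum objective $f(w) = \tfrac{1}{n}\sum_{i=1}^n f_i(w)$ with minimizer $w^*$, interpolation means every individual loss is simultaneously minimized at $w^*$:

\[ \nabla f_i(w^*) = 0 \quad \text{for all } i. \]

The strong growth condition bounds the second moment of the stochastic gradients by the full gradient,

\[ \mathbb{E}_i\!\left[\|\nabla f_i(w)\|^2\right] \le \rho\, \|\nabla f(w)\|^2 \quad \text{for all } w, \]

while the weak growth condition bounds it by the suboptimality of an $L$-smooth $f$,

\[ \mathbb{E}_i\!\left[\|\nabla f_i(w)\|^2\right] \le 2\alpha L\, \big(f(w) - f(w^*)\big) \quad \text{for all } w. \]

Under either condition the stochastic gradient noise vanishes at $w^*$, which is one way to interpret the "automatic variance reduction" effect referred to above.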
