Goto

Collaborating Authors

 polyak stepsize


Dynamics of SGD with Stochastic Polyak Stepsizes: Truly Adaptive Variants and Convergence to Exact Solution

Neural Information Processing Systems

The proposed SPS comes with strong convergence guarantees and competitive performance; however, it has two main drawbacks when it is used in non-over-parameterized regimes: (i) It requires a priori knowledge of the optimal mini-batch losses, which are not available when the interpolation condition is not satisfied (e.g., regularized objectives), and (ii) it guarantees convergence only to a neighborhood of the solution. In this work, we study the dynamics and the convergence properties of SGD equipped with new variants of the stochastic Polyak stepsize and provide solutions to both drawbacks of the original SPS. We first show that a simple modification of the original SPS that uses lower bounds instead of the optimal function values can directly solve issue (i). On the other hand, solving issue (ii) turns out to be more challenging and leads us to valuable insights into the method's behavior. We show that if interpolation is not satisfied, the correlation between SPS and stochastic gradients introduces a bias, which effectively distorts the expectation of the gradient signal near minimizers, leading to non-convergence - even if the stepsize is scaled down during training. To fix this issue, we propose DecSPS, a novel modification of SPS, which guarantees convergence to the exact minimizer - without a priori knowledge of the problem parameters. For strongly-convex optimization problems, DecSPS is the first stochastic adaptive optimization method that converges to the exact solution without restrictive assumptions like bounded iterates/gradients.


Parameter-free Clipped Gradient Descent Meets Polyak Y uki T akezawa

Neural Information Processing Systems

Gradient descent and its variants are de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparame-ters, we need to tune the hyperparameters carefully using a grid search.




Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

Neural Information Processing Systems

The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, two issues remain unsolved in this line of work. First, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution which may result in a worse output than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al.), this approach results in slower convergence rates under interpolation. Second, intuitive line-search methods equipped with variance-reduction (VR) fail to converge (Dubois-Taine et al.).


Parameter-free Clipped Gradient Descent Meets Polyak Y uki T akezawa

Neural Information Processing Systems

Gradient descent and its variants are de facto standard algorithms for training machine learning models. As gradient descent is sensitive to its hyperparame-ters, we need to tune the hyperparameters carefully using a grid search.




Preconditioned subgradient method for composite optimization: overparameterization and fast convergence

arXiv.org Artificial Intelligence

Composite optimization problems involve minimizing the composition of a smooth map with a convex function. Such objectives arise in numerous data science and signal processing applications, including phase retrieval, blind deconvolution, and collaborative filtering. The subgradient method achieves local linear convergence when the composite loss is well-conditioned. However, if the smooth map is, in a certain sense, ill-conditioned or overparameterized, the subgradient method exhibits much slower sublinear convergence even when the convex function is well-conditioned. To overcome this limitation, we introduce a Levenberg-Morrison-Marquardt subgradient method that converges linearly under mild regularity conditions at a rate determined solely by the convex function. Further, we demonstrate that these regularity conditions hold for several problems of practical interest, including square-variable formulations, matrix sensing, and tensor factorization. Numerical experiments illustrate the benefits of our method.


Adaptive SGD with Polyak stepsize and Line-search: Robust Convergence and Variance Reduction

Neural Information Processing Systems

The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for SGD have shown remarkable effectiveness when training over-parameterized models. However, two issues remain unsolved in this line of work. First, in non-interpolation settings, both algorithms only guarantee convergence to a neighborhood of a solution which may result in a worse output than the initial guess. While artificially decreasing the adaptive stepsize has been proposed to address this issue (Orvieto et al.), this approach results in slower convergence rates under interpolation. Second, intuitive line-search methods equipped with variance-reduction (VR) fail to converge (Dubois-Taine et al.).