AITopics | polyak

Collaborating Authors

polyak

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Momentum Provably Improves Error Feedback!

Neural Information Processing SystemsDec-27-2025, 04:38:57 GMT

Due to the high communication overhead when training machine learning models in a distributed environment, modern algorithms invariably rely on lossy communication compression. However, when untreated, the errors caused by compression propagate, and can lead to severely unstable behavior, including exponential divergence. Almost a decade ago, Seide et al. [2014] proposed an error feedback (EF) mechanism, which we refer to as EF14, as an immensely effective heuristic for mitigating this issue. However, despite steady algorithmic and theoretical advances in the EF field in the last decade, our understanding is far from complete. In this work we address one of the most pressing issues.

electronic proceedings, error feedback, name change, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Add feedback

Sparse Polyak with optimal thresholding operators for high-dimensional M-estimation

Qiao, Tianqi, Maros, Marie

arXiv.org Machine LearningNov-25-2025

We propose and analyze a variant of Sparse Polyak for high dimensional M-estimation problems. Sparse Polyak proposes a novel adaptive step-size rule tailored to suitably estimate the problem's curvature in the high-dimensional setting, guaranteeing that the algorithm's performance does not deteriorate when the ambient dimension increases. However, convergence guarantees can only be obtained by sacrificing solution sparsity and statistical accuracy. In this work, we introduce a variant of Sparse Polyak that retains its desirable scaling properties with respect to the ambient dimension while obtaining sparser and more accurate solutions.

assumption 1, operator, theorem 1, (13 more...)

arXiv.org Machine Learning

2511.18167

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

Add feedback

2f4059ce1227f021edc5d9c6f0f17dc1-AuthorFeedback.pdf

Neural Information Processing SystemsOct-2-2025, 11:32:10 GMT

artificial intelligence, experiment, machine learning, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.33)

Add feedback

Sparse Polyak: an adaptive step size rule for high-dimensional M-estimation

Qiao, Tianqi, Maros, Marie

arXiv.org Machine LearningSep-15-2025

We propose and study Sparse Polyak, a variant of Polyak's adaptive step size, designed to solve high-dimensional statistical estimation problems where the problem dimension is allowed to grow much faster than the sample size. In such settings, the standard Polyak step size performs poorly, requiring an increasing number of iterations to achieve optimal statistical precision-even when, the problem remains well conditioned and/or the achievable precision itself does not degrade with problem size. We trace this limitation to a mismatch in how smoothness is measured: in high dimensions, it is no longer effective to estimate the Lipschitz smoothness constant. Instead, it is more appropriate to estimate the smoothness restricted to specific directions relevant to the problem (restricted Lipschitz smoothness constant). Sparse Polyak overcomes this issue by modifying the step size to estimate the restricted Lipschitz smoothness constant. We support our approach with both theoretical analysis and numerical experiments, demonstrating its improved performance.

algorithm 1, convergence, theorem 1, (15 more...)

arXiv.org Machine Learning

2509.09802

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.30)

Add feedback

Entropic Mirror Descent for Linear Systems: Polyak's Stepsize and Implicit Bias

Malitsky, Yura, Posch, Alexander

arXiv.org Machine LearningMay-6-2025

This paper focuses on applying entropic mirror descent to solve linear systems, where the main challenge for the convergence analysis stems from the unboundedness of the domain. To overcome this without imposing restrictive assumptions, we introduce a variant of Polyak-type stepsizes. Along the way, we strengthen the bound for $\ell_1$-norm implicit bias, obtain sublinear and linear convergence results, and generalize the convergence result to arbitrary convex $L$-smooth functions. We also propose an alternative method that avoids exponentiation, resembling the original Hadamard descent, but with provable convergence.

artificial intelligence, descent, machine learning, (16 more...)

arXiv.org Machine Learning

2505.02614

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Reviews: The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares

Neural Information Processing SystemsJan-22-2025, 15:34:18 GMT

Originality: I'm not an expert on this subfield, but as far as I know the work is original. Quality: I think this is a very nice paper. The results are interesting and clearly stated. And the work addresses an important question about the difference between theoretical learning rate schedules / averaging schemes and those used in practice. I have a couple of major comments: 1) organisation of the paper.

experiment, geometrically decaying learning rate procedure, step decay schedule, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.62)

Add feedback

Momentum Provably Improves Error Feedback!

Neural Information Processing SystemsJan-20-2025, 02:18:26 GMT

algorithm, compression, error feedback, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Improving Stochastic Cubic Newton with Momentum

Chayti, El Mahdi, Doikov, Nikita, Jaggi, Martin

arXiv.org Artificial IntelligenceOct-25-2024

We study stochastic second-order methods for solving general non-convex optimization problems. We propose using a special version of momentum to stabilize the stochastic gradient and Hessian estimates in Newton's method. We show that momentum provably improves the variance of stochastic estimates and allows the method to converge for any noise level. Using the cubic regularization technique, we prove a global convergence rate for our method on general non-convex problems to a second-order stationary point, even when using only a single stochastic data sample per iteration. This starkly contrasts with all existing stochastic second-order methods for non-convex problems, which typically require large batches. Therefore, we are the first to demonstrate global convergence for batches of arbitrary size in the non-convex case for the Stochastic Cubic Newton. Additionally, we show improved speed on convex stochastic problems for our regularized Newton methods with momentum.

artificial intelligence, machine learning, momentum, (17 more...)

arXiv.org Artificial Intelligence

2410.19644

Country:

North America > United States (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality

Kassing, Sebastian, Weissmann, Simon

arXiv.org Artificial IntelligenceOct-22-2024

In this work, we consider the convergence of Polyak's heavy ball method, both in continuous and discrete time, on a non-convex objective function. We recover the convergence rates derived in [Pol64] for strongly convex objective functions, assuming only validity of the Polyak-Lojasiewicz inequality. In continuous time our result holds for all initializations, whereas in the discrete time setting we conduct a local analysis around the global minima. Our results demonstrate that the heavy ball method does, in fact, accelerate on the class of objective functions satisfying the Polyak-Lojasiewicz inequality. This holds even in the discrete time setting, provided the method reaches a neighborhood of the global minima. Instead of the usually employed Lyapunov-type arguments, our approach leverages a new differential geometric perspective of the Polyak-Lojasiewicz inequality proposed in [RB24].

artificial intelligence, convergence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2410.16849

Country:

North America > United States (0.04)
Europe > Montenegro (0.04)
Europe > Germany > Saxony > Dresden (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.74)

Industry: Leisure & Entertainment > Sports > Tennis (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Reviews: Integration Methods and Optimization Algorithms

Neural Information Processing SystemsOct-8-2024, 07:56:40 GMT

The paper provides an interpretation of a number of accelerated gradient algorithms (and other convex optimization algorithms) based on the theory of numerical integration and numerical solution of ODEs. In particular, the paper focuses on the well-studied class of multi-step methods to interpret Nesterov's method and Polyak's heavy ball method as discretizations of the natural gradient flow dynamics. The authors argue that this interpretation is more beneficial than existing interpretations based on different dynamics (e.g. Notice that the novelty here lies in the focus on multistep methods, as it was already well-known that accelerated algorithms can be obtained by appropriately applying Euler's discretization to certain dynamics (again, see Krichene et al). A large part of the paper is devoted to introducing important properties of numerical discretization schemes for ODEs, including consistency and zero-stability, and their instantiation in the case of multistep methods.

algorithm, interpretation, multistep method, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.65)

Add feedback