AITopics | nesterov

Collaborating Authors

nesterov

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Regularized Nonlinear Acceleration

Damien Scieur, Alexandre d'Aspremont, Francis Bach

Neural Information Processing Systems | Apr-30-2026, 22:39:10 GMT

We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple and small linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.
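As a rough illustration of the nonlinear averaging described in this abstract, the sketch below computes the averaging weights from a small regularized linear system built on the differences between successive iterates. It follows one common formulation of the scheme and is not the authors' code: the helper name `rna_extrapolate`, the regularization value `lam`, the normalization of the Gram matrix, and the toy quadratic are all assumptions made for illustration.

```python
import numpy as np


def rna_extrapolate(iterates, lam=1e-10):
    """Combine iterates with weights that sum to one (hypothetical helper)."""
    X = np.asarray(iterates, dtype=float)   # shape (k + 2, d)
    R = np.diff(X, axis=0)                  # residuals r_i = x_{i+1} - x_i
    gram = R @ R.T                          # small (k + 1) x (k + 1) matrix
    gram = gram / np.linalg.norm(gram, 2)   # normalize so lam is scale-free
    z = np.linalg.solve(gram + lam * np.eye(len(R)), np.ones(len(R)))
    c = z / z.sum()                         # weights sum to one
    return c @ X[:-1]                       # weighted average of x_0 .. x_k


# Toy base method (an assumption): gradient descent on f(x) = 0.5 * x^T A x,
# whose optimum is the origin.
A = np.diag([1.0, 2.0, 3.0, 4.0, 5.0])
x = np.ones(5)
iterates = [x.copy()]
for _ in range(7):
    x = x - (1.0 / 6.0) * (A @ x)
    iterates.append(x.copy())

x_last = iterates[-1]
x_acc = rna_extrapolate(iterates)
print(np.linalg.norm(x_last), np.linalg.norm(x_acc))
```

The base method is untouched: the extrapolation only reads the stored iterates, which is why such a scheme can run alongside any optimizer. On this toy problem the extrapolated point should land closer to the optimum than the last gradient-descent iterate.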

algorithm, artificial intelligence, optimization problem, (16 more...)

Neural Information Processing Systems

Country: Europe > France (0.15)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Inference of Online Newton Methods with Nesterov's Accelerated Sketching

Haoxuan Wang, Xinchen Du, Sen Na

arXiv.org Machine Learning | Apr-28-2026

Reliable decision-making with streaming data requires principled uncertainty quantification of online methods. While first-order methods enable efficient iterate updates, their inference procedures still require updating proper (covariance) matrices, incurring $O(d^2)$ time and memory complexity, and are sensitive to ill-conditioning and noise heterogeneity of the problem. This costly inference task offers an opportunity for more robust second-order methods, which are, however, bottlenecked by solving Newton systems with $O(d^3)$ complexity. In this paper, we address this gap by studying an online Newton method with Hessian averaging, where the Newton direction at each step is approximately computed using a sketch-and-project solver with Nesterov's acceleration, matching $O(d^2)$ complexity of first-order methods. For the proposed method, we quantify its uncertainty arising from both random data and randomized computation. Under standard smoothness and moment conditions, we establish global almost-sure convergence, prove asymptotic normality of the last iterate with a limiting covariance characterized by a Lyapunov equation, and develop a fully online covariance estimator with non-asymptotic convergence guarantees. We also connect the resulting uncertainty quantification to that of exact and sketched Newton methods without Nesterov's acceleration. Extensive experiments on regression models demonstrate the superiority of the proposed method for online inference.
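To make the cost argument concrete, the sketch below solves a Newton system $H d = -g$ with randomized Kaczmarz, a simple member of the sketch-and-project family in which each step touches a single row and costs $O(d)$. This is the plain, unaccelerated variant, shown only as an illustration; it is not the Nesterov-accelerated solver or the Hessian-averaging scheme of the paper, and the function name, the iteration count, and the synthetic `H` and `g` are assumptions.

```python
import numpy as np


def randomized_kaczmarz(H, b, num_iters=2000, seed=0):
    """Approximately solve H x = b by projecting onto one random row equation per step."""
    rng = np.random.default_rng(seed)
    row_norms_sq = np.einsum("ij,ij->i", H, H)   # squared norm of each row
    probs = row_norms_sq / row_norms_sq.sum()    # sample rows by squared norm
    x = np.zeros(H.shape[1])
    for _ in range(num_iters):
        i = rng.choice(len(b), p=probs)
        # Project x onto the hyperplane {v : H[i] @ v = b[i]}.
        x = x + (b[i] - H[i] @ x) / row_norms_sq[i] * H[i]
    return x


# Synthetic positive-definite "Hessian" and gradient (assumptions for the demo).
rng = np.random.default_rng(1)
B = rng.standard_normal((200, 10))
H = B.T @ B / 200 + np.eye(10)
g = rng.standard_normal(10)

d_exact = np.linalg.solve(H, -g)         # O(d^3) reference solution
d_sketch = randomized_kaczmarz(H, -g)    # cheap approximate Newton direction
err_start = np.linalg.norm(d_exact)      # error of the zero initial guess
err_end = np.linalg.norm(d_sketch - d_exact)
print(err_start, err_end)
```

Each projection cannot move the iterate farther from the exact solution, so the approximate direction improves on the zero initial guess while avoiding a full factorization of `H`.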

artificial intelligence, machine learning, optimization problem, (16 more...)

arXiv.org Machine Learning

2604.23436

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Towards Gradient-based Bilevel Optimization with Non-convex Followers and Beyond

Risheng Liu, Yaohua Liu, Shangzhi Zeng, Jin Zhang

Neural Information Processing Systems | Apr-25-2026, 17:44:58 GMT

In recent years, Bi-Level Optimization (BLO) techniques have received extensive attention from both the learning and vision communities. Many BLO models in complex, practical tasks have a non-convex follower structure, that is, they lack Lower-Level Convexity (LLC). However, this challenging class of BLOs lacks both efficient solution strategies and solid theoretical guarantees. In this work, we propose a new algorithmic framework, named Initialization Auxiliary and Pessimistic Trajectory Truncated Gradient Method (IAPTT-GM), to partially address these issues. In particular, by introducing an auxiliary variable as initialization to guide the optimization dynamics and designing a pessimistic trajectory truncation operation, we construct a reliable approximation of the original BLO in the absence of the LLC hypothesis. Our theoretical investigations establish the convergence of solutions returned by IAPTT-GM towards those of the original BLO without LLC. As an additional bonus, we also theoretically justify the quality of IAPTT-GM embedded with Nesterov's accelerated dynamics under LLC. The experimental results confirm both the convergence of our algorithm without LLC and the theoretical findings under LLC.

artificial intelligence, machine learning, optimization problem, (16 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Algorithmic Instabilities of Accelerated Gradient Descent

Neural Information Processing Systems | Apr-24-2026, 14:16:25 GMT

We disprove this conjecture and show, for two notions of algorithmic stability (including uniform stability), that the stability of Nesterov's accelerated method in fact deteriorates exponentially fast with the number of gradient steps.

artificial intelligence, machine learning, stability, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)

A Variational Perspective on High-Resolution ODEs

Neural Information Processing Systems | Apr-24-2026, 07:54:10 GMT

We consider unconstrained minimization of smooth convex functions. We propose a novel variational perspective using a forced Euler-Lagrange equation that allows for studying high-resolution ODEs. Through this, we obtain a faster convergence rate for gradient norm minimization using Nesterov's accelerated gradient method. Additionally, we show that Nesterov's method can be interpreted as a rate-matching discretization of an appropriately chosen high-resolution ODE. Finally, using the results from the new variational perspective, we propose a stochastic method for noisy gradients.
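For readers unfamiliar with the method named in this abstract, here is a minimal sketch of a standard form of Nesterov's accelerated gradient method for an $L$-smooth convex function, using the familiar $t_k$ momentum sequence. It is a textbook variant, not the high-resolution-ODE discretization or the stochastic method proposed in the paper; the function name `nesterov_agd` and the toy quadratic are assumptions.

```python
import numpy as np


def nesterov_agd(grad, x0, lipschitz, num_steps):
    """Accelerated gradient descent with step 1/L and the classical t_k momentum."""
    x = np.array(x0, dtype=float)
    y = x.copy()          # extrapolated point where the gradient is evaluated
    t = 1.0
    for _ in range(num_steps):
        x_next = y - grad(y) / lipschitz                  # gradient step from y
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x


# Toy problem (an assumption): f(x) = 0.5 * sum(eigs * x^2), minimized at 0, L = 1.
eigs = np.linspace(0.01, 1.0, 20)
f = lambda v: 0.5 * np.sum(eigs * v * v)
grad_f = lambda v: eigs * v

x0 = np.ones(20)
num_steps = 50
x_k = nesterov_agd(grad_f, x0, lipschitz=1.0, num_steps=num_steps)

# Classical guarantee for this scheme: f(x_k) - f* <= 2 L ||x0 - x*||^2 / (k + 1)^2.
bound = 2.0 * 1.0 * np.dot(x0, x0) / (num_steps + 1) ** 2
print(f(x_k), bound)
```

The printed objective value can be compared against the $O(1/k^2)$ bound in the final comment, which is the function-value rate that the gradient-norm analysis in the abstract complements.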

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia (0.47)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$

Yi Xu, Yan Yan, Qihang Lin, Tianbao Yang

Neural Information Processing Systems | Apr-21-2026, 22:10:40 GMT

In this paper, we develop a novel homotopy smoothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption on the strong convexity. In this work, we show that the proposed HOPS achieves a lower iteration complexity of $\widetilde{O}(1/\epsilon^{1-\theta})$, with $\theta \in (0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without a strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, gradually decreasing the smoothing parameter in a stage-wise manner until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piecewise linear loss function and $\ell_1$ norm regularizer, finding a point in a polyhedron, cone programming, etc.). Experimental results verify the effectiveness of HOPS in comparison with Nesterov's smoothing algorithm and the primal-dual style of first-order methods.
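The smoothing idea can be illustrated on the simplest max-structured function, $|x| = \max_{|u| \le 1} u x$. Subtracting $\frac{\mu}{2} u^2$ inside the maximum gives the Huber function, which is smooth and lies within $\mu/2$ of $|x|$. The sketch below only checks that approximation gap for a decreasing sequence of smoothing parameters, mimicking the stage-wise decrease; it is not the HOPS algorithm, and the function name and the particular values of `mu` are assumptions.

```python
import numpy as np


def smoothed_abs(x, mu):
    """Smoothed |x|: max over |u| <= 1 of (u * x - mu * u**2 / 2), i.e. the Huber function."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) <= mu, x * x / (2.0 * mu), np.abs(x) - 0.5 * mu)


xs = np.linspace(-2.0, 2.0, 401)

# Halve the smoothing parameter at each "stage" and record the worst-case gap.
gaps = {
    mu: float(np.max(np.abs(xs) - smoothed_abs(xs, mu)))
    for mu in (1.0, 0.5, 0.25, 0.125)
}
print(gaps)
```

The worst-case gap equals $\mu/2$ for each stage, so halving $\mu$ halves the approximation error, at the price of a less smooth surrogate (its gradient is $1/\mu$-Lipschitz).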

artificial intelligence, machine learning, optimization problem, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Iowa > Johnson County > Iowa City (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)

Designing smoothing functions for improved worst-case competitive ratio in online optimization

Reza Eghbali, Maryam Fazel

Neural Information Processing Systems | Mar-23-2026, 06:54:19 GMT

Online optimization covers problems such as online resource allocation, online bipartite matching, adwords (a central problem in e-commerce and advertising), and adwords with separable concave returns. We analyze the worst case competitive ratio of two primal-dual algorithms for a class of online convex (conic) optimization problems that contains the previous examples as special cases defined on the positive orthant.

algorithm, artificial intelligence, optimization problem, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Washington > King County > Seattle (0.14)

Industry: Information Technology > Services > e-Commerce Services (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.90)

Adaptive Averaging in Accelerated Descent Dynamics

Walid Krichene, Alexandre Bayen, Peter L. Bartlett

Neural Information Processing Systems | Mar-23-2026, 02:33:53 GMT

We study accelerated descent dynamics for constrained convex optimization. These dynamics can be described naturally as a coupling of a dual variable accumulating gradients at a given rate η(t), and a primal variable obtained as the weighted average of the mirrored dual trajectory, with weights w(t). Using a Lyapunov argument, we give sufficient conditions on η and w to achieve a desired convergence rate. As an example, we show that the replicator dynamics (an example of mirror descent on the simplex) can be accelerated using a simple averaging scheme. We then propose an adaptive averaging heuristic which adaptively computes the weights to speed up the decrease of the Lyapunov function. We provide guarantees on adaptive averaging in continuous time, prove that it preserves the quadratic convergence rate of accelerated first-order methods in discrete time, and give numerical experiments to compare it with existing heuristics, such as adaptive restarting. The experiments indicate that adaptive averaging performs at least as well as adaptive restarting, with significant improvements in some cases.

artificial intelligence, machine learning, trajectory, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Accelerated Regularized Learning in Finite N-Person Games

Neural Information Processing Systems | Mar-20-2026, 06:14:33 GMT

Motivated by the success of Nesterov's accelerated gradient algorithm for convex minimization problems, we examine whether it is possible to achieve similar performance gains in the context of online learning in games. To that end, we introduce a family of accelerated learning methods, which we call "follow the accelerated leader" (FTXL), and which incorporates the use of momentum within the general framework of regularized learning, in particular the exponential / multiplicative weights algorithm and its variants. Drawing inspiration and techniques from the continuous-time analysis of Nesterov's algorithm, we show that FTXL converges locally to strict Nash equilibria at a superlinear rate, achieving in this way an exponential speed-up over vanilla regularized learning methods (which, by comparison, converge to strict equilibria at a geometric, linear rate). Importantly, FTXL maintains its superlinear convergence rate in a broad range of feedback structures, from deterministic, full information models to stochastic, realization-based ones, and even bandit, payoff-based information, where players are only able to observe their individual realized payoffs.

artificial intelligence, machine learning, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.64)
