Accelerated gradient method


Improving Neural Ordinary Differential Equations with Nesterov's Accelerated Gradient Method

Neural Information Processing Systems

We propose Nesterov neural ordinary differential equations (NesterovNODEs), whose layers solve the second-order ordinary differential equation (ODE) limit of Nesterov's accelerated gradient (NAG) method, together with a generalization called GNesterovNODEs. Taking advantage of the $\mathcal{O}(1/k^{2})$ convergence rate of the NAG scheme, GNesterovNODEs speed up training and inference by reducing the number of function evaluations (NFEs) needed to solve the ODEs. We also prove that the adjoint state of a GNesterovNODE itself satisfies a GNesterovNODE, thus accelerating both the forward and backward ODE solvers and allowing the model to be scaled up for large-scale tasks. We empirically corroborate the advantage of GNesterovNODEs on a wide range of practical applications, including point cloud separation, image classification, and sequence modeling. Compared to NODEs, GNesterovNODEs require significantly fewer NFEs while achieving better accuracy across our experiments.
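For intuition, the continuous-time limit of NAG (due to Su, Boyd, and Candès) is the second-order ODE $\ddot{x}(t) + \frac{3}{t}\dot{x}(t) + \nabla f(x(t)) = 0$; a NesterovNODE-style layer replaces $\nabla f$ with a learned vector field. The sketch below only illustrates these dynamics and is not the authors' implementation: the toy vector field f_theta, the damping coefficient, and the fixed-step RK4 integrator are all assumptions.

```python
import numpy as np

def f_theta(x, t):
    """Stand-in for a learned vector field; a real model would use a neural net."""
    return np.tanh(x) + 0.1 * t

def nesterov_node_flow(x0, t0=0.1, t1=1.0, steps=100):
    """Integrate the NAG-limit ODE  x'' + (3/t) x' + f_theta(x, t) = 0,
    written as the first-order system  x' = v,  v' = -(3/t) v - f_theta(x, t),
    with a fixed-step RK4 solver (adaptive solvers are used in practice)."""
    h = (t1 - t0) / steps
    x, v, t = x0.copy(), np.zeros_like(x0), t0

    def rhs(t, x, v):
        return v, -(3.0 / t) * v - f_theta(x, t)

    for _ in range(steps):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x = x + h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v = v + h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += h
    return x  # the layer's output state

print(nesterov_node_flow(np.array([1.0, -0.5])))
```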


An Accelerated Gradient Method for Convex Smooth Simple Bilevel Optimization

Neural Information Processing Systems

In this paper, we focus on simple bilevel optimization problems, where we minimize a convex smooth objective function over the optimal solution set of another convex smooth constrained optimization problem. We present a novel bilevel optimization method that locally approximates the solution set of the lower-level problem using a cutting-plane approach and employs an accelerated gradient-based update to reduce the upper-level objective function over the approximated solution set. We measure the performance of our method in terms of suboptimality and infeasibility errors and provide non-asymptotic convergence guarantees for both error criteria. Specifically, when the feasible set is compact, we show that our method requires at most $\mathcal{O}(\max\{1/\sqrt{\epsilon_{f}},\, 1/\epsilon_{g}\})$ iterations to find a solution that is $\epsilon_{f}$-suboptimal and $\epsilon_{g}$-infeasible. Moreover, under the additional assumption that the lower-level objective satisfies the $r$-th Hölderian error bound, we show that our method achieves an iteration complexity of $\mathcal{O}(\max\{\epsilon_{f}^{-\frac{2r-1}{2r}},\, \epsilon_{g}^{-\frac{2r-1}{2r}}\})$, which matches the optimal complexity of single-level convex constrained optimization when $r = 1$.
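To make the general idea concrete, the sketch below shows one plausible iteration of a scheme of this kind: a halfspace cut built from the lower-level gradient approximates the lower-level solution set, and a Nesterov-style step on the upper-level objective is projected onto that halfspace. The functions grad_f, grad_g, the cut construction, the momentum schedule, and the toy problem are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def project_halfspace(x, a, b):
    """Euclidean projection of x onto the halfspace {z : a^T z <= b}."""
    viol = a @ x - b
    if viol <= 0:
        return x
    return x - (viol / (a @ a)) * a

def bilevel_accelerated_step(x, x_prev, k, grad_f, g, grad_g, g_target, step):
    """One illustrative iteration: accelerated step on the upper-level objective f,
    projected onto a cutting-plane approximation of {z : g(z) <= g_target}."""
    y = x + (k - 1) / (k + 2) * (x - x_prev)   # Nesterov-style extrapolation
    z = y - step * grad_f(y)                   # gradient step on the upper level
    # Linearize the lower level at y: g(y) + grad_g(y)^T (z - y) <= g_target.
    a = grad_g(y)
    b = g_target - g(y) + a @ y
    return project_halfspace(z, a, b), x

# Toy usage: f(x) = ||x - c||^2 over the minimizers of g(x) = ||A x - d||^2.
rng = np.random.default_rng(0)
A, d, c = rng.normal(size=(3, 5)), rng.normal(size=3), rng.normal(size=5)
f_grad = lambda x: 2 * (x - c)
g = lambda x: np.sum((A @ x - d) ** 2)
g_grad = lambda x: 2 * A.T @ (A @ x - d)
g_star = g(np.linalg.lstsq(A, d, rcond=None)[0])  # optimal lower-level value

x, x_prev = np.zeros(5), np.zeros(5)
for k in range(1, 200):
    x, x_prev = bilevel_accelerated_step(x, x_prev, k, f_grad, g, g_grad,
                                         g_target=g_star + 1e-6, step=0.05)
print("upper-level value:", np.sum((x - c) ** 2), " lower-level value:", g(x))
```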


A Concise Lyapunov Analysis of Nesterov's Accelerated Gradient Method

Liu, Jun

arXiv.org Artificial Intelligence

Nesterov's accelerated gradient method [7,8] has gained significant attention due to its provable acceleration on general convex functions beyond quadratics. A special focus has been on using dynamical-systems tools [12,10,3,14] and control-theoretic methods [5,9] for the analysis and design of such algorithms. In the standard textbook [8] by Nesterov, the convergence analysis of accelerated gradient methods is conducted using a technique known as estimating sequences. These are essentially auxiliary comparison functions used to prove the convergence rates of optimization algorithms. As pointed out in [14], estimating sequences are usually constructed inductively and can be difficult to understand and apply. This motivated the Lyapunov analysis in [14], which aims to unify the analysis of a broad class of accelerated algorithms. Despite this comprehensive work, to the best of the author's knowledge, a simple and direct Lyapunov analysis of the original scheme of Nesterov's accelerated gradient method is still lacking.
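For reference, the scheme in question, for an $L$-smooth convex function $f$, with $y_{0} = x_{0}$ and in one common indexing, reads
$$x_{k+1} = y_{k} - \tfrac{1}{L}\nabla f(y_{k}), \qquad y_{k+1} = x_{k+1} + \tfrac{k}{k+3}\,(x_{k+1} - x_{k}),$$
and attains the accelerated rate $f(x_{k}) - f^{*} = \mathcal{O}\!\left(L\|x_{0} - x^{*}\|^{2}/k^{2}\right)$, versus $\mathcal{O}(1/k)$ for plain gradient descent; the specific Lyapunov function constructed in the paper is not reproduced here.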


Accelerated Gradient Methods for Stochastic Optimization and Online Learning

Hu, Chonghai, Pan, Weike, Kwok, James T.

Neural Information Processing Systems

Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., $\ell_1$-regularizer). Gradient descent methods, though highly scalable and easy to implement, are known to converge slowly on these problems. In this paper, we develop novel accelerated gradient methods for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic optimization with both convex and strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD.
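As a rough illustration of the accelerated stochastic setting described above, the sketch below combines a Nesterov-style extrapolation with a mini-batch stochastic gradient and a soft-thresholding (proximal) step for an $\ell_1$ regularizer. It is not the SAGE update itself; the least-squares loss, step size, and momentum schedule are assumptions.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def accelerated_prox_sgd(X, y, lam=0.1, epochs=30, step=0.02, batch=32, seed=0):
    """Illustrative accelerated proximal stochastic gradient for
    min_w (1/2n)||Xw - y||^2 + lam*||w||_1  (not the exact SAGE scheme)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, w_prev, k = np.zeros(d), np.zeros(d), 1
    for _ in range(epochs):
        for _ in range(n // batch):
            idx = rng.integers(0, n, size=batch)
            v = w + (k - 1) / (k + 2) * (w - w_prev)         # Nesterov extrapolation
            grad = X[idx].T @ (X[idx] @ v - y[idx]) / batch  # stochastic gradient at v
            w_prev, w = w, soft_threshold(v - step * grad, step * lam)
            k += 1
    return w

# Toy usage on synthetic sparse regression data.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 20))
w_true = np.zeros(20); w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.01 * rng.normal(size=500)
print(np.round(accelerated_prox_sgd(X, y), 2))
```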


Better Mini-Batch Algorithms via Accelerated Gradient Methods

Cotter, Andrew, Shamir, Ohad, Srebro, Nati, Sridharan, Karthik

Neural Information Processing Systems

Mini-batch algorithms have recently received significant attention as a way to speed up stochastic convex optimization problems. In this paper, we study how such algorithms can be improved using accelerated gradient methods. We provide a novel analysis, which shows how standard gradient methods may sometimes be insufficient to obtain a significant speed-up. We propose a novel accelerated gradient algorithm, which deals with this deficiency, and enjoys a uniformly superior guarantee. We conclude our paper with experiments on real-world datasets, which validate our algorithm and substantiate our theoretical insights.
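The contrast the abstract draws can be pictured as follows: with a mini-batch of size $b$, a plain SGD step averages $b$ stochastic gradients, while an accelerated variant layers Nesterov-style momentum on top of the averaged gradient. The sketch below is a generic accelerated mini-batch update, not the specific algorithm or step-size policy proposed in the paper.

```python
import numpy as np

def accelerated_minibatch_sgd(grad_fn, data, w0, step=0.1, batch=64, iters=500, seed=0):
    """Generic Nesterov-accelerated mini-batch SGD (illustrative only).
    grad_fn(w, samples) must return the average gradient over the mini-batch."""
    rng = np.random.default_rng(seed)
    w, w_prev = w0.copy(), w0.copy()
    for k in range(1, iters + 1):
        samples = data[rng.integers(0, len(data), size=batch)]
        y = w + (k - 1) / (k + 2) * (w - w_prev)   # look-ahead (momentum) point
        g = grad_fn(y, samples)                    # averaged mini-batch gradient at y
        w_prev, w = w, y - step * g
    return w

# Toy usage: least-squares on synthetic data, with (features, target) stacked per row.
rng = np.random.default_rng(2)
A = rng.normal(size=(1000, 10))
b = A @ np.arange(1.0, 11.0) + 0.1 * rng.normal(size=1000)
data = np.hstack([A, b[:, None]])
grad_fn = lambda w, s: s[:, :-1].T @ (s[:, :-1] @ w - s[:, -1]) / len(s)
print(np.round(accelerated_minibatch_sgd(grad_fn, data, np.zeros(10)), 2))
```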


NEON+: Accelerated Gradient Methods for Extracting Negative Curvature for Non-Convex Optimization

Xu, Yi, Jin, Rong, Yang, Tianbao

arXiv.org Machine Learning

Accelerated gradient (AG) methods are breakthroughs in convex optimization, improving the convergence rate of the gradient descent method for optimization with smooth functions. However, the analysis of AG methods for non-convex optimization is still limited. It remains an open question whether AG methods from convex optimization can accelerate the convergence of the gradient descent method for finding local minima of non-convex optimization problems. This paper provides an affirmative answer to this question. In particular, we analyze two renowned variants of AG methods (namely Polyak's Heavy Ball method and Nesterov's Accelerated Gradient method) for extracting the negative curvature from random noise, which is central to escaping from saddle points. By leveraging the proposed AG methods for extracting the negative curvature, we present a new AG algorithm with double loops for non-convex optimization (in contrast to a single-loop AG algorithm proposed in a recent manuscript [AGNON], which directly analyzed Nesterov's AG method for non-convex optimization and appeared online on November 29, 2017; we emphasize that our work is independent, inspired by our earlier work [NEON17], and based on a different novel analysis). The proposed algorithm converges to a second-order stationary point $x$ such that $\|\nabla f(x)\|\leq \epsilon$ and $\nabla^2 f(x)\geq -\sqrt{\epsilon}\, I$ with $\widetilde{O}(1/\epsilon^{1.75})$ iteration complexity, improving that of the gradient descent method by a factor of $\epsilon^{-0.25}$ and matching the best iteration complexity of second-order Hessian-free methods for non-convex optimization.
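A rough sketch of the negative-curvature-extraction idea: starting from small random noise, run a momentum (heavy-ball-style) iteration on the local quadratic $u \mapsto \frac{1}{2} u^{\top} \nabla^2 f(x)\, u$ using only Hessian-vector products; if the Hessian has a sufficiently negative eigenvalue, the iterate blows up along the corresponding direction. The finite-difference Hessian-vector product, the momentum constant, and the escape threshold below are illustrative assumptions, not the constants or guarantees from the paper.

```python
import numpy as np

def hvp(grad_fn, x, v, eps=1e-5):
    """Finite-difference Hessian-vector product: H v ~ (grad(x+eps*v) - grad(x)) / eps."""
    return (grad_fn(x + eps * v) - grad_fn(x)) / eps

def extract_negative_curvature(grad_fn, x, dim, steps=200, eta=0.05, beta=0.9,
                               radius=1.0, seed=0):
    """Heavy-ball iteration on u -> 0.5 * u^T H u starting from small noise.
    Returns a unit direction of (likely) negative curvature, or None."""
    rng = np.random.default_rng(seed)
    u = 1e-3 * rng.normal(size=dim)
    u_prev = u.copy()
    for _ in range(steps):
        u_new = u - eta * hvp(grad_fn, x, u) + beta * (u - u_prev)
        u_prev, u = u, u_new
        if np.linalg.norm(u) > radius:   # blow-up signals a negative-curvature direction
            return u / np.linalg.norm(u)
    return None

# Toy usage at the saddle point of f(x) = x0^2 - x1^2 (gradient known in closed form).
grad_fn = lambda x: np.array([2 * x[0], -2 * x[1]])
d = extract_negative_curvature(grad_fn, x=np.zeros(2), dim=2)
print("negative-curvature direction:", d)  # should align with the x1 axis
```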