AITopics | stochastic gradient evaluation

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Neural Information Processing SystemsMar-16-2026, 23:28:15 GMT

We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the \textit{almost minimizer}\footnote{Following \citet{raginsky2017non}, an almost minimizer is defined to be a point which is within the ball of the global minimizer with radius $O(d\log(\beta+1)/\beta)$, where $d$ is the problem dimension and $\beta$ is the inverse temperature parameter.}

artificial intelligence, langevin dynamic, machine learning, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.67)

Add feedback

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Neural Information Processing SystemsNov-20-2025, 22:43:09 GMT

We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with $n$ component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates. Specifically, we show that gradient Langevin dynamics (GLD) and stochastic gradient Langevin dynamics (SGLD) converge to the \textit{almost minimizer}\footnote{Following \citet{raginsky2017non}, an almost minimizer is defined to be a point which is within the ball of the global minimizer with radius $O(d\log(\beta+1)/\beta)$, where $d$ is the problem dimension and $\beta$ is the inverse temperature parameter.}

global convergence, gradient langevin dynamic, langevin dynamic, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.67)

Add feedback

Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

Yaodong Yu, Pan Xu, Quanquan Gu

Neural Information Processing SystemsNov-20-2025, 21:28:56 GMT

In this paper, we aim to design efficient stochastic optimization algorithms that can find an approximate local minimum of ( 1.1), i.e., an (,

algorithm, artificial intelligence, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.29)
Asia > Middle East > Jordan (0.05)
North America > United States > Virginia > Albemarle County > Charlottesville (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

ecb47fbb07a752413640f82a945530f8-Supplemental.pdf

Neural Information Processing SystemsOct-9-2025, 15:40:47 GMT

artificial intelligence, inequality, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Hong Kong (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

Add feedback

Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems Luo Luo 1 Haishan Y e 2 Zhichao Huang

Neural Information Processing SystemsOct-9-2025, 15:40:39 GMT

Formulation (2) contains several interesting examples in machine learning such as robust optimization [14, 46] and adversarial training [17, 40].

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Hong Kong (0.04)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.45)

Add feedback

Reviews: Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

Neural Information Processing SystemsOct-7-2024, 05:26:17 GMT

The paper proposes a stochastic nested variance reduced gradient descent method for non-convex finite-sum optimization. It has been studied that variance reduction in stochastic gradient evaluations improves the complexity of stochastic gradient evaluations. A popular method is stochastic variance reduced gradient (SVRG), which uses a single reference point to evaluate the gradient. Inspired by this, authors introduce variance reduction using multiple reference points with nested scheme. More precisely, each reference point updates in every T steps and the proposed algorithm uses K points and hence one-epoch iterates T K loops.

nested variance reduced gradient descent, nonconvex optimization, stochastic gradient evaluation, (4 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization

Lin, Tianyi, Jin, Chi, Jordan, Michael. I.

arXiv.org Artificial IntelligenceAug-21-2024

We provide a unified analysis of two-timescale gradient descent ascent (TTGDA) for solving structured nonconvex minimax optimization problems in the form of $\min_\textbf{x} \max_{\textbf{y} \in Y} f(\textbf{x}, \textbf{y})$, where the objective function $f(\textbf{x}, \textbf{y})$ is nonconvex in $\textbf{x}$ and concave in $\textbf{y}$, and the constraint set $Y \subseteq \mathbb{R}^n$ is convex and bounded. In the convex-concave setting, the single-timescale GDA achieves strong convergence guarantees and has been used for solving application problems arising from operations research and computer science. However, it can fail to converge in more general settings. Our contribution in this paper is to design the simple deterministic and stochastic TTGDA algorithms that efficiently find one stationary point of the function $\Phi(\cdot) := \max_{\textbf{y} \in Y} f(\cdot, \textbf{y})$. Specifically, we prove the theoretical bounds on the complexity of solving both smooth and nonsmooth nonconvex-concave minimax optimization problems. To our knowledge, this is the first systematic analysis of TTGDA for nonconvex minimax optimization, shedding light on its superior performance in training generative adversarial networks (GANs) and in solving other real-world application problems.

algorithm 1, gradient evaluation, inequality, (13 more...)

arXiv.org Artificial Intelligence

2408.11974

Country:

Asia > Middle East > Jordan (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

Li, Huan, Lin, Zhouchen, Fang, Yongchun

arXiv.org Artificial IntelligenceAug-27-2022

We study stochastic decentralized optimization for the problem of training machine learning models with large-scale distributed data. We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA and VR-DIGing. The proposed VR-EXTRA requires the time of $O((\kappa_s+n)\log\frac{1}{\epsilon})$ stochastic gradient evaluations and $O((\kappa_b+\kappa_c)\log\frac{1}{\epsilon})$ communication rounds to reach precision $\epsilon$, which are the best complexities among the non-accelerated gradient-type methods, where $\kappa_s$ and $\kappa_b$ are the stochastic condition number and batch condition number for strongly convex and smooth problems, respectively, $\kappa_c$ is the condition number of the communication network, and $n$ is the sample size on each distributed node. The proposed VR-DIGing has a little higher communication cost of $O((\kappa_b+\kappa_c^2)\log\frac{1}{\epsilon})$. Our stochastic gradient computation complexities are the same as the ones of single-machine VR methods, such as SAG, SAGA, and SVRG, and our communication complexities keep the same as those of EXTRA and DIGing, respectively. To further speed up the convergence, we also propose the accelerated VR-EXTRA and VR-DIGing with both the optimal $O((\sqrt{n\kappa_s}+n)\log\frac{1}{\epsilon})$ stochastic gradient computation complexity and $O(\sqrt{\kappa_b\kappa_c}\log\frac{1}{\epsilon})$ communication complexity. Our stochastic gradient computation complexity is also the same as the ones of single-machine accelerated VR methods, such as Katyusha, and our communication complexity keeps the same as those of accelerated full batch decentralized methods, such as MSDA.

algorithm, complexity, log 1, (11 more...)

arXiv.org Artificial Intelligence

2009.04373

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback

Zhang, Qixin, Deng, Zengde, Chen, Zaiyi, Zhou, Kuangqi, Hu, Haoyuan, Yang, Yu

arXiv.org Artificial IntelligenceAug-16-2022

In this paper, we revisit the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, which finds wide real-world applications in the domain of machine learning, economics, and operations research. At first, we present the Meta-MFW algorithm achieving a $1/e$-regret of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per round. As far as we know, Meta-MFW is the first algorithm to obtain $1/e$-regret of $O(\sqrt{T})$ for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Furthermore, in sharp contrast with ODC algorithm \citep{thang2021online}, Meta-MFW relies on the simple online linear oracle without discretization, lifting, or rounding operations. Considering the practical restrictions, we then propose the Mono-MFW algorithm, which reduces the per-function stochastic gradient evaluations from $T^{3/2}$ to 1 and achieves a $1/e$-regret bound of $O(T^{4/5})$. Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm which attains a $1/e$-regret bound of $O(T^{8/9})$. To the best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret algorithms to explore the one-shot and bandit setting for online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, respectively. Finally, we conduct numerical experiments on both synthetic and real-world datasets to verify the effectiveness of our methods.

algorithm, down-closed convex, maximization, (12 more...)

arXiv.org Artificial Intelligence

2208.07632

Country:

Asia > Singapore (0.04)
Asia > China > Hong Kong > Kowloon (0.04)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.40)

Add feedback

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima

Chen, Zixiang, Zhou, Dongruo, Gu, Quanquan

arXiv.org Machine LearningOct-25-2021

Escaping from saddle points and finding local minima is a central problem in nonconvex optimization. Perturbed gradient methods are perhaps the simplest approach for this problem. However, to find $(\epsilon, \sqrt{\epsilon})$-approximate local minima, the existing best stochastic gradient complexity for this type of algorithms is $\tilde O(\epsilon^{-3.5})$, which is not optimal. In this paper, we propose \texttt{Pullback}, a faster perturbed stochastic gradient framework for finding local minima. We show that Pullback with stochastic gradient estimators such as SARAH/SPIDER and STORM can find $(\epsilon, \epsilon_{H})$-approximate local minima within $\tilde O(\epsilon^{-3} + \epsilon_{H}^{-6})$ stochastic gradient evaluations (or $\tilde O(\epsilon^{-3})$ when $\epsilon_H = \sqrt{\epsilon}$). The core idea of our framework is a step-size ``pullback'' scheme to control the average movement of the iterates, which leads to faster convergence to the local minima. Experiments on matrix factorization problems corroborate our theory.

algorithm, inequality, local minima, (12 more...)

arXiv.org Machine Learning

2110.13144

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Filters

Collaborating Authors

stochastic gradient evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Third-order Smoothness Helps: Faster Stochastic Optimization Algorithms for Finding Local Minima

ecb47fbb07a752413640f82a945530f8-Supplemental.pdf

Stochastic Recursive Gradient Descent Ascent for Stochastic Nonconvex-Strongly-Concave Minimax Problems Luo Luo 1 Haishan Y e 2 Zhichao Huang

Reviews: Stochastic Nested Variance Reduced Gradient Descent for Nonconvex Optimization

Two-Timescale Gradient Descent Ascent Algorithms for Nonconvex Minimax Optimization

Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback

Faster Perturbed Stochastic Gradient Methods for Finding Local Minima