AITopics | variance-reduced

Collaborating Authors

variance-reduced

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AnImprovedAnalysisof(Variance-Reduced) Policy GradientandNaturalPolicyGradientMethods

Neural Information Processing SystemsFeb-8-2026, 11:46:44 GMT

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations.

artificial intelligence, arxivpreprintarxiv, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Review for NeurIPS paper: An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Neural Information Processing SystemsJan-24-2025, 13:09:12 GMT

Additional Feedback: Detailed comments on a section-by-section basis are given below. Lines 10-13 of abstract: I found this quite vaguely worded, and didn't understand which contributions in the main paper this is referring to. Section 1 "Policy gradients are usually estimated via Monte-Carlo rollouts". To me, this would suggest that critics are not used, or are used at most as baselines. However, as far as I am aware, most of the successful methods cited in the previous paragraph do use bootstrapping, and so are not pure Monte Carlo, in the sense that the term is used in RL.

improved analysis, natural policy gradient method, variance-reduced, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)

Add feedback

An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Neural Information Processing SystemsOct-10-2024, 05:56:47 GMT

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG (NPG) methods, and their variance-reduced variants, under general smooth policy parametrizations. More specifically, with the Fisher information matrix of the policy being positive definite: i) we show that a state-of-the-art variance-reduced PG method, which has only been shown to converge to stationary points, converges to the globally optimal value up to some inherent function approximation error due to policy parametrization; ii) we show that NPG enjoys a lower sample complexity; iii) we propose SRVR-NPG, which incorporates variance-reduction into the NPG update. Our improvements follow from an observation that the convergence of (variance-reduced) PG and NPG methods can improve each other: the stationary convergence analysis of PG can be applied on NPG as well, and the global convergence analysis of NPG can help to establish the global convergence of (variance-reduced) PG methods. Thanks to this improvement, we have also made variance-reduction for NPG possible for the first time, with both global convergence and an efficient finite-sample complexity.

improved analysis, natural policy gradient method, variance-reduced, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.43)

Add feedback

On the Stochastic (Variance-Reduced) Proximal Gradient Method for Regularized Expected Reward Optimization

Liang, Ling, Yang, Haizhao

arXiv.org Artificial IntelligenceJan-23-2024

We consider a regularized expected reward optimization problem in the non-oblivious setting that covers many existing problems in reinforcement learning (RL). In order to solve such an optimization problem, we apply and analyze the classical stochastic proximal gradient method. In particular, the method has shown to admit an $O(\epsilon^{-4})$ sample complexity to an $\epsilon$-stationary point, under standard conditions. Since the variance of the classical stochastic gradient estimator is typically large which slows down the convergence, we also apply an efficient stochastic variance-reduce proximal gradient method with an importance sampling based ProbAbilistic Gradient Estimator (PAGE). To the best of our knowledge, the application of this method represents a novel approach in addressing the general regularized reward optimization problem. Our analysis shows that the sample complexity can be improved from $O(\epsilon^{-4})$ to $O(\epsilon^{-3})$ under additional conditions. Our results on the stochastic (variance-reduced) proximal gradient method match the sample complexity of their most competitive counterparts under similar settings in the RL literature.

gradient method, optimization, proximal gradient method, (12 more...)

arXiv.org Artificial Intelligence

2401.12508

Country:

North America > United States > Maryland (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Add feedback