AITopics | policy gradient update

policy gradient update

The Importance of Sampling in Meta-Reinforcement Learning

Bradly Stadie, Ge Yang, Rein Houthooft, Peter Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever

Neural Information Processing Systems | Nov-20-2025, 20:13:20 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

332b4fbe322e11a71fa39d91c664d8fa-Supplemental-Conference.pdf

Neural Information Processing Systems | Aug-14-2025, 04:56:25 GMT

adaptation, agent, experiment, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

$TAR^2$: Temporal-Agent Reward Redistribution for Optimal Policy Preservation in Multi-Agent Reinforcement Learning

Kapoor, Aditya, Tessera, Kale-ab, Baranwal, Mayank, Khadilkar, Harshad, Albrecht, Stefano, Sun, Mingfei

arXiv.org Artificial Intelligence | Feb-7-2025

In cooperative multi-agent reinforcement learning (MARL), learning effective policies is challenging when global rewards are sparse and delayed. This difficulty arises from the need to assign credit across both agents and time steps, a problem that existing methods often fail to address in episodic, long-horizon tasks. We propose Temporal-Agent Reward Redistribution ($TAR^2$), a novel approach that decomposes sparse global rewards into agent-specific, time-step-specific components, thereby providing more frequent and accurate feedback for policy learning. Theoretically, we show that $TAR^2$ (i) aligns with potential-based reward shaping, preserving the same optimal policies as the original environment, and (ii) maintains policy gradient update directions identical to those under the original sparse reward, ensuring unbiased credit signals. Empirical results on two challenging benchmarks, SMACLite and Google Research Football, demonstrate that $TAR^2$ significantly stabilizes and accelerates convergence, outperforming strong baselines like AREL and STAS in both learning speed and final performance. These findings establish $TAR^2$ as a principled and practical solution for agent-temporal credit assignment in sparse-reward multi-agent systems.
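The potential-based reward shaping property mentioned in the abstract can be illustrated with a small sketch. This is not the $TAR^2$ redistribution model itself: it only shows the standard shaping form r'_t = r_t + gamma * phi(s_{t+1}) - phi(s_t), and the function names, the potential values, and the example episode are all hypothetical. With a terminal potential of zero, the shaped return differs from the original return only by -phi(s_0), which does not depend on the actions taken, so the ranking of policies is unchanged.

```python
def shaped_rewards(rewards, potentials, gamma=1.0):
    """Apply potential-based shaping to one episode.

    rewards[t] is the original reward at step t; potentials[t] is an
    (assumed, hypothetical) potential phi(s_t), with one extra entry for
    the terminal state.
    """
    if len(potentials) != len(rewards) + 1:
        raise ValueError("need one potential per state, including the terminal state")
    return [
        r + gamma * potentials[t + 1] - potentials[t]
        for t, r in enumerate(rewards)
    ]


def discounted_return(rewards, gamma=1.0):
    """Sum of gamma**t * r_t over the episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical sparse-reward episode: the only reward arrives at the last step.
sparse = [0.0, 0.0, 0.0, 1.0]
phi = [0.1, 0.3, 0.6, 0.9, 0.0]  # made-up potentials; terminal potential is 0

dense = shaped_rewards(sparse, phi, gamma=1.0)
gap = discounted_return(dense) - discounted_return(sparse)
# The gap telescopes to gamma**T * phi(s_T) - phi(s_0), here -phi[0].
```

The shaped sequence gives feedback at every step, while the return gap depends only on the start state.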

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2502.04864

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Agent-Temporal Credit Assignment for Optimal Policy Preservation in Sparse Multi-Agent Reinforcement Learning

Kapoor, Aditya, Swamy, Sushant, Tessera, Kale-ab, Baranwal, Mayank, Sun, Mingfei, Khadilkar, Harshad, Albrecht, Stefano V.

arXiv.org Artificial Intelligence | Dec-19-2024

In multi-agent environments, agents often struggle to learn optimal policies due to sparse or delayed global rewards, particularly in long-horizon tasks where it is challenging to evaluate actions at intermediate time steps. We introduce Temporal-Agent Reward Redistribution (TAR$^2$), a novel approach designed to address the agent-temporal credit assignment problem by redistributing sparse rewards both temporally and across agents. TAR$^2$ decomposes sparse global rewards into time-step-specific rewards and calculates agent-specific contributions to these rewards. We theoretically prove that TAR$^2$ is equivalent to potential-based reward shaping, ensuring that the optimal policy remains unchanged. Empirical results demonstrate that TAR$^2$ stabilizes and accelerates the learning process. Additionally, we show that when TAR$^2$ is integrated with single-agent reinforcement learning algorithms, it performs as well as or better than traditional multi-agent reinforcement learning methods.

credit assignment, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

2412.14779

Country:

Asia > India > Maharashtra > Mumbai (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Overview (0.88)
Research Report > New Finding (0.34)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Samples are not all useful: Denoising policy gradient updates using variance

Flet-Berliac, Yannis, Preux, Philippe

arXiv.org Machine Learning | Apr-10-2019

Policy gradient algorithms in reinforcement learning rely on efficiently sampling an environment. Most sampling procedures are based solely on sampling the agent's policy. However, other measures made available through these algorithms could be used to improve the sampling prior to each policy update. Following this line of thought, we propose a method where a transition is used in the gradient update if it meets a particular criterion, and rejected otherwise. This criterion is the fraction of variance explained ($\mathcal{V}^{ex}$), a measure of the discrepancy between a model and actual samples. $\mathcal{V}^{ex}$ can be used to evaluate the impact each transition will have on the learning. This criterion refines sampling and improves the policy gradient algorithm. In this paper: (1) We introduce and explore $\mathcal{V}^{ex}$, the selection criterion used to improve the sampling procedure. (2) We conduct experiments across a variety of standard benchmark environments, including continuous control problems. Our results show better performance than if we did not use the $\mathcal{V}^{ex}$ criterion for the policy gradient update. (3) We investigate why $\mathcal{V}^{ex}$ gives a good evaluation for the selection of samples that will positively impact the learning. (4) We show how this criterion can be interpreted as a dynamic way to adjust the ratio between exploration and exploitation.
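As a rough illustration of the statistic named in the abstract, the sketch below computes a batch-level fraction of variance explained between observed returns and value predictions, using the common definition 1 - Var(y - y_hat) / Var(y). This is an assumption for illustration: the abstract does not give the paper's exact per-transition form or its acceptance rule, and the function and argument names here are hypothetical.

```python
from statistics import pvariance


def explained_variance(returns, value_predictions):
    """Fraction of the variance in `returns` accounted for by the predictions.

    1.0 means the predictions match the returns up to a constant offset;
    0.0 means they do no better than a constant; negative values mean
    they do worse than a constant.
    """
    if len(returns) != len(value_predictions):
        raise ValueError("inputs must have the same length")
    var_y = pvariance(returns)
    if var_y == 0:
        return float("nan")  # undefined when the returns do not vary
    residuals = [y - v for y, v in zip(returns, value_predictions)]
    return 1.0 - pvariance(residuals) / var_y


# Hypothetical batch of empirical returns and critic predictions.
batch_returns = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(batch_returns, [1.0, 2.0, 3.0, 4.0])
constant = explained_variance(batch_returns, [2.5, 2.5, 2.5, 2.5])
```

A selection rule built on such a score would need a threshold or ranking scheme, which is left out here because the abstract does not specify one.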

gradient update, policy gradient update, value function, (13 more...)

arXiv.org Machine Learning

1904.04025

Country:

Asia > Middle East > Jordan (0.04)
Europe > France > Hauts-de-France > Pas-de-Calais (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Some Considerations on Learning to Explore via Meta-Reinforcement Learning

Stadie, Bradly C., Yang, Ge, Houthooft, Rein, Chen, Xi, Duan, Yan, Wu, Yuhuai, Abbeel, Pieter, Sutskever, Ilya

arXiv.org Artificial Intelligence | Jan-11-2019

We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call 'Krazy World' and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1803.01118

Country: North America > Canada (0.46)

Genre: Research Report (0.50)

Industry: Education (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

The Importance of Sampling in Meta-Reinforcement Learning

Stadie, Bradly, Yang, Ge, Houthooft, Rein, Chen, Peter, Duan, Yan, Wu, Yuhuai, Abbeel, Pieter, Sutskever, Ilya

Neural Information Processing Systems | Dec-31-2018

We interpret meta-reinforcement learning as the problem of learning how to quickly find a good sampling distribution in a new environment. This interpretation leads to the development of two new meta-reinforcement learning algorithms: E-MAML and E-$\text{RL}^2$. Results are presented on a new environment we call 'Krazy World': a difficult high-dimensional gridworld which is designed to highlight the importance of correctly differentiating through sampling distributions in meta-reinforcement learning. Further results are presented on a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance than baseline algorithms on both tasks.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.46)

Industry: Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Learning Neural Parsers with Deterministic Differentiable Imitation Learning

Shankar, Tanmay, Rhinehart, Nicholas, Muelling, Katharina, Kitani, Kris M.

arXiv.org Artificial Intelligence | Jun-20-2018

We address the problem of spatial segmentation of a 2D object in the context of a robotic system for painting, where an optimal segmentation depends on both the appearance of the object and the size of each segment. Since each segment must take into account appearance features at several scales, we take a hierarchical grammar-based parsing approach to decompose the object into 2D segments for painting. Since there are many ways to segment an object, the solution space is extremely large, and it is very challenging to use an exploration-based optimization approach like reinforcement learning. Instead, we pose the segmentation problem as an imitation learning problem by using a segmentation algorithm in place of an expert, which has access to a small dataset with known foreground-background segmentations. During the imitation learning process, we learn to imitate the oracle (segmentation algorithm) using only the image of the object, without the use of the known foreground-background segmentations. We introduce a novel deterministic policy gradient update, DRAG, in the form of a deterministic actor-critic variant of AggreVaTeD, to train our neural-network-based object parser. We also show that our approach can be seen as extending DDPG to the imitation learning scenario. Training our neural parser to imitate the oracle via DRAG allows it to outperform several existing imitation learning approaches.

machine learning, natural language, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

1806.07822

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Africa > Togo (0.04)

Genre: Research Report (0.51)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Fourier Policy Gradients

Fellows, Matthew, Ciosek, Kamil, Whiteson, Shimon

arXiv.org Artificial Intelligence | May-30-2018

We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients as convolutions and turns them into multiplications. The obtained analytical solutions allow us to capture the low variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.
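The step of turning convolutions into multiplications rests on the convolution theorem. The sketch below checks that theorem on short discrete sequences using a plain discrete Fourier transform. It is only an illustration of the underlying identity under that discrete, circular assumption, not the paper's estimator, and every function name is hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[k] * cmath.exp(-2j * cmath.pi * f * k / n) for k in range(n))
        for f in range(n)
    ]


def idft(spectrum):
    """Inverse of `dft`."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * k / n) for f in range(n)) / n
        for k in range(n)
    ]


def circular_convolve(a, b):
    """Direct circular convolution: out[k] = sum_m a[m] * b[(k - m) mod n]."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]


def convolve_via_fourier(a, b):
    """Same result, computed as a pointwise product in the Fourier domain."""
    product = [p * q for p, q in zip(dft(a), dft(b))]
    return [z.real for z in idft(product)]


# Two made-up sequences of equal length.
a = [1.0, 2.0, 3.0, 4.0]
b = [0.5, 0.25, 0.0, 0.0]
direct = circular_convolve(a, b)
via_fourier = convolve_via_fourier(a, b)
```

Both routes agree up to floating-point error; the Fourier route replaces the inner sum over shifts with one multiplication per frequency.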

fourier transform, machine learning, reinforcement learning, (13 more...)

arXiv.org Artificial Intelligence

1802.06891

Country: Europe > United Kingdom > England (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

DiGrad: Multi-Task Reinforcement Learning with Shared Actions

Dewangan, Parijat, Phaniteja, S, Krishna, K Madhava, Sarkar, Abhishek, Ravindran, Balaraman

arXiv.org Machine Learning | Feb-27-2018

Most reinforcement learning algorithms are inefficient for learning multiple tasks in complex robotic systems, where different tasks share a set of actions. In such environments, a compound policy may be learnt with shared neural network parameters, which performs multiple tasks concurrently. However, such a compound policy may become biased towards one task, or the gradients from different tasks may negate each other, making the learning unstable and sometimes less data efficient. In this paper, we propose a new approach for simultaneous training of multiple tasks sharing a set of common actions in continuous action spaces, which we call DiGrad (Differential Policy Gradient). The proposed framework is based on differential policy gradients and can accommodate multi-task learning in a single actor-critic network. We also propose a simple heuristic in the differential policy gradient update to further improve the learning. The proposed architecture was tested on an 8-link planar manipulator and a 27 degrees-of-freedom (DoF) humanoid for learning multi-goal reachability tasks for 3 and 2 end effectors respectively. We show that our approach supports efficient multi-task learning in complex robotic systems, outperforming related methods in continuous action spaces.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

arXiv.org Machine Learning

1802.10463

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
