Divergence-Augmented Policy Optimization
In deep reinforcement learning, policy optimization methods need to deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy to ensure small and safe policy updates with off-policy data. The Bregman divergence is calculated between the state distributions of two policies, instead of only on the action probabilities, leading to a divergence augmentation formulation. Empirical experiments on Atari games show that in the data-scarce scenario where the reuse of off-policy data becomes necessary, our method can achieve better performance than other state-of-the-art deep reinforcement learning algorithms.
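The penalized update described in the abstract can be sketched in code. This is a minimal illustration, not the paper's implementation: it assumes the Bregman divergence is instantiated as a KL penalty on action probabilities (the negative-entropy case), estimated from samples drawn under the behavior policy; the paper's formulation additionally accounts for the state distributions of the two policies. All function and variable names below are hypothetical.

```python
import numpy as np

def divergence_augmented_loss(logp_cur, logp_beh, advantages, beta=0.1):
    """Illustrative surrogate loss: an importance-weighted policy-gradient
    term plus a KL penalty toward the behavior policy (the negative-entropy
    Bregman case). Names and structure are assumptions, not from the paper.

    logp_cur:   log pi(a|s) under the current policy, shape (N,)
    logp_beh:   log mu(a|s) under the behavior policy that generated the data
    advantages: estimated advantages A(s, a), shape (N,)
    beta:       strength of the divergence penalty
    """
    ratio = np.exp(logp_cur - logp_beh)        # importance weight pi / mu
    pg_term = -(ratio * advantages).mean()     # off-policy PG surrogate
    kl_term = (logp_beh - logp_cur).mean()     # sample estimate of KL(mu || pi)
    return pg_term + beta * kl_term
```

A larger `beta` keeps the updated policy closer to the behavior policy, which is the "small and safe policy updates" trade-off the abstract describes; when the two policies coincide, the penalty vanishes and the loss reduces to the ordinary policy-gradient surrogate.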
Reviews: Divergence-Augmented Policy Optimization
This paper considers model-free discrete-action reinforcement learning, with the agent learning via variants of the stochastic policy gradient. The paper introduces and discusses the Bregman divergence, then presents how it can be used to build a policy loss that allows stable and efficient learning. The core idea of the paper, which I found best illustrated by Equation 7, is to optimize the policy by simultaneously minimizing the change between pi_t and pi_{t+1} and following the policy gradient. The main contribution of the paper is the use of the Bregman divergence for the "minimizing change between pi_t and pi_{t+1}" part of the algorithm. The paper is well-written and interesting to read.
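The update the reviewer points to can be sketched in generic mirror-descent form (the symbols here are assumed for illustration, not copied from the paper):

```latex
\pi_{t+1} = \arg\max_{\pi} \;
\mathbb{E}_{s,a \sim \pi_t}\!\left[
  \frac{\pi(a \mid s)}{\pi_t(a \mid s)} \, A^{\pi_t}(s,a)
\right]
- \frac{1}{\eta} \, D_{\Phi}\!\left(\pi, \pi_t\right)
```

Here $D_{\Phi}$ is the Bregman divergence induced by a convex function $\Phi$, and $\eta$ is a step size; choosing $\Phi$ to be the negative entropy makes $D_{\Phi}$ the KL divergence, recovering a KL-regularized trust-region update as a special case.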
Authors: Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang