Divergence-Augmented Policy Optimization
Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang
Neural Information Processing Systems (NeurIPS 2019)
In deep reinforcement learning, policy optimization methods must deal with issues such as function approximation and the reuse of off-policy data. Standard policy gradient methods do not handle off-policy data well, leading to premature convergence and instability. This paper introduces a method to stabilize policy optimization when off-policy data are reused. The idea is to include a Bregman divergence between the behavior policy that generates the data and the current policy, ensuring small and safe policy updates with off-policy data. The Bregman divergence is computed between the state distributions of the two policies, rather than on the action probabilities alone, leading to a divergence-augmented formulation.
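To make the idea concrete, below is a minimal sketch, not the paper's exact algorithm, of a divergence-penalized off-policy surrogate loss in PyTorch. It uses the KL divergence, the Bregman divergence generated by the negative entropy, as the penalty, and for simplicity applies it to per-state action probabilities, whereas the paper places the divergence on the state distributions of the two policies. The function name and the penalty weight beta are illustrative assumptions.

import torch

def divergence_augmented_loss(new_logp, old_logp, advantages, beta=1.0):
    """Sketch of a divergence-penalized off-policy surrogate loss.

    new_logp:    log pi_theta(a|s) under the current policy
    old_logp:    log mu(a|s) under the behavior policy that generated the data
    advantages:  advantage estimates for the sampled (s, a) pairs
    beta:        penalty weight (hypothetical hyperparameter)
    """
    ratio = torch.exp(new_logp - old_logp)       # importance weight pi/mu
    surrogate = (ratio * advantages).mean()      # off-policy policy-gradient term
    # Sample estimate of KL(mu || pi) from data drawn under mu; KL is the
    # Bregman divergence generated by negative entropy. Note: this penalizes
    # per-state action probabilities, a simplification of the paper's
    # divergence between state distributions.
    kl_penalty = (old_logp - new_logp).mean()
    return -(surrogate - beta * kl_penalty)      # minimize the negated objective

In practice such a loss would be minimized with a standard optimizer over minibatches of logged transitions, with beta controlling how far the updated policy may drift from the behavior policy.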