Reviews: Divergence-Augmented Policy Optimization
Neural Information Processing Systems
This paper considers model-free reinforcement learning with discrete actions, where the agent learns with variants of stochastic policy gradient. The paper introduces and discusses the Bregman divergence, then shows how it can be used to build a policy loss that allows stable and efficient learning. The core idea of the paper, which I found best illustrated by Equation 7, is to optimize the policy by simultaneously following the policy gradient and minimizing the change between pi_t and pi_{t+1}. The main contribution of the paper is the use of the Bregman divergence to measure this "change between pi_t and pi_{t+1}". The paper is well-written and interesting to read.
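To make the core idea concrete, here is a minimal sketch of a divergence-regularized policy-gradient loss. This is an illustration of the general pattern rather than the paper's Equation 7: it uses the KL divergence, which is the Bregman divergence induced by the negative-entropy mirror map, and the function name, the penalty weight `beta`, and the NumPy setup are my own assumptions for the example.

```python
import numpy as np

def kl_regularized_pg_loss(logits_new, logits_old, actions, advantages, beta=0.1):
    """Illustrative loss: a policy-gradient term plus a penalty on the
    divergence between the old policy pi_t and the new policy pi_{t+1}.
    (KL is the Bregman divergence under the negative-entropy mirror map;
    this is a sketch, not the paper's exact objective.)"""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    pi_new = softmax(logits_new)   # pi_{t+1}
    pi_old = softmax(logits_old)   # pi_t
    # Policy-gradient term: -E[log pi_{t+1}(a|s) * advantage]
    logp_new = np.log(pi_new[np.arange(len(actions)), actions])
    pg_term = -(logp_new * advantages).mean()
    # Divergence term: KL(pi_t || pi_{t+1}), penalizing large policy changes
    kl = (pi_old * (np.log(pi_old) - np.log(pi_new))).sum(axis=-1).mean()
    return pg_term + beta * kl
```

When `logits_new == logits_old` the divergence term vanishes and the loss reduces to the plain policy-gradient term; increasing `beta` trades off learning progress against the size of the policy update, which is the stability mechanism the review describes.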