Reviews: Divergence-Augmented Policy Optimization
Neural Information Processing Systems
This paper considers model-free reinforcement learning with discrete actions, where the agent learns with variants of stochastic policy gradient. The paper introduces and discusses the Bregman divergence, then shows how it can be used to build a policy loss that allows stable and efficient learning. The core idea of the paper, which I found best illustrated by Equation 7, is to optimize the policy by simultaneously following the policy gradient and minimizing the change between pi_t and pi_{t+1}. The main contribution of the paper is the use of the Bregman divergence to measure this "change between pi_t and pi_{t+1}". The paper is well-written and interesting to read.
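To make the core idea concrete, here is a minimal sketch of a divergence-regularized policy-gradient loss. This is an illustration of the general pattern rather than the paper's Equation 7: it uses the KL divergence, which is the Bregman divergence induced by the negative-entropy mirror map, and the function name, the penalty weight `beta`, and the NumPy setup are my own assumptions for the example.

```python
import numpy as np

def kl_regularized_pg_loss(logits_new, logits_old, actions, advantages, beta=0.1):
    """Illustrative loss: a policy-gradient term plus a penalty on the
    divergence between the old policy pi_t and the new policy pi_{t+1}.
    (KL is the Bregman divergence under the negative-entropy mirror map;
    this is a sketch, not the paper's exact objective.)"""
    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    pi_new = softmax(logits_new)   # pi_{t+1}
    pi_old = softmax(logits_old)   # pi_t
    # Policy-gradient term: -E[log pi_{t+1}(a|s) * advantage]
    logp_new = np.log(pi_new[np.arange(len(actions)), actions])
    pg_term = -(logp_new * advantages).mean()
    # Divergence term: KL(pi_t || pi_{t+1}), penalizing large policy changes
    kl = (pi_old * (np.log(pi_old) - np.log(pi_new))).sum(axis=-1).mean()
    return pg_term + beta * kl
```

When `logits_new == logits_old` the divergence term vanishes and the loss reduces to the plain policy-gradient term; increasing `beta` trades off learning progress against the size of the policy update, which is the stability mechanism the review describes.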