PACER: A Fully Push-forward-based Distributional Reinforcement Learning Algorithm
Bai, Wensong, Zhang, Chao, Fu, Yichao, Peng, Lingwei, Qian, Hui, Dai, Bin
arXiv.org Artificial Intelligence
In this paper, we propose the first fully push-forward-based Distributional Reinforcement Learning algorithm, called Push-forward-based Actor-Critic EncourageR (PACER). Specifically, PACER establishes a stochastic utility value policy gradient theorem and leverages the push-forward operator in the construction of both the actor and the critic. Moreover, a novel sample-based encourager built on the maximum mean discrepancy (MMD) is designed to incentivize exploration. Experimental evaluations on various continuous control benchmarks demonstrate the superiority of our algorithm over state-of-the-art methods.
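The two ingredients named in the abstract, push-forward sampling and a sample-based MMD, can be illustrated with a minimal sketch. This is not PACER's actual actor, critic, or encourager, whose exact form is not given here. It assumes a scalar action, a Gaussian kernel with a fixed bandwidth, and a hypothetical affine map `push_forward_samples` standing in for a learned push-forward network; all function names and parameters are illustrative.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two sample sets."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


def push_forward_samples(rng, n, shift, scale):
    """Hypothetical push-forward: draw base noise and map it through an
    affine function. A real actor would use a learned network here."""
    return [shift + scale * rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)
policy_actions = push_forward_samples(rng, 200, shift=0.0, scale=1.0)
similar_actions = push_forward_samples(rng, 200, shift=0.0, scale=1.0)
distant_actions = push_forward_samples(rng, 200, shift=3.0, scale=1.0)

print(mmd_squared(policy_actions, similar_actions))  # small: same distribution
print(mmd_squared(policy_actions, distant_actions))  # larger: shifted distribution
```

Because the estimate needs only samples, it applies to distributions that are defined implicitly by a push-forward map and have no tractable density. How such a quantity would enter an exploration bonus is a design choice of the paper and is not reproduced here.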
Jun-11-2023