Variational Policy Gradient Method for Reinforcement Learning with General Utilities

Neural Information Processing Systems 

In recent years, reinforcement learning (RL) systems with goals beyond the cumulative sum of rewards, such as constrained problems, exploration, and acting upon prior experiences, have gained traction. In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases.
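To make the objective concrete, here is a minimal sketch that assumes a small tabular MDP with known transition probabilities. It computes the normalized discounted state-action occupancy measure λ(s, a) = (1 - γ) Σ_t γ^t Pr(s_t = s, a_t = a) and evaluates two concave utilities F(λ) of it: a linear utility, which recovers the standard expected-reward objective up to the factor (1 - γ), and the entropy of λ, a common exploration-style utility. The helper names (`occupancy_measure`, `linear_utility`, `entropy_utility`) and the choice of entropy as the example are illustrative assumptions, not code from the paper. The sketch only evaluates F(λ) for a fixed policy and does not implement the paper's variational policy gradient method.

```python
import math


def occupancy_measure(P, pi, rho, gamma, tol=1e-12, max_iter=10_000):
    """Normalized discounted state-action occupancy measure (illustrative helper).

    P[s][a][s2] : transition probability P(s2 | s, a)
    pi[s][a]    : policy probability pi(a | s)
    rho[s]      : initial state distribution
    Iterates the fixed point
        d(s2) = (1 - gamma) * rho(s2) + gamma * sum_{s,a} d(s) pi(a|s) P(s2|s,a),
    which is a gamma-contraction, then returns lambda(s, a) = d(s) * pi(a|s).
    """
    n_states = len(rho)
    n_actions = len(pi[0])
    d = list(rho)
    for _ in range(max_iter):
        new_d = [(1 - gamma) * rho[s2] for s2 in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                w = gamma * d[s] * pi[s][a]
                for s2 in range(n_states):
                    new_d[s2] += w * P[s][a][s2]
        diff = max(abs(x - y) for x, y in zip(new_d, d))
        d = new_d
        if diff < tol:
            break
    return [[d[s] * pi[s][a] for a in range(n_actions)] for s in range(n_states)]


def linear_utility(lam, r):
    # F(lambda) = <r, lambda>: the usual RL objective, scaled by (1 - gamma).
    return sum(r[s][a] * lam[s][a] for s in range(len(lam)) for a in range(len(lam[0])))


def entropy_utility(lam):
    # F(lambda) = -sum lambda log lambda: concave, larger when visitation is spread out.
    return -sum(x * math.log(x) for row in lam for x in row if x > 0)


if __name__ == "__main__":
    # Hypothetical 2-state, 2-action MDP: action 0 stays put, action 1 switches state.
    P = [[[1.0, 0.0], [0.0, 1.0]],
         [[0.0, 1.0], [1.0, 0.0]]]
    pi = [[0.5, 0.5], [0.5, 0.5]]  # uniform policy
    rho = [1.0, 0.0]               # always start in state 0
    lam = occupancy_measure(P, pi, rho, gamma=0.9)
    print(lam)
    print(linear_utility(lam, [[1.0, 1.0], [0.0, 0.0]]))
    print(entropy_utility(lam))
```

For this toy MDP the occupancy measure comes out to approximately [[0.275, 0.275], [0.225, 0.225]]. Because the linear utility is just one choice of F, swapping in `entropy_utility` changes the objective without changing how λ is computed, which is the sense in which a general concave utility of λ subsumes the standard cumulative-reward objective.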
