Reinforcement Learning
Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning
In Distributional Reinforcement Learning (D-RL) [Bellemare et al., 2023], an agent aims to estimate Sutton and Barto, 2018], where the objective is to predict the expected return only. In Section 3, we answer this methodological question, showing that it is possible to reformulate Policy Evaluation in a distributional setting so that its performance index is explicitly intertwined with the representation of the (state or action) spaces.