Distributional Policy Evaluation: a Maximum Entropy approach to Representation Learning
Neural Information Processing Systems
In Distributional Reinforcement Learning (D-RL) [Bellemare et al., 2023], an agent aims to estimate the entire distribution of the returns, in contrast with standard Reinforcement Learning [Sutton and Barto, 2018], where the objective is to predict the expected return only. […] In Section 3, we answer this methodological question, showing that it is possible to reformulate Policy Evaluation in a distributional setting so that its performance index is explicitly intertwined with the representation of the (state or action) spaces.
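As a rough illustration of the difference between estimating the whole return distribution and predicting the expected return only, the Python sketch below performs tabular distributional policy evaluation on a fixed categorical support. It is not the maximum-entropy method of this paper. It assumes a small Markov reward process (a transition matrix `P` and per-state rewards `r` induced by a fixed policy), and the function names `project_categorical` and `distributional_policy_evaluation` are hypothetical. The projection step, which splits each shifted atom's mass between its two neighbouring support points, is a scheme commonly used in categorical D-RL.

```python
import numpy as np


def project_categorical(atoms, probs, support):
    """Project sum_i probs[i] * delta(atoms[i]) onto an evenly spaced `support`.

    Each atom's mass is split linearly between its two neighbouring support
    points, which preserves total mass and (inside the support) the mean.
    """
    n = len(support)
    delta = support[1] - support[0]
    # Fractional position of each atom on the grid, clipped against round-off.
    b = np.clip((atoms - support[0]) / delta, 0, n - 1)
    lo = np.floor(b).astype(int)
    hi = np.ceil(b).astype(int)
    out = np.zeros(n)
    exact = lo == hi  # atom lands exactly on a support point
    np.add.at(out, lo[exact], probs[exact])
    split = ~exact
    np.add.at(out, lo[split], probs[split] * (hi[split] - b[split]))
    np.add.at(out, hi[split], probs[split] * (b[split] - lo[split]))
    return out


def distributional_policy_evaluation(P, r, gamma, n_atoms=51, iters=500):
    """Iterate the categorical distributional Bellman operator Z(s) = r(s) + gamma * Z(S').

    Hypothetical helper: assumes deterministic per-state rewards r[s] and a
    fixed-policy transition matrix P[s, s'].
    """
    v_min, v_max = r.min() / (1 - gamma), r.max() / (1 - gamma)
    if v_max == v_min:
        raise ValueError("rewards must not all be equal for this support")
    # This support contains every shifted atom r[s] + gamma * z, so no mass is clipped.
    support = np.linspace(v_min, v_max, n_atoms)
    eta = np.full((len(r), n_atoms), 1.0 / n_atoms)  # start from uniform distributions
    for _ in range(iters):
        new_eta = np.zeros_like(eta)
        for s in range(len(r)):
            atoms = r[s] + gamma * support  # shift and scale the successor's atoms
            for s_next in range(len(r)):
                new_eta[s] += P[s, s_next] * project_categorical(atoms, eta[s_next], support)
        eta = new_eta
    return support, eta


# Usage: a two-state Markov reward process with illustrative numbers.
P = np.array([[0.5, 0.5], [0.1, 0.9]])
r = np.array([1.0, -1.0])
gamma = 0.9
support, eta = distributional_policy_evaluation(P, r, gamma)
means = eta @ support                          # expected returns read off the distributions
V = np.linalg.solve(np.eye(2) - gamma * P, r)  # classical, expectation-only evaluation
```

Because this projection preserves the mean whenever the support spans [min r / (1 - γ), max r / (1 - γ)], the means of the learned distributions coincide with the classical value function `V`, while `eta` additionally retains the spread and shape of the returns that expectation-only Policy Evaluation discards.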
Nov-14-2025, 13:41:04 GMT
- Country:
  - Asia > Middle East
    - Jordan (0.04)
  - Europe
    - Italy > Lombardy
      - Milan (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
  - North America > United States
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - Texas > Travis County
      - Austin (0.04)