post-update return distribution
6191ab7080c840f67eaf5dff7d5edfcb-Supplemental-Conference.pdf
Diversity in equally performing policies. We show that different neighborhoods correspond to different post-update return distributions and agent behaviors. We discover that, at equal average returns, different policies obtained by the same deep RL algorithm may in fact have substantially different distributional profiles, as measured by statistics of the post-update return distribution.
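To make "statistics of the post-update return distribution" concrete, here is a minimal sketch. It assumes that samples of post-update returns are already available as a flat array (for example, one evaluated return per independent update applied to the same starting policy). The function name, the choice of statistics (standard deviation, skewness, left-tail mean), and the two toy arrays are illustrative assumptions, not the authors' code or data.

```python
import numpy as np


def summarize_post_update_returns(returns, tail_fraction=0.1):
    """Summarize a sample of post-update returns (hypothetical helper).

    `returns` is assumed to hold one scalar return per independent update
    applied to the same starting policy. The statistics chosen here are
    illustrative and not necessarily those used in the paper.
    """
    r = np.asarray(returns, dtype=float)
    if r.ndim != 1 or r.size == 0:
        raise ValueError("returns must be a non-empty 1-D sequence")

    mean = r.mean()
    std = r.std()  # population standard deviation (ddof=0)
    centered = r - mean
    # Standardized third moment; defined as 0 for a degenerate sample.
    skewness = float((centered ** 3).mean() / std ** 3) if std > 0 else 0.0

    # Mean of the worst `tail_fraction` of samples (at least one sample).
    k = max(1, int(np.ceil(tail_fraction * r.size)))
    left_tail_mean = float(np.sort(r)[:k].mean())

    return {
        "mean": float(mean),
        "std": float(std),
        "skewness": skewness,
        "left_tail_mean": left_tail_mean,
    }


# Toy, hand-written samples (not experimental data): both have the same
# mean, but the second has one catastrophic outcome in its left tail.
stable = summarize_post_update_returns([90, 95, 100, 105, 110])
heavy_tailed = summarize_post_update_returns([20, 120, 120, 120, 120])
```

In this toy example the two samples agree on the mean but differ in spread, skewness, and left-tail mean, which is the kind of difference an average-return comparison alone would hide.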
Country:
- North America > United States > Louisiana (0.04)
- North America > Canada > Quebec (0.04)
- Asia > Middle East > Jordan (0.04)
Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Country:
- North America > Canada > Quebec > Montreal (0.14)
- Asia > Middle East > Jordan (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (3 more...)