 Reinforcement Learning



Object-Category Aware Reinforcement Learning

Neural Information Processing Systems

Reinforcement Learning (RL) has achieved impressive progress in recent years, with results in Atari [24] and Go [28] in which RL agents even outperform human beings.




A Novel Framework for Policy Mirror Descent with General Parameterization and Linear Convergence

Carlo Alfano, Department of Statistics, University of Oxford

Neural Information Processing Systems

In this work, we introduce a framework for policy optimization based on mirror descent that naturally accommodates general parameterizations. The policy class induced by our scheme recovers known classes, e.g., softmax, and generates new ones depending on the choice of mirror map.
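
To make the role of the mirror map concrete, here is a minimal sketch of a single policy mirror descent step in the simplest possible setting: one state, exact action values, and a tabular policy. This is not the paper's general-parameterization scheme; the function names, the step size `eta`, and the toy values of `q` are assumptions chosen for illustration. With the negative-entropy mirror map the step is a multiplicative, softmax-style update, while the squared Euclidean mirror map gives a projected-gradient step onto the probability simplex.

```python
import numpy as np


def pmd_step_neg_entropy(pi, q, eta):
    """One mirror descent step with the negative-entropy mirror map.

    The update is pi_new(a) proportional to pi(a) * exp(eta * q(a)).
    Assumes every entry of pi is strictly positive.
    """
    logits = np.log(pi) + eta * q
    logits -= logits.max()  # shift for numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum()


def project_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based)."""
    u = np.sort(v)[::-1]                 # entries in descending order
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    support = u - (css - 1.0) / k > 0    # which sorted entries stay positive
    rho = k[support][-1]                 # size of the support
    tau = (css[rho - 1] - 1.0) / rho     # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)


def pmd_step_euclidean(pi, q, eta):
    """One mirror descent step with the squared Euclidean mirror map."""
    return project_simplex(pi + eta * q)


# Toy example (hypothetical values): three actions, uniform starting policy.
pi0 = np.full(3, 1.0 / 3.0)
q = np.array([1.0, 0.0, -1.0])

pi_ent = pmd_step_neg_entropy(pi0, q, eta=1.0)
pi_euc = pmd_step_euclidean(pi0, q, eta=0.5)
```

Starting from a uniform policy with `eta=1.0`, the negative-entropy step returns exactly the softmax of `q`, which is one way to see how a softmax policy class can arise from a particular mirror map. The Euclidean step instead produces a policy that can place exactly zero probability on an action.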






6191ab7080c840f67eaf5dff7d5edfcb-Supplemental-Conference.pdf

Neural Information Processing Systems

Diversity in equally-performing policies. We show that different neighborhoods correspond to different post-update return distributions and agent behaviors. We discover that at equal average returns, different policies obtained by the same deep RL algorithm may in fact have substantially different distributional profiles, as measured by statistics of the post-update return distribution.