Harnessing Distribution Ratio Estimators for Learning Agents with Quality and Diversity

Gangwani, Tanmay; Peng, Jian; Zhou, Yuan

arXiv.org Machine Learning 

The goal in Reinforcement Learning (RL) is to learn agents that maximize long-term environmental rewards. Deep RL, which uses deep neural networks as function approximators for the policy and value functions, has achieved outstanding results on a wide variety of sequential decision-making problems, with success usually measured by the total return accumulated by the final policy. Because training maximizes reward directly, the focus is seldom on how the behavioral characteristics of the trained agent compare with the other possible behaviors in the solution space. For instance, consider the robotic manipulator arm in Figure 1a and the peg-insertion task. Though the task description is simple, a sufficiently flexible arm has numerous ways (positions of the joints and the end-effector) to insert the peg in the hole (Figure 1b).
