Reviews: Towards Generalization and Simplicity in Continuous Control
–Neural Information Processing Systems
The paper evaluates the natural policy gradient algorithm with simple linear policies on a wide range of "challenging" problems from the OpenAI MuJoCo environments, and shows that these shallow policy networks can learn effective policies in most domains, sometimes faster than NN policies. It further explores learning robust and more global policies by modifying existing domains. The first part of the paper, while not proposing new approaches, offers interesting insights into the performance of linear policies, given the plethora of prior work that applies NN policies by default to these problems. This part could be further strengthened by an ablation study on the RL optimizer, specifically: GAE, sigma vs. alpha in Eq. 5, and small vs. large trajectory batches (SGD vs. batch optimization).
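For readers unfamiliar with the GAE component named in the suggested ablation, the following is a minimal sketch of generalized advantage estimation in its standard textbook form. It is not taken from the paper under review; the function name `gae_advantages`, its arguments, and the default values of `gamma` and `lam` are assumptions chosen for illustration.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Illustrative GAE sketch (hypothetical helper, not the paper's code).

    rewards: per-step rewards r_0 .. r_{T-1} for one trajectory.
    values:  value estimates V(s_0) .. V(s_T); the final entry is the
             bootstrap value for the state after the last step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later residuals.
    for t in reversed(range(len(rewards))):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Under this sketch, `lam=0` reduces to one-step TD residuals and `lam=1` to discounted Monte Carlo returns minus the baseline, so an ablation over `lam` (or over removing the estimator entirely) would show how much of the reported performance depends on this variance-reduction step rather than on the linear policy class.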
Oct-8-2024, 05:11:29 GMT