Reviews: Evolved Policy Gradients
–Neural Information Processing Systems
The authors present an approach for learning loss functions for reinforcement learning via a combination of evolutionary strategies as an outer loop and a simple policy gradient algorithm in the inner loop. Overall I found this to be a very interesting paper. My one criticism is that I would have liked to see a bit more of a study of what parts of the algorithm and the loss architecture are important. The algorithm itself is relatively simple. Although I appreciate the detail of Algorithm 1, to some degree I feel that this obscures the algorithm. In essense this approach corresponds to "use policy gradient in the inner-loop, and ES in the outer loop".More interesting is the structure of the loss architecture.
Neural Information Processing Systems
Oct-7-2024, 14:26:44 GMT