Trust Region-Guided Proximal Policy Optimization

Yuhui Wang, Hao He, Xiaoyang Tan, Yaozhong Gan

Neural Information Processing Systems 

However, a first-order optimizer is not very accurate in regions of high curvature.
