Divergence-Augmented Policy Optimization

Qing Wang, Yingru Li, Jiechao Xiong, Tong Zhang

Neural Information Processing Systems 

This paper introduces a method to stabilize policy optimization when off-policy data are reused.
