DNA: Proximal Policy Optimization with a Dual Network Architecture

Neural Information Processing Systems 

Instead, we show that learning these tasks independently, but with a constrained distillation phase, significantly improves performance. Furthermore, we find that policy gradient noise levels decrease when using a lower variance return estimate.
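As a rough illustration of what a "lower variance return estimate" can mean, the sketch below assumes the estimate is a generalized advantage estimate (GAE) whose decay parameter `lam` trades bias against variance. The text above does not name the estimator. The function name `gae_advantages`, the toy rewards and values, and the two `lam` settings are all hypothetical choices made for this example, not values taken from the paper.

```python
import numpy as np

def gae_advantages(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized advantage estimates for one trajectory segment.

    Assumption: a standard GAE recursion stands in for the paper's
    return estimate. Smaller `lam` leans more on the value predictions
    (lower variance, more bias); larger `lam` leans more on the sampled
    rewards (higher variance, less bias).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    # Value of the state following each step; bootstrap from last_value.
    next_values = np.append(values[1:], last_value)
    # One-step temporal-difference errors.
    deltas = rewards + gamma * next_values - values
    advantages = np.zeros_like(rewards)
    running = 0.0
    # Accumulate discounted TD errors backwards through time.
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages

# Toy data (hypothetical): three steps of rewards and value predictions.
rewards = [1.0, 0.0, 2.0]
values = [0.5, 0.4, 0.3]

# Illustrative settings only: a smaller lam gives the lower-variance estimate.
low_variance_adv = gae_advantages(rewards, values, last_value=0.0, lam=0.8)
high_variance_adv = gae_advantages(rewards, values, last_value=0.0, lam=0.95)
```

At `lam=0` the estimate reduces to the one-step TD error, and at `lam=1` with `gamma=1` it reduces to the sampled return minus the value prediction, which are the two ends of the bias and variance trade-off.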
