An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search

Neural Information Processing Systems 

Deep reinforcement learning (DRL) and evolution strategies (ES) have opposite properties: DRL has good sample efficiency but poor stability, whereas ES has good stability but poor sample efficiency.
