Evolution-Guided Policy Gradient in Reinforcement Learning

Shauharda Khadka, Kagan Tumer

Neural Information Processing Systems 

Temporal Difference methods in RL use bootstrapping to address this issue, but often struggle when time horizons are long and rewards are sparse.
