Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Neural Information Processing Systems 

Recent work demonstrated that using a memory buffer of previous successful trajectories can result in more effective policies. However, existing methods may overly exploit past successful experiences, which can encourage the agent to adopt sub-optimal and myopic behaviors.
