Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation

Lin, Yijiong, Huang, Jiancong, Zimmer, Matthieu, Rojas, Juan, Weng, Paul

arXiv.org Artificial Intelligence 

In this framework, the robot learning problem corresponds to an RL problem that aims at obtaining a policy $\pi: S \times G \to A$ such that the expected discounted sum of rewards is maximized for any given goal. When the reward function is sparse, as assumed here, this RL problem is particularly hard to solve. In particular, we consider here reward functions of the following form: $R(s, a, s', g) = \mathbb{1}[d(s', g) \le \epsilon_R] - 1$, where $\mathbb{1}$ is the indicator function, $d$ is a distance, and $\epsilon_R > 0$ is a fixed threshold. The reward is therefore $0$ when the next state $s'$ lies within $\epsilon_R$ of the goal $g$ and $-1$ otherwise. To tackle this issue, Andrychowicz et al. [2017] proposed Hindsight Experience Replay (HER), which is based on the following principle: any trajectory that failed to reach its goal still carries useful information; it has at least reached the states of its trajectory path. Using this natural and powerful idea, memory replay can be augmented with the failed trajectories by changing their goals in hindsight.
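The sparse reward and the hindsight relabeling idea can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that states and goals live in the same Euclidean space, so that a reached state can be used directly as a goal, and that $d$ is the Euclidean distance. The names `EPS_R`, `sparse_reward`, and `relabel_with_final_state`, as well as the threshold value, are hypothetical and chosen only for illustration. The relabeling shown replaces the goal with the last state reached in the trajectory, which is one of several possible ways to pick a hindsight goal.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical threshold epsilon_R > 0; the value is illustrative only.
EPS_R = 0.05

Point = Sequence[float]
# (state, action, next_state, goal)
Transition = Tuple[Point, Point, Point, Point]
# (state, action, next_state, goal, reward)
LabeledTransition = Tuple[Point, Point, Point, Point, float]


def sparse_reward(next_state: Point, goal: Point, eps: float = EPS_R) -> float:
    """Return 1[d(s', g) <= eps] - 1: 0.0 if the goal is reached, else -1.0.

    Assumption: d is the Euclidean distance and states are directly
    comparable to goals.
    """
    reached = math.dist(next_state, goal) <= eps
    return (1.0 if reached else 0.0) - 1.0


def relabel_with_final_state(
    trajectory: List[Transition], eps: float = EPS_R
) -> List[LabeledTransition]:
    """Replace every goal in the trajectory by the last state it reached.

    The rewards are recomputed for the new goal, so a trajectory that
    failed for its original goal becomes successful for the hindsight goal.
    """
    hindsight_goal = trajectory[-1][2]
    return [
        (s, a, s_next, hindsight_goal, sparse_reward(s_next, hindsight_goal, eps))
        for (s, a, s_next, _original_goal) in trajectory
    ]


if __name__ == "__main__":
    goal = (5.0, 5.0)
    # A short trajectory that never gets close to the original goal.
    failed = [
        ((0.0, 0.0), (1.0, 0.0), (1.0, 0.0), goal),
        ((1.0, 0.0), (1.0, 0.0), (2.0, 0.0), goal),
    ]
    print([sparse_reward(t[2], t[3]) for t in failed])
    print([t[4] for t in relabel_with_final_state(failed)])
```

Under these assumptions, both the original and the relabeled transitions could be stored in the replay memory, so that the agent also sees transitions with a non-negative reward even when it never reached the goal it was originally given.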
