Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation
Lin, Yijiong, Huang, Jiancong, Zimmer, Matthieu, Rojas, Juan, Weng, Paul
arXiv.org Artificial Intelligence
In this framework, the robot learning problem corresponds to an RL problem that aims at obtaining a policy $\pi : S \times G \to A$ such that the expected discounted sum of rewards is maximized for any given goal. When the reward function is sparse, as assumed here, this RL problem is particularly hard to solve. In particular, we consider here reward functions of the following form: $R(s, a, s', g) = \mathbb{1}[d(s', g) \le \epsilon_R] - 1$, where $\mathbb{1}$ is the indicator function, $d$ is a distance, and $\epsilon_R > 0$ is a fixed threshold. The reward is therefore 0 when the next state $s'$ lies within $\epsilon_R$ of the goal $g$, and $-1$ otherwise. To tackle this issue, Andrychowicz et al. [2017] proposed HER, which is based on the following principle: any trajectory that failed to reach its goal still carries useful information, since it has at least reached the states along its own path. Using this natural and powerful idea, the replay memory can be augmented with the failed trajectories by changing their goals in hindsight.
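The sparse reward and the hindsight relabeling idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Euclidean choice for $d$, the default threshold value, the tuple layout of a transition, and the choice of the last achieved position as the substitute goal are all assumptions made for illustration.

```python
import numpy as np


def sparse_reward(achieved, goal, eps_r=0.05):
    """R = 1[d(s', g) <= eps_R] - 1, i.e. 0 on success and -1 otherwise.

    Assumption: d is the Euclidean distance between the goal-relevant part
    of the next state (`achieved`) and the goal.
    """
    d = np.linalg.norm(np.asarray(achieved) - np.asarray(goal))
    return float(d <= eps_r) - 1.0


def relabel_with_final_goal(trajectory, eps_r=0.05):
    """Relabel a trajectory in hindsight with the last achieved position.

    `trajectory` is a list of (s, a, s_next, achieved_next, goal) tuples
    (a hypothetical layout). The original goal is discarded, the final
    achieved position becomes the new goal, and rewards are recomputed.
    """
    new_goal = trajectory[-1][3]
    relabeled = []
    for s, a, s_next, achieved_next, _ in trajectory:
        r = sparse_reward(achieved_next, new_goal, eps_r)
        relabeled.append((s, a, s_next, new_goal, r))
    return relabeled


# A failed 3-step trajectory: the agent moves along the x-axis,
# but the original goal is far away at (5, 5).
goal = np.array([5.0, 5.0])
positions = [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([1.0, 0.0])]
trajectory = [(None, None, None, p, goal) for p in positions]

original_rewards = [sparse_reward(p, goal) for p in positions]
relabeled = relabel_with_final_goal(trajectory)
relabeled_rewards = [t[4] for t in relabeled]
```

Under the original goal every transition receives the reward $-1$, whereas after relabeling the final transition receives 0, so the stored trajectory now contains a successful example for the substituted goal.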
Oct-22-2019