Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation

Yijiong Lin, Jiancong Huang, Matthieu Zimmer, Juan Rojas, Paul Weng

Oct-22-2019–arXiv.org Artificial Intelligence

In this framework, the robot learning problem corresponds to an RL problem that aims at obtaining a goal-conditioned policy $\pi: S \times G \to A$ such that the expected discounted sum of rewards is maximized for any given goal. When the reward function is sparse, as assumed here, this RL problem is particularly hard to solve. In particular, we consider reward functions of the form

$R(s, a, s', g) = \mathbb{1}[d(s', g) \le \epsilon_R] - 1,$

where $\mathbb{1}$ is the indicator function, $d$ is a distance, and $\epsilon_R > 0$ is a fixed threshold. The agent thus receives a reward of 0 when the next state $s'$ lies within $\epsilon_R$ of the goal $g$, and $-1$ otherwise.

To tackle this difficulty, Andrychowicz et al. [2017] proposed hindsight experience replay (HER), which is based on the following principle: any trajectory that failed to reach its goal still carries useful information, since it has at least reached the states along its path. Using this natural and powerful idea, the replay memory can be augmented with failed trajectories by changing their goals in hindsight.
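The sparse reward and the hindsight relabeling idea can be illustrated with a short sketch. It assumes that $d$ is the Euclidean distance, that states can be compared directly with goals, and that hindsight goals are drawn with the "future" strategy (a state reached later in the same episode), which is one of the relabeling strategies described by Andrychowicz et al. [2017]. The names `Transition`, `sparse_reward`, `her_relabel`, `k` and `eps_r` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Transition:
    state: np.ndarray
    action: np.ndarray
    next_state: np.ndarray
    goal: np.ndarray
    reward: float


def sparse_reward(next_state, goal, eps_r=0.05):
    """R(s, a, s', g) = 1[d(s', g) <= eps_r] - 1, with d assumed Euclidean."""
    d = np.linalg.norm(np.asarray(next_state, dtype=float) - np.asarray(goal, dtype=float))
    return 0.0 if d <= eps_r else -1.0


def her_relabel(episode, k=4, eps_r=0.05, rng=None):
    """Store each transition with its original goal, plus k hindsight copies.

    `episode` is a list of (s, a, s_next, g) tuples. Hindsight goals follow
    the "future" strategy: a state reached at step t or later in the same
    episode, so a relabeled transition may now count as a success.
    """
    rng = np.random.default_rng() if rng is None else rng
    buffer = []
    horizon = len(episode)
    for t, (s, a, s_next, g) in enumerate(episode):
        # Original transition: reward -1 whenever the real goal was missed.
        buffer.append(Transition(s, a, s_next, g, sparse_reward(s_next, g, eps_r)))
        for _ in range(k):
            future = rng.integers(t, horizon)  # uniform index in [t, horizon - 1]
            new_goal = episode[future][2]      # a state the agent actually reached
            # Recompute the reward with respect to the substituted goal.
            buffer.append(
                Transition(s, a, s_next, new_goal, sparse_reward(s_next, new_goal, eps_r))
            )
    return buffer
```

On a failed episode, every original transition receives reward $-1$, while hindsight copies whose substituted goal coincides with their own next state receive reward 0, so the replay memory gains successful examples without any new interaction with the environment.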

learning, symmetry, trajectory

Country:
- North America > Canada (0.04)
- Asia > China
  - Shanghai > Shanghai (0.04)

Genre:
- Research Report (0.83)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Machine Learning > Reinforcement Learning (1.00)