Reinforcement Learning with Demonstrations from Mismatched Task under Sparse Reward

Yanjiang Guo, Jingyue Gao, Zheng Wu, Chengming Shi, Jianyu Chen

arXiv.org Artificial Intelligence 

Reinforcement learning has been applied to various real-world tasks, including robotic manipulation with large state-action spaces and sparse reward signals [1]. In these tasks, standard reinforcement learning tends to perform extensive useless exploration and easily falls into locally optimal solutions. To alleviate this problem, previous works often use expert demonstrations to aid online learning, adopting successful trajectories to guide the exploration process [2, 3]. However, standard learning-from-demonstration algorithms typically assume that the target learning task is exactly the same as the task in which the demonstrations were collected [4, 5, 6]. Under this assumption, experts need to collect corresponding demonstrations for each new task, which can be expensive and inefficient. In this paper, we consider a new learning setting where expert data is collected under a single task, while the agent is required to solve different new tasks. For instance, as shown in Figure 1, a robot arm aims to solve peg-in-hole tasks. The demonstration is collected on a certain type of hole, while the target tasks have different hole shapes (changes in environmental dynamics) or position shifts (changes in the reward function). This is challenging because agents cannot directly imitate demonstrations from mismatched tasks due to the changes in dynamics and reward. Nevertheless, compared to learning from scratch, those demonstrations should still provide useful information to help exploration.
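To make the setting concrete, one way to formalize the mismatch is in standard MDP notation; the symbols below are introduced here for illustration and are not taken from the paper:

\[
\mathcal{M}_{\mathrm{demo}} = (\mathcal{S}, \mathcal{A}, P_{\mathrm{demo}}, r_{\mathrm{demo}}, \gamma),
\qquad
\mathcal{M}_{\mathrm{target}} = (\mathcal{S}, \mathcal{A}, P_{\mathrm{target}}, r_{\mathrm{target}}, \gamma),
\]

where the demonstrations \(\mathcal{D} = \{(s_t, a_t)\}\) are collected in \(\mathcal{M}_{\mathrm{demo}}\), but the agent must maximize \(\mathbb{E}\big[\sum_t \gamma^t\, r_{\mathrm{target}}(s_t, a_t)\big]\) in \(\mathcal{M}_{\mathrm{target}}\), with \(P_{\mathrm{demo}} \neq P_{\mathrm{target}}\) (e.g., a different hole shape) and/or \(r_{\mathrm{demo}} \neq r_{\mathrm{target}}\) (e.g., a shifted hole position). Direct imitation of \(\mathcal{D}\) is therefore not guaranteed to be optimal, or even feasible, in the target task.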
