Learning from Trajectories via Subgoal Discovery
Paul, Sujoy, Vanbaar, Jeroen, Roy-Chowdhury, Amit
Neural Information Processing Systems
Learning to solve complex goal-oriented tasks with sparse, terminal-only rewards often requires an enormous number of samples. In such cases, a set of expert trajectories can help the agent learn faster. However, Imitation Learning (IL) via supervised pre-training on these trajectories may not perform well on its own and generally requires additional fine-tuning with an expert in the loop. In this paper, we propose an approach that uses the expert trajectories to learn to decompose the complex main task into smaller sub-goals. We learn a function that partitions the state space into sub-goals, which can then be used to design an extrinsic reward function.
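To make the idea of a state-space partition driving a reward more concrete, here is a minimal sketch in Python. It is not the method from the paper: the equal-length segmentation of each expert trajectory, the nearest-centroid partition function, and the names `equipartition_labels`, `fit_centroids`, `predict_subgoal`, `subgoal_reward`, and `bonus` are all assumptions chosen for illustration.

```python
import numpy as np


def equipartition_labels(traj_len, n_subgoals):
    # Assumption: cut a trajectory into n_subgoals equal-length segments
    # and label each time step with the index of its segment.
    return (np.arange(traj_len) * n_subgoals) // traj_len


def fit_centroids(trajectories, n_subgoals):
    # Each trajectory is an array of shape (T, state_dim) with T >= n_subgoals.
    states = np.concatenate(trajectories)
    labels = np.concatenate(
        [equipartition_labels(len(t), n_subgoals) for t in trajectories]
    )
    # One centroid per sub-goal: the mean of the expert states labeled with it.
    return np.stack([states[labels == k].mean(axis=0) for k in range(n_subgoals)])


def predict_subgoal(centroids, state):
    # Hypothetical partition function: assign a state to its nearest centroid.
    distances = np.linalg.norm(centroids - np.asarray(state), axis=1)
    return int(np.argmin(distances))


def subgoal_reward(centroids, state, next_state, bonus=1.0):
    # Pay a bonus only when the agent moves from sub-goal k to sub-goal k + 1.
    g = predict_subgoal(centroids, state)
    g_next = predict_subgoal(centroids, next_state)
    return bonus if g_next == g + 1 else 0.0
```

With expert trajectories that sweep a one-dimensional state from 0 to 1, for example, a transition that crosses from the first region into the second earns the bonus, while a transition that stays inside one region earns nothing. Rewarding only the step to the next sub-goal in order is a design choice of this sketch.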
Mar-19-2020, 00:02:25 GMT
- Technology: