Continuous Relaxation of Symbolic Planner for One-Shot Imitation Learning

De-An Huang, Danfei Xu, Yuke Zhu, Animesh Garg, Silvio Savarese, Li Fei-Fei, Juan Carlos Niebles

arXiv.org Artificial Intelligence 

Abstract -- We address one-shot imitation learning, where the goal is to execute a previously unseen task based on a single demonstration. While there has been exciting progress in this direction, most of the approaches still require a few hundred tasks for meta-training, which limits their scalability. Our main contribution is to formulate one-shot imitation learning as a symbolic planning problem along with the symbol grounding problem. This formulation disentangles the policy execution from the inter-task generalization and leads to better data efficiency. The key technical challenge is that the symbol grounding is prone to error with limited training data, and these errors lead to subsequent symbolic planning failures. We address this challenge by proposing a continuous relaxation of the discrete symbolic planner that plans directly on the probabilistic outputs of the symbol grounding model. Our continuous relaxation of the planner can still leverage the information contained in the probabilistic symbol grounding and significantly improves over the baseline planner for the one-shot imitation learning tasks without using large training data.

INTRODUCTION

We are interested in robots that can learn a wide variety of tasks efficiently. Recently, there has been an increasing interest in the one-shot imitation learning problem [1-7], where the goal is to learn to execute a previously unseen task from only a single demonstration of the task. This setting is also referred to as meta-learning [3, 8], where the meta-training stage uses a set of tasks in a given domain to simulate the one-shot testing scenario. This allows the learned model to generalize to previously unseen tasks with a single demonstration in the meta-testing stage.
The main shortcoming of these one-shot approaches is that they typically require a large amount of data for meta-training (400 meta-training tasks in [4] and 1000 in [6] for the Block Stacking task [6]) to generalize reliably to unseen tasks.
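To make the core idea concrete, the following is a minimal sketch (not the paper's actual method) of the contrast between discrete symbol grounding and a continuous relaxation. The predicate names, probabilities, and the negative-log-likelihood scoring are all illustrative assumptions: a hard grounding thresholds each predicate to true/false before planning, so a single borderline error can make the symbolic problem unsolvable, whereas a relaxed planner scores candidate symbolic states directly on the grounding probabilities.

```python
import math

# Hypothetical grounding-model output: probability that each symbolic
# predicate holds in the observed scene (names are illustrative only).
grounding = {
    ("on", "A", "B"): 0.55,
    ("on", "B", "C"): 0.45,   # borderline: dropped by hard thresholding
    ("clear", "A"): 0.95,
}

def hard_grounding(probs, threshold=0.5):
    """Discrete grounding: commit to predicates above a threshold.
    Errors here propagate into the symbolic planner and can make it fail."""
    return {pred for pred, p in probs.items() if p > threshold}

def relaxed_cost(state, probs):
    """Continuous relaxation: score a candidate symbolic state by the
    negative log-likelihood under the grounding probabilities, instead of
    committing to a hard 0/1 truth assignment."""
    cost = 0.0
    for pred, p in probs.items():
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # guard against log(0)
        cost += -math.log(p) if pred in state else -math.log(1.0 - p)
    return cost

# Two candidate symbolic states a planner might compare.
s1 = {("on", "A", "B"), ("clear", "A")}
s2 = {("on", "A", "B"), ("on", "B", "C"), ("clear", "A")}

# Hard grounding discards ("on","B","C") despite it being nearly as
# likely as not; the relaxed score keeps both hypotheses comparable.
print(hard_grounding(grounding))
print(relaxed_cost(s1, grounding), relaxed_cost(s2, grounding))
```

The design point is that the relaxed cost is a smooth function of the grounding probabilities, so the planner retains the uncertainty information that a discrete thresholding step would destroy.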
