Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation
Liu, I-Chun Arthur, Uppal, Shagun, Sukhatme, Gaurav S., Lim, Joseph J., Englert, Peter, Lee, Youngwoon
Solving complex manipulation tasks in obstructed environments is a challenging problem in deep reinforcement learning (RL), since it requires precise object interactions as well as collision-free movement around obstacles. To tackle this problem, prior works [1-3] have proposed combining the strengths of motion planning (MP) and RL: the safe, collision-free maneuvers of MP and the sophisticated, contact-rich interactions of RL, demonstrating promising results. However, MP requires access to the geometric state of the environment for collision checking, which is often unavailable in the real world, and it is also computationally expensive for real-time control. To deploy such agents in realistic settings, we need to remove the dependency on state information and the costly computation of MP so that the agent can perform a task in the visual domain. To this end, we propose a two-step distillation framework, motion planner augmented policy distillation (MoPA-PD), that transfers the state-based motion planner augmented RL policy (MoPA-RL [1]) into a visual control policy, thereby removing the motion planner and the dependency on state information. Concretely, our framework consists of two stages: (1) visual behavioral cloning (BC [4]) with trajectories collected using the MoPA-RL policy and (2) vision-based RL training guided by smoothed trajectories from the BC policy. The first step, visual BC, removes the dependency on the motion planner, and the resulting visual BC policy generates smoother behaviors than the motion planner's jittery ones.
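To make the first stage concrete, below is a minimal sketch of visual behavioral cloning on trajectories collected with the state-based MoPA-RL expert, assuming a PyTorch setup; the network architecture, hyperparameters, and names (VisualPolicy, train_bc, demo_loader) are illustrative placeholders rather than the authors' implementation.

```python
# Hypothetical sketch of stage (1): visual behavioral cloning on (image, action)
# pairs collected with the MoPA-RL expert. Assumes PyTorch; the architecture and
# hyperparameters are placeholders, not the paper's actual settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualPolicy(nn.Module):
    """Maps an RGB observation to a continuous action (illustrative CNN)."""
    def __init__(self, action_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.LazyLinear(action_dim)  # infers the flattened feature size

    def forward(self, obs):
        return self.head(self.encoder(obs))

def train_bc(policy, demo_loader, epochs=50, lr=1e-4):
    """Regress the visual policy onto expert actions from MoPA-RL rollouts."""
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        for obs, expert_action in demo_loader:
            loss = F.mse_loss(policy(obs), expert_action)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return policy
```

The second stage, vision-based RL fine-tuning guided by smoothed BC trajectories, would build on the resulting policy and is not sketched here.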
arXiv.org Artificial Intelligence
Nov-11-2021