
Collaborating Authors: Uppal, Shagun


SPIN: Simultaneous Perception, Interaction and Navigation

arXiv.org Artificial Intelligence

While there has been remarkable progress recently in the fields of manipulation and locomotion, mobile manipulation remains a long-standing challenge. Compared to locomotion or static manipulation, a mobile system must make a diverse range of long-horizon tasks feasible in unstructured and dynamic environments. While the applications are broad and interesting, there is a plethora of challenges in developing these systems, such as coordination between the base and arm, reliance on onboard perception for perceiving and interacting with the environment, and, most importantly, simultaneously integrating all these parts together. Prior works approach the problem using disentangled modular skills for mobility and manipulation that are trivially tied together. This causes several limitations, such as compounding errors, delays in decision-making, and a lack of whole-body coordination. In this work, we present a reactive mobile manipulation framework that uses an active visual system to consciously perceive and react to its environment. Similar to how humans leverage whole-body and hand-eye coordination, we develop a mobile manipulator that exploits its ability to move and see, more specifically -- to move in order to see and to see in order to move. This allows it not only to move around and interact with its environment but also to choose "when" to perceive "what" using an active visual system. We observe that such an agent learns to navigate around complex, cluttered scenarios while displaying agile whole-body coordination using only ego-vision, without needing to create environment maps. Results, visualizations, and videos at https://spin-robot.github.io/
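As a rough illustration of coupling mobility, manipulation, and active perception in a single ego-vision policy, the sketch below shows a hypothetical network that maps an egocentric depth image and proprioception to base, arm, and camera actions from one shared representation. This is not the SPIN architecture; all module names, dimensions, and the PyTorch framing are assumptions.

# Minimal sketch (not the SPIN architecture): one ego-vision policy with three
# coupled heads for the base, the arm, and an actively controlled camera.
# All module names, dimensions, and the PyTorch framing are assumptions.
import torch
import torch.nn as nn

class WholeBodyActivePerceptionPolicy(nn.Module):
    def __init__(self, proprio_dim=21, feat_dim=128):
        super().__init__()
        # Encoder for a single 64x64 egocentric depth image.
        self.vision = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, feat_dim), nn.ReLU(),
        )
        self.proprio = nn.Sequential(nn.Linear(proprio_dim, feat_dim), nn.ReLU())
        self.trunk = nn.Sequential(nn.Linear(2 * feat_dim, feat_dim), nn.ReLU())
        # Shared features feed all three heads, so "where to look" is learned
        # jointly with "where to move" and "how to reach".
        self.base_head = nn.Linear(feat_dim, 2)    # base twist: (v, omega)
        self.arm_head = nn.Linear(feat_dim, 6)     # arm joint velocity targets
        self.camera_head = nn.Linear(feat_dim, 2)  # camera pan/tilt rates

    def forward(self, depth_image, proprio):
        z = torch.cat([self.vision(depth_image), self.proprio(proprio)], dim=-1)
        z = self.trunk(z)
        return self.base_head(z), self.arm_head(z), self.camera_head(z)

# Dummy forward pass to show the input/output shapes.
policy = WholeBodyActivePerceptionPolicy()
base, arm, camera = policy(torch.randn(1, 1, 64, 64), torch.randn(1, 21))
print(base.shape, arm.shape, camera.shape)  # (1, 2) (1, 6) (1, 2)

Because the three heads share one trunk, the same visual features drive both where the camera looks and how the base and arm move, which is the kind of coupling the abstract describes.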


Dexterous Functional Grasping

arXiv.org Artificial Intelligence

While there have been significant strides in dexterous manipulation, most of this progress is limited to benchmark tasks like in-hand reorientation, which are of limited utility in the real world. The main benefit of dexterous hands over two-fingered ones is their ability to pick up tools and other objects (including thin ones) and grasp them firmly to apply force. However, this task requires both a complex understanding of functional affordances and precise low-level control. While prior work obtains affordances from human data, this approach doesn't scale to low-level control. Similarly, simulation training cannot give the robot an understanding of real-world semantics. In this paper, we aim to combine the best of both worlds to accomplish functional grasping for in-the-wild objects. We use a modular approach: first, affordances are obtained by matching corresponding regions of different objects, and then a low-level policy trained in simulation is run to execute the grasp. We propose a novel application of eigengrasps to reduce the search space of RL using a small amount of human data and find that it leads to more stable and physically realistic motion. We find that the eigengrasp action space beats baselines in simulation, outperforms hardcoded grasping in the real world, and matches or outperforms a trained human teleoperator. Results, visualizations, and videos at https://dexfunc.github.io/
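To make the eigengrasp idea concrete, here is a minimal sketch of the standard construction it builds on: PCA over a small set of recorded hand postures gives a low-dimensional action space, and the policy outputs coefficients in that space instead of raw joint angles. The dataset, joint count, and number of components below are placeholders, and this is not the authors' code.

# Minimal sketch (not the authors' code) of the eigengrasp idea: PCA over a
# small set of human hand joint configurations yields a low-dimensional action
# space for RL. The data here is a random placeholder; dimensions are assumptions.
import numpy as np

rng = np.random.default_rng(0)
human_grasps = rng.standard_normal((50, 16))  # 50 recorded postures, 16 hand joints

# PCA: center the data and take the top-k principal directions as "eigengrasps".
mean_posture = human_grasps.mean(axis=0)
centered = human_grasps - mean_posture
_, _, vt = np.linalg.svd(centered, full_matrices=False)
k = 3                       # the RL search space shrinks from 16-D to k-D
eigengrasps = vt[:k]        # shape (k, 16)

def coefficients_to_joints(alpha):
    """Map a k-D policy output back to a full 16-D joint command."""
    return mean_posture + alpha @ eigengrasps

# A policy action is now just k numbers, e.g. sampled by RL during training.
alpha = rng.uniform(-1.0, 1.0, size=k)
joint_command = coefficients_to_joints(alpha)
print(joint_command.shape)  # (16,)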


Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation

arXiv.org Artificial Intelligence

Solving complex manipulation tasks in obstructed environments is a challenging problem in deep reinforcement learning (RL) since it requires precise object interactions as well as collision-free movement across obstacles. To tackle this problem, prior works [1-3] have proposed to combine the strengths of motion planning (MP) and RL (the safe, collision-free maneuvers of MP and the sophisticated contact-rich interactions of RL), demonstrating promising results. However, MP requires access to the geometric state of an environment for collision checking, which is often not available in the real world, and is also computationally expensive for real-time control. To deploy such agents in realistic settings, we need to resolve the dependency on state information and the costly computation of MP, so that the agent can perform a task in the visual domain. To this end, we propose a two-step distillation framework, motion planner augmented policy distillation (MoPA-PD), that transfers the state-based motion planner augmented RL policy (MoPA-RL [1]) into a visual control policy, thereby removing the motion planner and the dependency on state information. Concretely, our framework consists of two stages: (1) visual behavioral cloning (BC [4]) with trajectories collected using the MoPA-RL policy and (2) vision-based RL training with the guidance of smoothed trajectories from the BC policy. The first step, visual BC, removes the dependency on the motion planner, and the resulting visual BC policy generates smoother behaviors compared to the motion planner's jittery behaviors.
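The first stage is standard behavioral cloning from teacher rollouts; the sketch below illustrates it under assumed shapes and names, regressing the state-based teacher's actions from images. It is not the MoPA-PD implementation, and the second stage (vision-based RL guided by smoothed BC trajectories) is omitted.

# Minimal sketch (not the MoPA-PD implementation) of stage (1): behavioral
# cloning of a visual policy on image/action pairs collected by rolling out the
# state-based MoPA-RL teacher. Dataset, shapes, and names are assumptions.
import torch
import torch.nn as nn

class VisualBCPolicy(nn.Module):
    def __init__(self, action_dim=7):
        super().__init__()
        # Small CNN for 64x64 RGB observations; the flattened size matches that input.
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, image):
        return self.net(image)

def bc_step(policy, optimizer, images, teacher_actions):
    """One supervised update: regress the teacher's actions from pixels."""
    loss = nn.functional.mse_loss(policy(images), teacher_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy batch standing in for trajectories collected with the MoPA-RL policy.
policy = VisualBCPolicy()
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)
images = torch.randn(8, 3, 64, 64)
teacher_actions = torch.randn(8, 7)
print(bc_step(policy, optimizer, images, teacher_actions))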