Allen, Peter
Teleoperated Robot Grasping in Virtual Reality Spaces
Hu, Jiaheng, Watkins, David, Allen, Peter
Despite recent advancements in virtual reality technology, teleoperating a high-DoF robot to complete dexterous tasks in cluttered scenes remains difficult. In this work, we propose a system that allows a user to teleoperate a Fetch robot to perform grasping in an easy and intuitive way by exploiting the rich environment information provided by the virtual reality space. Our system transfers easily to different robots and different tasks, and can be used without any expert knowledge. We tested the system on a real Fetch robot, and a video demonstrating the effectiveness of our system can be seen at https://youtu.be/1-xW2Bx_Cms
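At the core of such a system is the mapping from the tracked VR controller to robot actions. The Python sketch below is illustrative only, not the authors' code: the calibration transform T_base_vr and the functions set_end_effector_goal and close_gripper are hypothetical placeholders. It shows one minimal way the mapping can be structured, with the controller pose re-expressed in the robot base frame and used as an end-effector goal, and the trigger driving the gripper.

# Minimal sketch (assumptions noted above) of a VR-to-robot teleoperation step.
import numpy as np

def pose_to_matrix(position, quaternion):
    """Build a 4x4 homogeneous transform from a position (x, y, z) and a unit quaternion (x, y, z, w)."""
    x, y, z, w = quaternion
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - z*w),     2*(x*z + y*w)],
        [2*(x*y + z*w),     1 - 2*(x*x + z*z), 2*(y*z - x*w)],
        [2*(x*z - y*w),     2*(y*z + x*w),     1 - 2*(x*x + y*y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = position
    return T

def teleop_step(controller, robot, T_base_vr):
    """One control cycle: VR controller pose -> end-effector goal in the robot base frame."""
    T_vr_ctrl = pose_to_matrix(controller.position, controller.quaternion)
    T_base_ctrl = T_base_vr @ T_vr_ctrl          # re-express the controller pose in the robot frame
    robot.set_end_effector_goal(T_base_ctrl)     # hypothetical IK / motion-planning call
    if controller.trigger_pressed:               # trigger maps to a grasp command
        robot.close_gripper()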
Multiple View Performers for Shape Completion
Watkins, David, Allen, Peter, Choromanski, Krzysztof, Varley, Jacob, Waytowich, Nicholas
We propose the Multiple View Performer (MVP), a new architecture for 3D shape completion from a series of temporally sequential views. MVP accomplishes this task by using linear-attention Transformers called Performers. Our model allows the current observation of the scene to attend to previous ones for more accurate infilling. The history of past observations is compressed via a compact associative memory that approximates modern continuous Hopfield memory but, crucially, is of a size independent of the history length. We compare our model with several baselines for shape completion over time, demonstrating the generalization gains that MVP provides. To the best of our knowledge, MVP is the first multiple-view voxel reconstruction method that does not require registration of multiple depth views and the first causal Transformer-based model for 3D shape completion.
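To illustrate the mechanism the abstract refers to, the following Python sketch (a simplification, not the MVP implementation) shows Performer-style causal linear attention: keys and values from past views are folded into running sums S and z, a compact associative memory whose size does not grow with the number of views, and the current view's queries attend to that memory. The feature map phi below is a simple stand-in for the random-feature (FAVOR+) maps used by Performers.

# Minimal sketch of causal linear attention with a fixed-size associative memory.
import numpy as np

def phi(x):
    # simple positive feature map standing in for FAVOR+ random features
    return np.where(x > 0, x + 1.0, np.exp(x))

class LinearAttentionMemory:
    def __init__(self, dim_k, dim_v):
        self.S = np.zeros((dim_k, dim_v))  # associative memory: sum of phi(k) v^T over past views
        self.z = np.zeros(dim_k)           # normalizer: sum of phi(k)

    def update(self, K, V):
        """Absorb keys K [n, dim_k] and values V [n, dim_v] from the newest observation."""
        fK = phi(K)
        self.S += fK.T @ V
        self.z += fK.sum(axis=0)

    def attend(self, Q):
        """Queries Q [m, dim_k] from the current view attend to everything stored so far."""
        fQ = phi(Q)
        return (fQ @ self.S) / (fQ @ self.z + 1e-6)[:, None]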
Learning Your Way Without Map or Compass: Panoramic Target Driven Visual Navigation
Watkins-Valls, David, Xu, Jingxi, Waytowich, Nicholas, Allen, Peter
We present a robot navigation system that uses an imitation learning framework to successfully navigate in complex environments. Our framework takes a pre-built 3D scan of a real environment and trains an agent from pre-generated expert trajectories to navigate to any position given a panoramic view of the goal and the current visual input, without relying on a map, compass, odometry, GPS, or the relative position of the target at runtime. Our end-to-end trained agent uses RGB and depth (RGBD) information and can handle large environments (up to 1031 m²) across multiple rooms (up to 40) and generalizes to unseen targets. We show that when compared to several baselines using deep reinforcement learning and RGBD SLAM, our method (1) requires fewer training examples and less training time, (2) reaches the goal location with higher accuracy, (3) produces better solutions with shorter paths for long-range navigation tasks, and (4) generalizes to unseen environments given an RGBD map of the environment.
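As a rough illustration of the agent described above (the layer sizes and four-action set are assumptions, not the paper's exact network), the PyTorch sketch below encodes the current RGBD frame and the panoramic goal views separately, fuses them, and outputs discrete action logits suitable for imitation (cross-entropy) training on expert trajectories.

# Minimal sketch of a panoramic target-driven navigation policy (hypothetical sizes).
import torch
import torch.nn as nn

class NavPolicy(nn.Module):
    def __init__(self, num_actions=4):
        super().__init__()
        def encoder():
            return nn.Sequential(
                nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # 4 channels: RGB + depth
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.obs_enc = encoder()
        self.goal_enc = encoder()
        self.head = nn.Sequential(
            nn.Linear(64 + 64, 256), nn.ReLU(),
            nn.Linear(256, num_actions),   # e.g. forward / turn left / turn right / stop
        )

    def forward(self, rgbd, goal_panorama):
        # rgbd: [B, 4, H, W]; goal_panorama: [B, V, 4, H, W], V panoramic views of the target
        B, V = goal_panorama.shape[:2]
        obs = self.obs_enc(rgbd)
        goal = self.goal_enc(goal_panorama.flatten(0, 1)).view(B, V, -1).mean(dim=1)
        return self.head(torch.cat([obs, goal], dim=1))   # action logits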
MAT: Multi-Fingered Adaptive Tactile Grasping via Deep Reinforcement Learning
Wu, Bohan, Akinola, Iretiayo, Varley, Jacob, Allen, Peter
Vision-based grasping systems typically adopt an open-loop execution of a planned grasp. This policy can fail for many reasons, including ubiquitous calibration error. Recovery from a failed grasp is further complicated by visual occlusion, as the hand usually occludes the vision sensor as it attempts another open-loop regrasp. This work presents MAT, a tactile closed-loop method capable of realizing grasps provided by a coarse initial positioning of the hand above an object. Our algorithm is a deep reinforcement learning (RL) policy optimized through the clipped surrogate objective within a maximum entropy RL framework to balance exploitation and exploration. The method utilizes tactile and proprioceptive information to act through both fine finger motions and larger regrasp movements to execute stable grasps. A novel curriculum of action motion magnitude makes learning more tractable and helps turn common failure cases into successes. Careful selection of features that exhibit small sim-to-real gaps enables this tactile grasping policy, trained purely in simulation, to transfer well to real-world environments without the need for additional learning. Experimentally, this methodology substantially improves grasp success rate over a vision-only baseline on a multi-fingered robot hand. When it is used to realize grasps from coarse initial positions provided by a vision-only planner, the system becomes dramatically more robust to calibration errors in the camera-robot transform.
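The abstract names the clipped surrogate objective combined with an entropy term as the optimization target; the short PyTorch sketch below spells out that loss. It is illustrative only: clip_eps and entropy_coef are example hyperparameters, not MAT's, and the surrounding rollout and advantage-estimation machinery is omitted.

# Minimal sketch of a PPO-style clipped surrogate loss with an entropy bonus.
import torch

def clipped_surrogate_loss(log_probs, old_log_probs, advantages, entropy,
                           clip_eps=0.2, entropy_coef=0.01):
    """Return the policy loss to minimize: negative clipped surrogate minus entropy bonus."""
    ratio = torch.exp(log_probs - old_log_probs)                           # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    surrogate = torch.min(unclipped, clipped).mean()                       # pessimistic (clipped) objective
    return -(surrogate + entropy_coef * entropy.mean())                    # entropy term encourages exploration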
Multi-Modal Geometric Learning for Grasping and Manipulation
Watkins, David, Varley, Jacob, Allen, Peter
This work provides an architecture that incorporates depth and tactile information to create rich and accurate 3D models useful for robotic manipulation tasks. This is accomplished through the use of a 3D convolutional neural network (CNN). Offline, the network is provided with both depth and tactile information and trained to predict the object's geometry, thus filling in regions of occlusion. At runtime, the network is provided a partial view of an object. Tactile information is acquired to augment the captured depth information. The network can then reason about the object's geometry by utilizing both the collected tactile and depth information. We demonstrate that even small amounts of additional tactile information can be incredibly helpful in reasoning about object geometry. This is particularly true when information from depth alone fails to produce an accurate geometric prediction. Our method is benchmarked against and outperforms other visual-tactile approaches to general geometric reasoning. We also provide experimental results comparing grasping success with our method.
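A minimal PyTorch sketch of the idea follows; the 40^3 grid size and layer widths are assumptions for illustration, not the paper's exact architecture. A 3D CNN takes a two-channel voxel grid (one channel for occupancy observed by the depth camera, one for voxels touched by the tactile sensors) and predicts per-voxel occupancy logits for the completed shape, filling in occluded regions.

# Minimal sketch of a multi-modal (depth + tactile) 3D CNN for shape completion.
import torch
import torch.nn as nn

class TactileDepthCompletion(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(2, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),            # 40^3 -> 20^3
            nn.Conv3d(16, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),           # 20^3 -> 10^3
            nn.ConvTranspose3d(32, 16, kernel_size=4, stride=2, padding=1), nn.ReLU(),  # 10^3 -> 20^3
            nn.ConvTranspose3d(16, 1, kernel_size=4, stride=2, padding=1),              # 20^3 -> 40^3
        )

    def forward(self, depth_voxels, tactile_voxels):
        # each input: [B, 1, 40, 40, 40]; output: per-voxel occupancy logits [B, 1, 40, 40, 40]
        x = torch.cat([depth_voxels, tactile_voxels], dim=1)
        return self.net(x)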
Human Robot Interface for Assistive Grasping
Watkins, David, Chou, Chaiwen, Weinberg, Caroline, Varley, Jacob, Lyons, Kenneth, Joshi, Sanjay, Weber, Lynne, Stein, Joel, Allen, Peter
This work describes a new human-in-the-loop (HitL) assistive grasping system for individuals with varying levels of physical capabilities. We investigated the feasibility of using four potential input devices with our assistive grasping system interface, using able-bodied individuals to define a set of quantitative metrics that could be used to assess an assistive grasping system. We then took these measurements and created a generalized benchmark for evaluating the effectiveness of any arbitrary input device into a HitL grasping system. The four input devices were a mouse, a speech recognition device, an assistive switch, and a novel sEMG device developed by our group that was connected either to the forearm or behind the ear of the subject. These preliminary results provide insight into how different interface devices perform for generalized assistive grasping tasks and also highlight the potential of sEMG based control for severely disabled individuals.
Articulated Pose Estimation Using Hierarchical Exemplar-Based Models
Liu, Jiongxin, Li, Yinxiao, Allen, Peter, Belhumeur, Peter (Columbia University)
Exemplar-based models have achieved great success on localizing the parts of semi-rigid objects. However, their efficacy on highly articulated objects such as humans is yet to be explored. Inspired by hierarchical object representation and recent application of Deep Convolutional Neural Networks (DCNNs) on human pose estimation, we propose a novel formulation that incorporates both hierarchical exemplar-based models and DCNNs in the spatial terms. Specifically, we obtain more expressive spatial models by assuming independence between exemplars at different levels in the hierarchy; we also obtain stronger spatial constraints by inferring the spatial relations between parts at the same level. As our method strikes a good balance between expressiveness and strength of spatial models, it is both effective and generalizable, achieving state-of-the-art results on different benchmarks: Leeds Sports Dataset and CUB-200-2011.
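To make the combination of terms concrete, the sketch below is illustrative only (the paper's actual formulation and inference procedure differ in detail): a candidate configuration of part locations is scored by summing DCNN unary scores with an exemplar-based spatial term that compares the pairwise offsets between parts at one level of the hierarchy against the closest exemplar layout.

# Minimal sketch of combining unary appearance scores with an exemplar spatial term.
import numpy as np

def spatial_score(part_locations, exemplars, sigma=10.0):
    """part_locations: [P, 2] candidate part positions; exemplars: [E, P, 2] exemplar layouts."""
    offsets = part_locations[None] - part_locations[:, None]          # [P, P, 2] pairwise offsets
    ex_offsets = exemplars[:, None, :, :] - exemplars[:, :, None, :]  # [E, P, P, 2] exemplar offsets
    dists = np.linalg.norm(offsets[None] - ex_offsets, axis=-1).sum(axis=(1, 2))  # [E]
    return -dists.min() / sigma                                       # keep only the best-matching exemplar

def configuration_score(unary_scores, part_locations, exemplars):
    """Total score: DCNN unary scores (one per part) plus the exemplar spatial term."""
    return float(np.sum(unary_scores)) + spatial_score(part_locations, exemplars)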