QuadSwarm: A Modular Multi-Quadrotor Simulator for Deep Reinforcement Learning with Direct Thrust Control
Huang, Zhehui, Batra, Sumeet, Chen, Tao, Krupani, Rahul, Kumar, Tushar, Molchanov, Artem, Petrenko, Aleksei, Preiss, James A., Yang, Zhaojing, Sukhatme, Gaurav S.
Reinforcement learning (RL) has shown promise in creating robust policies for robotics tasks. However, contemporary RL algorithms are data-hungry, often requiring billions of environment transitions to train successful policies. This necessitates the use of fast and highly parallelizable simulators. In addition to speed, such simulators need to model the physics of the robots and their interaction with the environment to a level acceptable for transferring policies learned in simulation to reality. We present QuadSwarm, a fast, reliable simulator for research in single- and multi-robot RL for quadrotors that addresses both issues. QuadSwarm, with fast forward-dynamics propagation decoupled from rendering, is designed to be highly parallelizable such that throughput scales linearly with additional compute. It provides multiple components tailored toward multi-robot RL, including diverse training scenarios and domain randomization, to facilitate the development and sim2real transfer of multi-quadrotor control policies. Initial experiments suggest that QuadSwarm achieves over 48,500 simulation samples per second (SPS) on a single quadrotor and over 62,000 SPS on eight quadrotors on a 16-core CPU. The code can be found at https://github.com/Zhehui-Huang/quad-swarm-rl.
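To illustrate why decoupling forward dynamics from rendering pays off, here is a minimal sketch of batched dynamics stepping for several quadrotors at once. The point-mass model, state layout, and parameter values are simplified assumptions for illustration, not QuadSwarm's actual dynamics or API.

```python
import numpy as np

# Illustrative only: a simplified, vectorized forward-dynamics step for N
# quadrotors (point mass under thrust and gravity), with no rendering in
# the loop. QuadSwarm's real model is richer; this just shows how batched
# stepping keeps throughput scaling with compute.
GRAVITY = np.array([0.0, 0.0, -9.81])

def step(pos, vel, mass, thrust_world, dt):
    """Advance all quadrotors one timestep with semi-implicit Euler."""
    acc = thrust_world / mass[:, None] + GRAVITY  # (N, 3)
    vel = vel + acc * dt
    pos = pos + vel * dt
    return pos, vel

n = 8
pos = np.zeros((n, 3))
vel = np.zeros((n, 3))
mass = np.full(n, 0.5)                    # kg, hypothetical
hover = -GRAVITY * mass[:, None]          # thrust that exactly cancels gravity
for _ in range(1000):                     # pure dynamics propagation, no rendering
    pos, vel = step(pos, vel, mass, hover, dt=0.005)
print(pos[0])                             # stays at the origin under hover thrust
```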
Granular Gym: High Performance Simulation for Robotic Tasks with Granular Materials
Millard, David, Pastor, Daniel, Bowkett, Joseph, Backes, Paul, Sukhatme, Gaurav S.
Granular materials are of critical interest to many robotic tasks in planetary science, construction, and manufacturing. However, the dynamics of granular materials are complex and often computationally very expensive to simulate. We propose a set of methodologies and a system for the fast simulation of granular materials on Graphics Processing Units (GPUs), and show that this simulation is fast enough for basic training with reinforcement learning algorithms, which currently require many dynamics samples to achieve acceptable performance. Our method models granular material dynamics using implicit timestepping methods for multibody rigid contacts, algorithmic techniques for efficient parallel collision detection between pairs of particles and between particles and arbitrarily shaped rigid bodies, and programming techniques for minimizing warp divergence on Single-Instruction, Multiple-Thread (SIMT) chip architectures. We showcase our simulation system on several environments targeted toward robotic tasks, and release our simulator as an open-source tool.
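As a rough illustration of the parallel-friendly collision detection described above, the sketch below implements a uniform-grid broad phase for equal-radius particles on the CPU. The grid scheme, cell size, and function names are our assumptions, not Granular Gym's GPU kernels.

```python
import numpy as np
from collections import defaultdict
from itertools import product

# Illustrative CPU analogue of a data-parallel broad phase: hash equal-radius
# particles into a uniform grid, then only test pairs in neighboring cells.
def colliding_pairs(pos, radius):
    cell_size = 2.0 * radius                              # one particle diameter
    cells = [tuple(c) for c in np.floor(pos / cell_size).astype(int)]
    grid = defaultdict(list)
    for i, c in enumerate(cells):
        grid[c].append(i)
    pairs = set()
    for i, c in enumerate(cells):
        for off in product((-1, 0, 1), repeat=3):         # 27 neighboring cells
            key = (c[0] + off[0], c[1] + off[1], c[2] + off[2])
            for j in grid.get(key, []):
                if i < j:
                    pairs.add((i, j))
    # narrow phase: keep only pairs that actually overlap
    return [(i, j) for i, j in pairs
            if np.linalg.norm(pos[i] - pos[j]) < 2.0 * radius]

pos = np.random.rand(200, 3)                              # particle centers
print(len(colliding_pairs(pos, radius=0.02)))
```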
IndustReal: Transferring Contact-Rich Assembly Tasks from Simulation to Reality
Tang, Bingjie, Lin, Michael A., Akinola, Iretiayo, Handa, Ankur, Sukhatme, Gaurav S., Ramos, Fabio, Fox, Dieter, Narang, Yashraj
Robotic assembly is a longstanding challenge, requiring contact-rich interaction and high precision and accuracy. Many applications also require adaptivity to diverse parts, poses, and environments, as well as low cycle times. In other areas of robotics, simulation is a powerful tool to develop algorithms, generate datasets, and train agents. However, simulation has had a more limited impact on assembly. We present IndustReal, a set of algorithms, systems, and tools that solve assembly tasks in simulation with reinforcement learning (RL) and successfully achieve policy transfer to the real world. Specifically, we propose 1) simulation-aware policy updates, 2) signed-distance-field rewards, and 3) sampling-based curricula for robotic RL agents. We use these algorithms to enable robots to solve contact-rich pick, place, and insertion tasks in simulation. We then propose 4) a policy-level action integrator to minimize error at policy deployment time. We build and demonstrate a real-world robotic assembly system that uses the trained policies and action integrator to achieve repeatable performance in the real world. Finally, we present hardware and software tools that allow other researchers to reproduce our system and results. For videos and additional details, please see http://sites.google.com/nvidia.com/industreal.
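The policy-level action integrator idea admits a compact sketch: rather than sending each incremental policy action to the controller directly, accumulate it into a persistent target that the low-level controller tracks. The class, bounds, and scaling below are hypothetical, not IndustReal's exact implementation.

```python
import numpy as np

# Minimal sketch of a policy-level action integrator. The max_delta bound
# and [-1, 1] action convention are illustrative assumptions.
class ActionIntegrator:
    def __init__(self, initial_target, max_delta=0.002):
        self.target = np.asarray(initial_target, dtype=float)
        self.max_delta = max_delta          # meters per step, hypothetical bound

    def update(self, policy_action):
        """Map a normalized action in [-1, 1]^3 to a bounded target increment."""
        delta = np.clip(policy_action, -1.0, 1.0) * self.max_delta
        self.target = self.target + delta
        return self.target                  # sent to the robot's pose controller

integrator = ActionIntegrator(initial_target=[0.5, 0.0, 0.3])
for _ in range(10):
    action = np.random.uniform(-1, 1, size=3)   # stand-in for a trained policy
    target = integrator.update(action)
print(target)
```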
Fast and Scalable Signal Inference for Active Robotic Source Seeking
Denniston, Christopher E., Peltzer, Oriana, Ott, Joshua, Moon, Sangwoo, Kim, Sung-Kyun, Sukhatme, Gaurav S., Kochenderfer, Mykel J., Schwager, Mac, Agha-mohammadi, Ali-akbar
In active source seeking, a robot takes repeated measurements in order to locate a signal source in a cluttered and unknown environment. A key component of an active source-seeking robot planner is a model that can produce estimates of the signal at unknown locations with uncertainty quantification. This model allows the robot to plan for future measurements in the environment. Traditionally, this model has been in the form of a Gaussian process, which has difficulty scaling and cannot represent obstacles. We propose a global and local factor graph model for active source seeking, which allows the model to scale to a large number of measurements and represent unknown obstacles in the environment. We combine this model with extensions to a highly scalable planner to form a system for large-scale active source seeking. We demonstrate that our approach outperforms baseline methods in both simulated and real robot experiments.
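For context, the sketch below shows the interface the abstract attributes to the traditional Gaussian-process model: a posterior mean and variance at unqueried locations, which the planner uses to pick future measurements. The kernel, noise level, and data are illustrative; the paper's contribution replaces this model with a global and local factor graph.

```python
import numpy as np

# Tiny 1-D Gaussian-process regression (RBF kernel): posterior mean and
# variance at candidate sensing locations.
def rbf(a, b, length=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)

x_meas = np.array([0.0, 1.0, 3.0])       # measurement locations
y_meas = np.array([0.2, 0.9, 0.1])       # measured signal strengths
x_query = np.linspace(0.0, 4.0, 9)       # candidate future measurement sites
noise = 1e-4

K = rbf(x_meas, x_meas) + noise * np.eye(len(x_meas))
K_s = rbf(x_query, x_meas)
alpha = np.linalg.solve(K, y_meas)
mean = K_s @ alpha                                             # posterior mean
var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s.T).T, axis=1)  # posterior variance
print(np.round(mean, 2))
print(np.round(var, 3))                  # high variance = informative to visit
```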
Learned Parameter Selection for Robotic Information Gathering
Denniston, Christopher E., Salhotra, Gautam, Kangaslahti, Akseli, Caron, David A., Sukhatme, Gaurav S.
When robots are deployed in the field for environmental monitoring, they typically execute pre-programmed motions, such as lawnmower paths, instead of adaptive methods, such as informative path planning. One reason for this is that adaptive methods depend on parameter choices that are both critical to set correctly and difficult for the non-specialist to choose. Here, we show how to automatically configure a planner for informative path planning by training a reinforcement learning agent to select planner parameters at each planning iteration. We demonstrate our method with 37 instances of 3 distinct environments, and compare it against pure (end-to-end) reinforcement learning techniques, as well as approaches that do not use a learned model to change the planner parameters. Our method shows a 9.53% mean improvement in the cumulative reward across diverse environments when compared to end-to-end learning-based methods; we also demonstrate via a field experiment how it can be readily used to facilitate high-performance deployment of an information-gathering robot.
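The control flow being learned can be sketched as follows: at every planning iteration, a policy maps the planner's current features to planner parameters. The specific features, parameters, and stub functions are assumptions for illustration, not the paper's actual interfaces.

```python
import numpy as np

# Sketch: a learned policy re-tunes the informative path planner at every
# iteration. `policy` and `plan_step` are illustrative stand-ins.
def policy(features):
    """Stand-in for the trained RL agent that selects planner parameters."""
    horizon = int(3 + 5 * features[0])       # e.g. planning horizon
    exploration = 0.1 + 0.9 * features[1]    # e.g. exploration weight
    return horizon, exploration

def plan_step(horizon, exploration):
    """Stand-in for one iteration of the informative path planner."""
    gain = exploration * horizon * np.random.rand()  # mock information gain
    return gain, np.random.rand(2)                   # (reward, next features)

features = np.random.rand(2)
total_reward = 0.0
for _ in range(20):                          # planning iterations
    params = policy(features)
    reward, features = plan_step(*params)
    total_reward += reward
print(total_reward)
```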
Probabilistic Trajectory Planning for Static and Interaction-aware Dynamic Obstacle Avoidance
Şenbaşlar, Baskın, Sukhatme, Gaurav S.
Collision-free mobile robot navigation is an important problem for many robotics applications, especially in cluttered environments. In such environments, obstacles can be static or dynamic. Dynamic obstacles can additionally be interactive, i.e., changing their behavior according to the behavior of other entities. The perception and prediction modules of robotic systems create probabilistic representations and predictions of such environments. In this paper, we propose a novel prediction representation for interactive behaviors of dynamic obstacles. Then, we propose a real-time trajectory planning algorithm that probabilistically avoids collisions against static and interactive dynamic obstacles, and produces dynamically feasible trajectories. During decision making, our planner simulates the interactive behavior of dynamic obstacles in response to the actions the planning robot takes. We explicitly minimize collision probabilities against static and dynamic obstacles using a multi-objective search formulation. Then, we formulate a quadratic program to safely fit a smooth trajectory to the search result while attempting to preserve the collision probabilities computed during search. We evaluate our algorithm extensively in simulations to show its performance under different environments and configurations using 78,000 randomly generated cases. We compare its performance against a state-of-the-art trajectory planning algorithm for static and dynamic obstacle avoidance using 4,500 randomly generated cases, and show that our algorithm achieves up to 3.8x the success rate of the baseline while using as little as 0.18x the time the baseline requires. We implement our algorithm on physical quadrotors and demonstrate its feasibility in the real world.
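The final smoothing stage admits a compact illustration: fit a trajectory to the discrete search output by solving a quadratic program that trades waypoint fidelity against smoothness. The one-dimensional, unconstrained version below omits the paper's collision-probability preservation and dynamic-feasibility constraints.

```python
import numpy as np

# Stripped-down smoothing QP: minimize ||x - waypoints||^2 + w * ||D x||^2,
# where D penalizes second differences (a discrete acceleration proxy).
# The closed-form solution is (I + w D^T D) x = waypoints.
waypoints = np.array([0.0, 0.4, 1.2, 1.1, 2.0])   # 1-D search output
n = len(waypoints)
smooth_weight = 5.0                                # fidelity/smoothness trade-off

D = np.zeros((n - 2, n))                           # second-difference operator
for i in range(n - 2):
    D[i, i:i + 3] = [1.0, -2.0, 1.0]

x = np.linalg.solve(np.eye(n) + smooth_weight * D.T @ D, waypoints)
print(np.round(x, 3))                              # smoothed trajectory samples
```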
RREx-BoT: Remote Referring Expressions with a Bag of Tricks
Sigurdsson, Gunnar A., Thomason, Jesse, Sukhatme, Gaurav S., Piramuthu, Robinson
Household robots operate in the same space for years. Such robots incrementally build dynamic maps that can be used for tasks requiring remote object localization. However, benchmarks in robot learning often test generalization through inference on tasks in unobserved environments. In an observed environment, locating an object reduces to choosing from among all object proposals in the environment, which may number in the 100,000s. Armed with this intuition, using only a generic vision-language scoring model with minor modifications for 3d encoding and operating in an embodied environment, we demonstrate an absolute performance gain of 9.84% on remote object grounding above state-of-the-art models for REVERIE and of 5.04% on FAO. When allowed to pre-explore an environment, we also exceed the previous state-of-the-art pre-exploration method on REVERIE. Additionally, we demonstrate our model on a real-world TurtleBot platform, highlighting the simplicity and usefulness of the approach. Our analysis outlines a "bag of tricks" essential for accomplishing this task, from utilizing 3d coordinates and context to generalizing vision-language models to large 3d search spaces.
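A minimal sketch of the scoring recipe the abstract implies: encode the referring expression once, encode every cached object proposal (vision features fused with 3d coordinates), and rank all proposals in a single pass. The random features and fusion matrix below stand in for a real vision-language model, not RREx-BoT's actual one.

```python
import numpy as np

# Rank ~100k cached proposals against one referring expression in a single
# vectorized pass. All features and the fusion layer W are random stand-ins.
rng = np.random.default_rng(0)

vision = rng.normal(size=(100000, 64))   # cached per-proposal vision features
xyz = rng.uniform(size=(100000, 3))      # per-proposal 3d coordinates
W = rng.normal(size=(67, 64))            # hypothetical vision+3d fusion layer

emb = np.concatenate([vision, xyz], axis=1) @ W
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

text = rng.normal(size=64)               # stand-in: encoded referring expression
text /= np.linalg.norm(text)

best = int(np.argmax(emb @ text))        # one cosine-similarity scoring pass
print("best proposal index:", best)
```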
OpenD: A Benchmark for Language-Driven Door and Drawer Opening
Zhao, Yizhou, Gao, Qiaozi, Qiu, Liang, Thattai, Govind, Sukhatme, Gaurav S.
We introduce OPEND, a benchmark for learning how to use a hand to open cabinet doors or drawers in a photo-realistic and physics-reliable simulation environment driven by language instruction. To solve the task, we propose a multi-step planner composed of a deep neural network and rule-based controllers. The network is utilized to capture spatial relationships from images and understand semantic meaning from language instructions. Controllers efficiently execute the plan based on this spatial and semantic understanding. We evaluate our system by measuring its zero-shot performance on the test set. Experimental results demonstrate the effectiveness of decision planning by our multi-step planner for different hands, while suggesting that there is significant room for developing better models to address the challenges posed by language understanding, spatial reasoning, and long-term manipulation. We will release OPEND and host challenges to promote future research in this area.
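The multi-step planner's structure can be sketched as a grounding network followed by rule-based controller calls; every function body below is a placeholder assumption, not the benchmark's actual models or controllers.

```python
import numpy as np

# Skeleton of the planner: a network grounds the instruction to a target in
# the image, then rule-based controllers execute each step.
def grounding_network(image, instruction):
    """Stand-in: return the 3-D position of the handle named in the instruction."""
    return np.array([0.4, 0.1, 0.9])       # hypothetical handle position

def move_hand_to(target):
    print("reach", target)                 # rule-based reaching controller stub

def grasp_and_pull(target, distance=0.2):
    print("pull", target, "by", distance)  # rule-based opening controller stub

image = np.zeros((224, 224, 3))            # camera observation placeholder
handle = grounding_network(image, "open the top drawer")
move_hand_to(handle)                       # step 1: reach the handle
grasp_and_pull(handle)                     # step 2: grasp and pull open
```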
Supervised Learning and Reinforcement Learning of Feedback Models for Reactive Behaviors: Tactile Feedback Testbed
Sutanto, Giovanni, Rombach, Katharina, Chebotar, Yevgen, Su, Zhe, Schaal, Stefan, Sukhatme, Gaurav S., Meier, Franziska
Robots need to be able to adapt to unexpected changes in the environment so that they can autonomously succeed in their tasks. However, hand-designing feedback models for adaptation is tedious, if at all possible, making data-driven methods a promising alternative. In this paper, we introduce a full framework for learning feedback models for reactive motion planning. Our pipeline starts by segmenting demonstrations of a complete task into motion primitives via a semi-automated segmentation algorithm. Then, given additional demonstrations of successful adaptation behaviors, we learn initial feedback models through learning from demonstrations. In the final phase, a sample-efficient reinforcement learning algorithm fine-tunes these feedback models for novel task settings through only a few real system interactions. We evaluate our approach on a real anthropomorphic robot learning a tactile feedback task.
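As a toy stand-in for a learned feedback model, the sketch below fits a linear map from tactile deviation features to motion corrections via least squares; the feature and correction dimensions are invented, and the paper's RL phase would further fine-tune such a model from real interactions.

```python
import numpy as np

# Illustrative feedback model: learn a linear map from tactile deviations to
# motion-plan corrections from demonstration data. Dimensions are hypothetical.
demos_tactile = np.random.rand(50, 6)          # tactile deviation features
demos_correct = np.random.rand(50, 3)          # demonstrated motion corrections
W, *_ = np.linalg.lstsq(demos_tactile, demos_correct, rcond=None)

def feedback(tactile_deviation):
    """Map a tactile deviation to a motion-plan correction."""
    return tactile_deviation @ W

print(feedback(np.random.rand(6)))
```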
CLIP-Nav: Using CLIP for Zero-Shot Vision-and-Language Navigation
Dorbala, Vishnu Sashank, Sigurdsson, Gunnar, Piramuthu, Robinson, Thomason, Jesse, Sukhatme, Gaurav S.
Household environments are visually diverse. Embodied agents performing Vision-and-Language Navigation (VLN) in the wild must be able to handle this diversity while also following arbitrary language instructions. Recently, vision-language models like CLIP have shown great performance on the task of zero-shot object recognition. In this work, we ask if these models are also capable of zero-shot language grounding. In particular, we utilize CLIP to tackle the novel problem of zero-shot VLN using natural language referring expressions that describe target objects, in contrast to past work that used simple language templates describing object classes. We examine CLIP's capability to make sequential navigational decisions without any dataset-specific finetuning, and study how it influences the path that an agent takes. Our results on the coarse-grained instruction-following task of REVERIE demonstrate the navigational capability of CLIP, surpassing the supervised baseline in terms of both success rate (SR) and success weighted by path length (SPL). More importantly, we quantitatively show that our CLIP-based zero-shot approach generalizes better than state-of-the-art (SOTA) fully supervised approaches, showing consistent performance across environments when evaluated via Relative Change in Success (RCS).
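The zero-shot decision loop can be sketched as follows: at each step, score the view in each navigable direction against the encoded instruction and move toward the best match. The encoders here are random stand-ins, not CLIP itself.

```python
import numpy as np

# Sequential zero-shot navigation sketch: score candidate directions against
# the instruction embedding at every step and take the argmax.
rng = np.random.default_rng(1)

def encode_image(view):                  # stand-in for CLIP's image encoder
    return rng.normal(size=128)

def encode_text(instruction):            # stand-in for CLIP's text encoder
    return rng.normal(size=128)

goal = encode_text("go to the lamp next to the couch in the living room")
for step in range(5):                    # one decision per navigation step
    views = {d: rng.normal(size=(32, 32)) for d in ("left", "ahead", "right")}
    scores = {d: encode_image(v) @ goal for d, v in views.items()}
    choice = max(scores, key=scores.get) # best-scoring direction
    print("step", step, "->", choice)
```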