
Collaborating Authors

 Vu, Brandon


COAST: Constraints and Streams for Task and Motion Planning

arXiv.org Artificial Intelligence

Abstract-- Task and Motion Planning (TAMP) algorithms solve long-horizon robotics tasks by integrating task planning with motion planning; the task planner proposes a sequence of actions towards a goal state and the motion planner verifies whether this action sequence is geometrically feasible for the robot. We propose a probabilistically-complete, plan-first TAMP algorithm that is significantly faster than PDDLStream. This speedup occurs by using a direct stream planning algorithm to create stream objects after task planning rather than before, to avoid the computational cost of task planning with many unnecessary stream objects. We validate our method on three TAMP domains (Figure 1).

We aim to equip a robot with the ability to solve complex long-horizon tasks that require a combination of symbolic and geometric reasoning. Task and Motion Planning (TAMP) is an approach for solving such tasks. TAMP methods often use task planning to produce a sequence of symbolic actions, i.e., a task plan, in addition to using sampling-based motion planning to ensure the task plan is geometrically feasible.
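
The plan-first ordering described above (find a symbolic task plan first, then sample stream objects only for the actions that plan actually uses) can be illustrated with a short sketch. The Python below is a minimal illustrative sketch under assumed interfaces, not the COAST implementation; the objects and method names (domain.task_planner, domain.sample_stream, problem.add_constraint) are hypothetical and chosen for exposition.

    def plan_first_tamp(domain, problem, max_iters=100):
        for _ in range(max_iters):
            # 1. Task planning over symbols only; no geometric samples are
            #    generated at this stage.
            task_plan = domain.task_planner(problem)
            if task_plan is None:
                return None  # symbolically unsolvable

            # 2. Stream planning: sample continuous values (grasps, placements,
            #    trajectories) only for the actions in this candidate plan.
            grounded_plan = []
            for action in task_plan:
                sample = domain.sample_stream(action, grounded_plan)
                if sample is None:
                    # Geometric failure: record a constraint that rules out this
                    # symbolic choice, then ask the task planner for a new plan.
                    problem.add_constraint(action)
                    grounded_plan = None
                    break
                grounded_plan.append((action, sample))

            # 3. If every action was grounded, the plan is geometrically feasible.
            if grounded_plan is not None:
                return grounded_plan
        return None

The point of this ordering is that geometric sampling effort is spent only on actions that appear in a candidate plan, and geometric failures are fed back to the task planner as constraints rather than being enumerated up front.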


Stabilize to Act: Learning to Coordinate for Bimanual Manipulation

arXiv.org Artificial Intelligence

Bimanual coordination is pervasive, spanning household activities such as cutting food, surgical skills such as suturing a wound, and industrial tasks such as connecting two cables. In robotics, the addition of a second arm opens the door to a higher level of task complexity, but comes with a number of control challenges. With a second arm, we have to reason about how to produce coordinated behavior in a higher-dimensional action space, resulting in more computationally challenging learning, planning, and optimization problems. The addition of a second arm also complicates data collection--it requires teleoperating a robot with more degrees of freedom--which hinders our ability to rely on methods that require expert bimanual demonstrations. To combat these challenges, we can draw inspiration from how humans tackle bimanual tasks--specifically, alternating between using one arm to stabilize parts of the environment and then using the other arm to act conditioned on the stabilized state of the world. Alternating between stabilizing and acting offers a significant gain over both model-based and data-driven prior approaches to bimanual manipulation. Previous model-based techniques have proposed planning algorithms for bimanual tasks such as collaborative transport or scooping [1, 2, 3], but they require hand-designed specialized primitives or follow predefined trajectories, limiting their ability to learn new skills or adapt. At the other extreme, reinforcement learning (RL) techniques do not need costly primitives; however, RL methods are notoriously data hungry, and a high-dimensional bimanual action space further exacerbates this problem.
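
The stabilize-then-act decomposition can be pictured as a simple control loop in which one arm holds part of the scene fixed while the other arm acts in a single-arm action space conditioned on that stabilized state. The sketch below is illustrative only; the environment and policy interfaces (env.left_arm.hold, env.step_right_arm, the two policies) are hypothetical and not taken from the paper.

    def bimanual_episode(env, stabilizing_policy, acting_policy, horizon=200):
        # Run one episode of alternating stabilize/act control. All interfaces
        # here are hypothetical placeholders for a bimanual environment.
        obs = env.reset()
        for _ in range(horizon):
            # Stabilizing arm: choose a part of the scene to hold fixed.
            stabilize_pose = stabilizing_policy(obs)
            env.left_arm.hold(stabilize_pose)

            # Acting arm: a single-arm policy conditioned on the stabilized
            # state, so learning happens in a lower-dimensional action space.
            action = acting_policy(obs, stabilize_pose)
            obs, done = env.step_right_arm(action)
            if done:
                break
        return obs

The design choice this illustrates is that, while one arm stabilizes, the learning and control problem for the other arm reduces to a single-arm problem, which also eases data collection.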


Robot Fine-Tuning Made Easy: Pre-Training Rewards and Policies for Autonomous Real-World Reinforcement Learning

arXiv.org Artificial Intelligence

The pre-train and fine-tune paradigm in machine learning has had dramatic success in a wide range of domains because the use of existing data or pre-trained models from the internet enables quick and easy learning of new tasks. We aim to enable this paradigm in robotic reinforcement learning, allowing a robot to learn a new task with little human effort by leveraging data and models from the internet. However, reinforcement learning often requires significant human effort in the form of manual reward specification or environment resets, even if the policy is pre-trained. We introduce RoboFuME, a reset-free fine-tuning system that pre-trains a multi-task manipulation policy from diverse datasets of prior experiences and self-improves online to learn a target task with minimal human intervention. Our insights are to utilize calibrated offline reinforcement learning techniques to ensure efficient online fine-tuning of a pre-trained policy in the presence of distribution shifts, and to leverage pre-trained vision-language models (VLMs) to build a robust reward classifier that autonomously provides reward signals during online fine-tuning. On a diverse set of five real robot manipulation tasks, we show that our method can incorporate data from an existing robot dataset collected at a different institution and improve on a target task within as little as 3 hours of autonomous real-world experience. We also demonstrate in simulation experiments that our method outperforms prior works that use different RL algorithms or different approaches for predicting rewards. Project website: https://robofume.github.io
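
A rough sketch of the reset-free online fine-tuning loop described above, with rewards supplied by a vision-language success classifier instead of hand-specified reward functions, might look like the following. This is an assumption-laden illustration, not the RoboFuME code; every interface here (vlm_reward_model.success_prob, policy.update, policy.switch_task) is hypothetical.

    def finetune_online(env, policy, vlm_reward_model, replay_buffer, steps=50_000):
        # Reset-free online fine-tuning of a pre-trained policy, with a
        # VLM-based success classifier providing rewards. All names are
        # hypothetical placeholders.
        obs = env.get_observation()
        for _ in range(steps):
            action = policy.sample(obs)
            next_obs = env.step(action)

            # Reward from a vision-language success classifier rather than a
            # hand-designed reward function or human labels.
            reward = float(vlm_reward_model.success_prob(next_obs) > 0.5)

            replay_buffer.add(obs, action, reward, next_obs)
            policy.update(replay_buffer.sample_batch())  # offline-RL-style (calibrated) update

            obs = next_obs
            # Reset-free operation: on success, switch between the forward task
            # and a reset/backward task instead of waiting for a human reset.
            if reward > 0:
                policy.switch_task()
        return policy

The sketch is meant only to show where the two stated insights plug in: the calibrated offline-RL update keeps online fine-tuning stable under distribution shift, and the VLM classifier removes the need for manual reward specification during autonomous operation.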