Goto

Collaborating Authors

 Ha, Sehoon


ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation

arXiv.org Artificial Intelligence

We present Adaptive Skill Coordination (ASC) -- an approach for accomplishing long-horizon tasks like mobile pick-and-place (i.e., navigating to an object, picking it, navigating to another location, and placing it). ASC consists of three components -- (1) a library of basic visuomotor skills (navigation, pick, place), (2) a skill coordination policy that chooses which skill to use when, and (3) a corrective policy that adapts pre-trained skills in out-of-distribution states. All components of ASC rely only on onboard visual and proprioceptive sensing, without requiring detailed maps with obstacle layouts or precise object locations, easing real-world deployment. We train ASC in simulated indoor environments, and deploy it zero-shot (without any real-world experience or fine-tuning) on the Boston Dynamics Spot robot in eight novel real-world environments (one apartment, one lab, two microkitchens, two lounges, one office space, one outdoor courtyard). In rigorous quantitative comparisons in two environments, ASC achieves near-perfect performance (59/60 episodes, or 98%), while sequentially executing skills succeeds in only 44/60 (73%) episodes. Extensive perturbation experiments show that ASC is robust to hand-off errors, changes in the environment layout, dynamic obstacles (e.g., people), and unexpected disturbances. Supplementary videos at adaptiveskillcoordination.github.io.


Imitating and Finetuning Model Predictive Control for Robust and Symmetric Quadrupedal Locomotion

arXiv.org Artificial Intelligence

Control of legged robots is a challenging problem that has been investigated by different approaches, such as model-based control and learning algorithms. This work proposes a novel Imitating and Finetuning Model Predictive Control (IFM) framework to take the strengths of both approaches. Our framework first develops a conventional model predictive controller (MPC) using Differential Dynamic Programming and Raibert heuristic, which serves as an expert policy. Then we train a clone of the MPC using imitation learning to make the controller learnable. Finally, we leverage deep reinforcement learning with limited exploration for further finetuning the policy on more challenging terrains. By conducting comprehensive simulation and hardware experiments, we demonstrate that the proposed IFM framework can significantly improve the performance of the given MPC controller on rough, slippery, and conveyor terrains that require careful coordination of footsteps. We also showcase that IFM can efficiently produce more symmetric, periodic, and energy-efficient gaits compared to Vanilla RL with a minimal burden of reward shaping.


BayRnTune: Adaptive Bayesian Domain Randomization via Strategic Fine-tuning

arXiv.org Artificial Intelligence

Domain randomization (DR), which entails training a policy with randomized dynamics, has proven to be a simple yet effective algorithm for reducing the gap between simulation and the real world. However, DR often requires careful tuning of randomization parameters. Methods like Bayesian Domain Randomization (Bayesian DR) and Active Domain Randomization (Adaptive DR) address this issue by automating parameter range selection using real-world experience. While effective, these algorithms often require long computation time, as a new policy is trained from scratch every iteration. In this work, we propose Adaptive Bayesian Domain Randomization via Strategic Fine-tuning (BayRnTune), which inherits the spirit of BayRn but aims to significantly accelerate the learning processes by fine-tuning from previously learned policy. This idea leads to a critical question: which previous policy should we use as a prior during fine-tuning? We investigated four different fine-tuning strategies and compared them against baseline algorithms in five simulated environments, ranging from simple benchmark tasks to more complex legged robot environments. Our analysis demonstrates that our method yields better rewards in the same amount of timesteps compared to vanilla domain randomization or Bayesian DR.


ViNL: Visual Navigation and Locomotion Over Obstacles

arXiv.org Artificial Intelligence

We present Visual Navigation and Locomotion over obstacles (ViNL), which enables a quadrupedal robot to navigate unseen apartments while stepping over small obstacles that lie in its path (e.g., shoes, toys, cables), similar to how humans and pets lift their feet over objects as they walk. ViNL consists of: (1) a visual navigation policy that outputs linear and angular velocity commands that guides the robot to a goal coordinate in unfamiliar indoor environments; and (2) a visual locomotion policy that controls the robot's joints to avoid stepping on obstacles while following provided velocity commands. Both the policies are entirely "model-free", i.e. sensors-to-actions neural networks trained end-to-end. The two are trained independently in two entirely different simulators and then seamlessly co-deployed by feeding the velocity commands from the navigator to the locomotor, entirely "zero-shot" (without any co-training). While prior works have developed learning methods for visual navigation or visual locomotion, to the best of our knowledge, this is the first fully learned approach that leverages vision to accomplish both (1) intelligent navigation in new environments, and (2) intelligent visual locomotion that aims to traverse cluttered environments without disrupting obstacles. On the task of navigation to distant goals in unknown environments, ViNL using just egocentric vision significantly outperforms prior work on robust locomotion using privileged terrain maps (+32.8% success and -4.42 collisions per meter). Additionally, we ablate our locomotion policy to show that each aspect of our approach helps reduce obstacle collisions. Videos and code at http://www.joannetruong.com/projects/vinl.html


Words into Action: Learning Diverse Humanoid Robot Behaviors using Language Guided Iterative Motion Refinement

arXiv.org Artificial Intelligence

Humanoid robots are well suited for human habitats due to their morphological similarity, but developing controllers for them is a challenging task that involves multiple sub-problems, such as control, planning and perception. In this paper, we introduce a method to simplify controller design by enabling users to train and fine-tune robot control policies using natural language commands. We first learn a neural network policy that generates behaviors given a natural language command, such as "walk forward", by combining Large Language Models (LLMs), motion retargeting, and motion imitation. Based on the synthesized motion, we iteratively fine-tune by updating the text prompt and querying LLMs to find the best checkpoint associated with the closest motion in history. We validate our approach using a simulated Digit humanoid robot and demonstrate learning of diverse motions, such as walking, hopping, and kicking, without the burden of complex reward engineering. In addition, we show that our iterative refinement enables us to learn 3x times faster than a naive formulation that learns from scratch.


Learning manipulation of steep granular slopes for fast Mini Rover turning

arXiv.org Artificial Intelligence

Future planetary exploration missions will require reaching challenging regions such as craters and steep slopes. Such regions are ubiquitous and present science-rich targets potentially containing information regarding the planet's internal structure. Steep slopes consisting of low-cohesion regolith are prone to flow downward under small disturbances, making it very challenging for autonomous rovers to traverse. Moreover, the navigation trajectories of rovers are heavily limited by the terrain topology and future systems will need to maneuver on flowable surfaces without getting trapped, allowing them to further expand their reach and increase mission efficiency. In this work, we used a laboratory-scale rover robot and performed maneuvering experiments on a steep granular slope of poppy seeds to explore the rover's turning capabilities. The rover is capable of lifting, sweeping, and spinning its wheels, allowing it to execute leg-like gait patterns. The high-dimensional actuation capabilities of the rover facilitate effective manipulation of the underlying granular surface. We used Bayesian Optimization (BO) to gain insight into successful turning gaits in high dimensional search space and found strategies such as differential wheel spinning and pivoting around a single sweeping wheel. We then used these insights to further fine-tune the turning gait, enabling the rover to turn 90 degrees at just above 4 seconds with minimal slip. Combining gait optimization and human-tuning approaches, we found that fast turning is empowered by creating anisotropic torques with the sweeping wheel.


CrossLoco: Human Motion Driven Control of Legged Robots via Guided Unsupervised Reinforcement Learning

arXiv.org Artificial Intelligence

Human motion driven control (HMDC) is an effective approach for generating natural and compelling robot motions while preserving high-level semantics. However, establishing the correspondence between humans and robots with different body structures is not straightforward due to the mismatches in kinematics and dynamics properties, which causes intrinsic ambiguity to the problem. Many previous algorithms approach this motion retargeting problem with unsupervised learning, which requires the prerequisite skill sets. However, it will be extremely costly to learn all the skills without understanding the given human motions, particularly for high-dimensional robots. In this work, we introduce CrossLoco, a guided unsupervised reinforcement learning framework that simultaneously learns robot skills and their correspondence to human motions. Our key innovation is to introduce a cycle-consistency-based reward term designed to maximize the mutual information between human motions and robot states. We demonstrate that the proposed framework can generate compelling robot motions by translating diverse human motions, such as running, hopping, and dancing. We quantitatively compare our CrossLoco against the manually engineered and unsupervised baseline algorithms along with the ablated versions of our framework and demonstrate that our method translates human motions with better accuracy, diversity, and user preference. We also showcase its utility in other applications, such as synthesizing robot movements from language input and enabling interactive robot control.


Principles and Guidelines for Evaluating Social Robot Navigation Algorithms

arXiv.org Artificial Intelligence

A major challenge to deploying robots widely is navigation in human-populated environments, commonly referred to as social robot navigation. While the field of social navigation has advanced tremendously in recent years, the fair evaluation of algorithms that tackle social navigation remains hard because it involves not just robotic agents moving in static environments but also dynamic human agents and their perceptions of the appropriateness of robot behavior. In contrast, clear, repeatable, and accessible benchmarks have accelerated progress in fields like computer vision, natural language processing and traditional robot navigation by enabling researchers to fairly compare algorithms, revealing limitations of existing solutions and illuminating promising new directions. We believe the same approach can benefit social navigation. In this paper, we pave the road towards common, widely accessible, and repeatable benchmarking criteria to evaluate social robot navigation. Our contributions include (a) a definition of a socially navigating robot as one that respects the principles of safety, comfort, legibility, politeness, social competency, agent understanding, proactivity, and responsiveness to context, (b) guidelines for the use of metrics, development of scenarios, benchmarks, datasets, and simulators to evaluate social navigation, and (c) a design of a social navigation metrics framework to make it easier to compare results from different simulators, robots and datasets.


Transforming a Quadruped into a Guide Robot for the Visually Impaired: Formalizing Wayfinding, Interaction Modeling, and Safety Mechanism

arXiv.org Artificial Intelligence

This paper explores the principles for transforming a quadrupedal robot into a guide robot for individuals with visual impairments. A guide robot has great potential to resolve the limited availability of guide animals that are accessible to only two to three percent of the potential blind or visually impaired (BVI) users. To build a successful guide robot, our paper explores three key topics: (1) formalizing the navigation mechanism of a guide dog and a human, (2) developing a data-driven model of their interaction, and (3) improving user safety. First, we formalize the wayfinding task of the human-guide robot team using Markov Decision Processes based on the literature and interviews. Then we collect real human-robot interaction data from three visually impaired and six sighted people and develop an interaction model called the ``Delayed Harness'' to effectively simulate the navigation behaviors of the team. Additionally, we introduce an action shielding mechanism to enhance user safety by predicting and filtering out dangerous actions. We evaluate the developed interaction model and the safety mechanism in simulation, which greatly reduce the prediction errors and the number of collisions, respectively. We also demonstrate the integrated system on a quadrupedal robot with a rigid harness, by guiding users over $100+$~m trajectories.


ACE: Adversarial Correspondence Embedding for Cross Morphology Motion Retargeting from Human to Nonhuman Characters

arXiv.org Artificial Intelligence

Motion retargeting is a promising approach for generating natural and compelling animations for nonhuman characters. However, it is challenging to translate human movements into semantically equivalent motions for target characters with different morphologies due to the ambiguous nature of the problem. This work presents a novel learning-based motion retargeting framework, Adversarial Correspondence Embedding (ACE), to retarget human motions onto target characters with different body dimensions and structures. Our framework is designed to produce natural and feasible robot motions by leveraging generative-adversarial networks (GANs) while preserving high-level motion semantics by introducing an additional feature loss. In addition, we pretrain a robot motion prior that can be controlled in a latent embedding space and seek to establish a compact correspondence. We demonstrate that the proposed framework can produce retargeted motions for three different characters -- a quadrupedal robot with a manipulator, a crab character, and a wheeled manipulator. We further validate the design choices of our framework by conducting baseline comparisons and a user study. We also showcase sim-to-real transfer of the retargeted motions by transferring them to a real Spot robot.