Ben, Qingwei
VB-Com: Learning Vision-Blind Composite Humanoid Locomotion Against Deficient Perception
Ren, Junli, Huang, Tao, Wang, Huayi, Wang, Zirui, Ben, Qingwei, Pang, Jiangmiao, Luo, Ping
The performance of legged locomotion is closely tied to the accuracy and comprehensiveness of state observations. Blind policies, which rely solely on proprioception, are considered highly robust because proprioceptive observations are reliable; however, they significantly limit locomotion speed and often require collisions with the terrain to adapt. In contrast, vision policies allow the robot to plan motions in advance and respond proactively to unstructured terrains through an online perception module, but perception is often compromised by noisy real-world environments, potential sensor failures, and the limitations of current simulations in representing dynamic or deformable terrains. Humanoid robots, with their high degrees of freedom and inherently unstable morphology, are particularly susceptible to misguidance from deficient perception, which can result in falls or termination on challenging dynamic terrains. To leverage the advantages of both vision and blind policies, we propose VB-Com, a composite framework that enables humanoid robots to determine when to rely on the vision policy and when to switch to the blind policy under perceptual deficiency. We demonstrate that VB-Com effectively enables humanoid robots to traverse challenging terrains and obstacles despite perception deficiencies caused by dynamic terrains or perceptual noise.
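As an illustration of the vision/blind composite idea described above, the following Python sketch shows one plausible way a controller could switch between the two policies when perception looks unreliable. The switching criterion used here (a running discrepancy between the perceived height map and actual foot contacts), the threshold, and the observation sizes are all assumptions for illustration, not the mechanism actually used in VB-Com.

```python
import numpy as np

class CompositeController:
    """Toy vision/blind composite controller, in the spirit of VB-Com.

    The switching criterion below (a running estimate of how well the
    perceived height map agrees with proprioceptive foot-contact events)
    is purely illustrative; the paper's actual mechanism may differ.
    """

    def __init__(self, vision_policy, blind_policy, threshold=0.08, window=20):
        self.vision_policy = vision_policy
        self.blind_policy = blind_policy
        self.threshold = threshold   # mean discrepancy [m] that triggers the blind policy
        self.window = window
        self.errors = []             # recent |perceived - contact| height errors

    def update_perception_error(self, perceived_height, contact_height):
        """Record discrepancy between the elevation map and an actual foot contact."""
        self.errors.append(abs(perceived_height - contact_height))
        self.errors = self.errors[-self.window:]

    def act(self, proprio_obs, vision_obs):
        """Use the vision policy while perception looks reliable, otherwise fall back."""
        unreliable = len(self.errors) > 0 and np.mean(self.errors) > self.threshold
        if unreliable or vision_obs is None:  # sensor failure also forces the blind policy
            return self.blind_policy(proprio_obs)
        return self.vision_policy(np.concatenate([proprio_obs, vision_obs]))


# Minimal usage with stand-in policies (placeholders for trained networks).
vision_policy = lambda obs: np.zeros(12)
blind_policy = lambda obs: np.zeros(12)
ctrl = CompositeController(vision_policy, blind_policy)
action = ctrl.act(np.zeros(48), np.zeros(187))
```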
HOMIE: Humanoid Loco-Manipulation with Isomorphic Exoskeleton Cockpit
Ben, Qingwei, Jia, Feiyu, Zeng, Jia, Dong, Junting, Lin, Dahua, Pang, Jiangmiao
Current humanoid teleoperation systems either lack reliable low-level control policies or struggle to acquire accurate whole-body control commands, making it difficult to teleoperate humanoids for loco-manipulation tasks. To solve these issues, we propose HOMIE, a novel humanoid teleoperation cockpit that integrates a humanoid loco-manipulation policy and a low-cost exoskeleton-based hardware system. The policy enables humanoid robots to walk and squat to specific heights while accommodating arbitrary upper-body poses. This is achieved through our novel reinforcement learning-based training framework that incorporates an upper-body pose curriculum, a height-tracking reward, and symmetry utilization, without relying on any motion priors. Complementing the policy, the hardware system integrates isomorphic exoskeleton arms, a pair of motion-sensing gloves, and a pedal, allowing a single operator to achieve full control of the humanoid robot. Our experiments show that our cockpit enables more stable, rapid, and precise humanoid loco-manipulation teleoperation, accelerating task completion and eliminating retargeting errors compared to inverse kinematics-based methods. We also validate the effectiveness of the data collected by our cockpit for imitation learning. Our project is fully open-sourced; demos and code can be found at https://homietele.github.io/.
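To make the training ingredients above more concrete, here is a minimal Python sketch of a height-tracking reward and an upper-body pose curriculum. The exponential kernel, its width, the joint-limit values, and the curriculum schedule are illustrative assumptions, not HOMIE's exact terms.

```python
import numpy as np

def height_tracking_reward(base_height, target_height, sigma=0.1):
    """Exponential-kernel reward for tracking a commanded base height
    (kernel form and sigma are illustrative choices)."""
    return np.exp(-np.square(base_height - target_height) / sigma)

def sample_upper_body_pose(joint_limits, curriculum_scale):
    """Upper-body pose curriculum: sample target arm poses from a range that
    grows from near-neutral toward the full joint limits as training progresses."""
    low, high = joint_limits
    mid = 0.5 * (low + high)
    half_range = 0.5 * (high - low) * curriculum_scale  # curriculum_scale in [0, 1]
    return np.random.uniform(mid - half_range, mid + half_range)

# Example: 14 hypothetical upper-body joints; early training samples near-neutral
# poses, late training samples the full range.
limits = (np.full(14, -1.5), np.full(14, 1.5))
early_pose = sample_upper_body_pose(limits, curriculum_scale=0.1)
late_pose = sample_upper_body_pose(limits, curriculum_scale=1.0)
r = height_tracking_reward(base_height=0.55, target_height=0.45)
```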
BeamDojo: Learning Agile Humanoid Locomotion on Sparse Footholds
Wang, Huayi, Wang, Zirui, Ren, Junli, Ben, Qingwei, Huang, Tao, Zhang, Weinan, Pang, Jiangmiao
Traversing risky terrains with sparse footholds poses a significant challenge for humanoid robots, requiring precise foot placements and stable locomotion. Existing approaches designed for quadrupedal robots often fail to generalize to humanoid robots due to differences in foot geometry and unstable morphology, while learning-based approaches for humanoid locomotion still face great challenges on complex terrains due to sparse foothold reward signals and inefficient learning processes. To address these challenges, we introduce BeamDojo, a reinforcement learning (RL) framework designed to enable agile humanoid locomotion on sparse footholds. BeamDojo begins by introducing a sampling-based foothold reward tailored for polygonal feet, along with a double critic to balance the learning process between dense locomotion rewards and sparse foothold rewards. To encourage sufficient trial-and-error exploration, BeamDojo incorporates a two-stage RL approach: the first stage relaxes the terrain dynamics by training the humanoid on flat terrain while providing it with task-terrain perceptive observations, and the second stage fine-tunes the policy on the actual task terrain. Moreover, we implement an onboard LiDAR-based elevation map to enable real-world deployment. Extensive simulation and real-world experiments demonstrate that BeamDojo achieves efficient learning in simulation and enables agile locomotion with precise foot placement on sparse footholds in the real world, maintaining a high success rate even under significant external disturbances.
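The sketch below illustrates the general idea of a sampling-based foothold reward for a polygonal foot: sample points inside the foot polygon and reward the fraction that lands on valid footholds. The sampling scheme, sample count, and beam geometry are assumptions made for this example and may not match BeamDojo's actual formulation.

```python
import numpy as np

def point_in_polygon(point, vertices):
    """Standard ray-casting containment test for a 2D polygon."""
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        if (y1 > y) != (y2 > y) and x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
            inside = not inside
    return inside

def foothold_reward(foot_vertices, is_on_foothold, num_samples=64):
    """Sampling-based foothold reward for a polygonal foot (illustrative sketch).

    Points are drawn from the foot's bounding box, filtered by a containment
    test, and the reward is the fraction of in-foot samples lying on valid
    footholds.
    """
    mins, maxs = foot_vertices.min(axis=0), foot_vertices.max(axis=0)
    samples = np.random.uniform(mins, maxs, size=(num_samples, 2))
    inside = np.array([point_in_polygon(p, foot_vertices) for p in samples])
    if not inside.any():
        return 0.0
    on_beam = np.array([is_on_foothold(p) for p in samples[inside]])
    return float(on_beam.mean())

# Example: a rectangular foot over a hypothetical 20 cm-wide beam centered at y = 0.
foot = np.array([[0.0, -0.05], [0.2, -0.05], [0.2, 0.05], [0.0, 0.05]])
reward = foothold_reward(foot, lambda p: abs(p[1]) < 0.10)
```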
Learning Humanoid Standing-up Control across Diverse Postures
Huang, Tao, Ren, Junli, Wang, Huayi, Wang, Zirui, Ben, Qingwei, Wen, Muning, Chen, Xiao, Li, Jianan, Pang, Jiangmiao
Standing-up control is crucial for humanoid robots, with the potential for integration into current locomotion and loco-manipulation systems, for example as fall recovery. Existing approaches are either limited to simulations that overlook hardware constraints or rely on predefined ground-specific motion trajectories, failing to enable standing up across postures in real-world scenes. To bridge this gap, we present HoST (Humanoid Standing-up Control), a reinforcement learning framework that learns standing-up control from scratch, enabling robust sim-to-real transfer across diverse postures. HoST effectively learns posture-adaptive motions by leveraging a multi-critic architecture and curriculum-based training on diverse simulated terrains. To ensure successful real-world deployment, we constrain the motion with a smoothness regularization and an implicit motion speed bound to alleviate oscillatory and violent motions on physical hardware, respectively. After simulation-based training, the learned control policies are directly deployed on the Unitree G1 humanoid robot. Our experimental results demonstrate that the controllers achieve smooth, stable, and robust standing-up motions across a wide range of laboratory and outdoor environments. Videos are available at https://taohuang13.github.io/humanoid-standingup.github.io/.
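As a concrete illustration of the two deployment-oriented constraints mentioned above, the following Python sketch shows a generic smoothness penalty on action differences and a per-step bound on action changes. The functional forms, weights, and limits are illustrative assumptions rather than HoST's exact terms.

```python
import numpy as np

def smoothness_penalty(a_t, a_prev, a_prev2, w_rate=0.01, w_accel=0.005):
    """Smoothness regularization: penalize first- and second-order action
    differences to suppress oscillatory motions (weights are illustrative)."""
    rate = np.sum(np.square(a_t - a_prev))
    accel = np.sum(np.square(a_t - 2.0 * a_prev + a_prev2))
    return -(w_rate * rate + w_accel * accel)

def bound_action_delta(a_t, a_prev, max_delta=0.1):
    """Speed bound: clip the per-step change of the action (e.g., target joint
    positions) so the resulting motion cannot become violent."""
    return a_prev + np.clip(a_t - a_prev, -max_delta, max_delta)

# Example over three consecutive policy outputs for 23 hypothetical joints.
a2, a1 = np.zeros(23), 0.05 * np.ones(23)
a0 = 0.3 * np.ones(23)
r_smooth = smoothness_penalty(a0, a1, a2)
a0_bounded = bound_action_delta(a0, a1)
```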
GRUtopia: Dream General Robots in a City at Scale
Wang, Hanqing, Chen, Jiahe, Huang, Wensi, Ben, Qingwei, Wang, Tai, Mi, Boyu, Huang, Tao, Zhao, Siheng, Chen, Yilun, Yang, Sizhe, Cao, Peizhou, Yu, Wenye, Ye, Zichao, Li, Jialun, Long, Junfeng, Wang, Zirui, Wang, Huiling, Zhao, Ying, Tu, Zhongying, Qiao, Yu, Lin, Dahua, Pang, Jiangmiao
Recent works have been exploring scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements: (a) The scene dataset, GRScenes, includes 100k interactive, finely annotated scenes that can be freely combined into city-scale environments. In contrast to previous works that mainly focus on home environments, GRScenes covers 89 diverse scene categories, filling the gap in service-oriented environments where general robots would initially be deployed. (b) GRResidents, a Large Language Model (LLM)-driven Non-Player Character (NPC) system responsible for social interaction, task generation, and task assignment, thus simulating social scenarios for embodied AI applications. (c) The benchmark, GRBench, supports various robots but focuses on legged robots as primary agents and poses moderately challenging tasks involving Object Loco-Navigation, Social Loco-Navigation, and Loco-Manipulation. We hope that this work can alleviate the scarcity of high-quality data in this field and provide a more comprehensive assessment of Embodied AI research. The project is available at https://github.com/OpenRobotLab/GRUtopia.
RoboDuet: A Framework Affording Mobile-Manipulation and Cross-Embodiment
Pan, Guoping, Ben, Qingwei, Yuan, Zhecheng, Jiang, Guangqi, Ji, Yandong, Pang, Jiangmiao, Liu, Houde, Xu, Huazhe
Combining the mobility of legged robots with the manipulation skills of arms has the potential to significantly expand the operational range and enhance the capabilities of robotic systems in performing various mobile manipulation tasks. Existing approaches are confined to imprecise six-degrees-of-freedom (DoF) manipulation and possess a limited arm workspace. In this paper, we propose a novel framework, RoboDuet, which employs two collaborative policies to realize locomotion and manipulation simultaneously, achieving whole-body control through the interaction between the two policies. Surprisingly, beyond large-range pose tracking, we find that the two-policy framework may enable cross-embodiment deployment, such as using different quadrupedal robots or other arms. Our experiments demonstrate that the policies trained through RoboDuet can accomplish stable gaits, agile 6D end-effector pose tracking, and zero-shot exchange of legged robots, and can be deployed in the real world to perform various mobile manipulation tasks. Our project page with demo videos is at https://locomanip-duet.github.io.
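The Python sketch below shows one plausible shape for such a two-policy whole-body controller: an arm policy that tracks a 6-DoF end-effector goal and hands an intermediate command to a locomotion policy. How the two policies actually communicate in RoboDuet, and the observation/action dimensions used here, are assumptions of this sketch.

```python
import numpy as np

class DuetController:
    """Toy two-policy whole-body controller in the spirit of RoboDuet.

    The arm policy tracks a 6-DoF end-effector goal and passes an
    intermediate command (here, a desired base velocity) to the
    locomotion policy; this coupling is an illustrative assumption.
    """

    def __init__(self, loco_policy, arm_policy):
        self.loco_policy = loco_policy
        self.arm_policy = arm_policy

    def step(self, proprio_obs, ee_goal):
        # Arm policy: arm joint targets plus a command for the legs.
        arm_action, base_command = self.arm_policy(proprio_obs, ee_goal)
        # Locomotion policy: leg joint targets conditioned on that command.
        leg_action = self.loco_policy(proprio_obs, base_command)
        return np.concatenate([leg_action, arm_action])


# Stand-in policies: 6 arm joints, 12 leg joints, 3D base velocity command.
arm_policy = lambda obs, goal: (np.zeros(6), np.zeros(3))
loco_policy = lambda obs, cmd: np.zeros(12)
controller = DuetController(loco_policy, arm_policy)
whole_body_action = controller.step(np.zeros(45), np.array([0.4, 0.0, 0.3, 0.0, 0.0, 0.0]))
```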
DIFFTACTILE: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation
Si, Zilin, Zhang, Gu, Ben, Qingwei, Romero, Branden, Xian, Zhou, Liu, Chao, Gan, Chuang
With the goal of enabling robots to perform human-level manipulation on a diverse set of tasks, touch is one of the most prominent sensing components. Tactile sensing, as a modality, is unique in that it provides accurate, fine-detailed information about environmental interactions in the form of contact geometries and forces. Although its efficacy has been highlighted by prior research, providing crucial feedback for grasping fragile objects (Ishikawa et al., 2022), enabling robots to operate in occluded environments (Yu & Rodriguez, 2018), and detecting incipient slip (Chen et al., 2018) for highly reactive grasping, there are still advances to be made in tactile sensing, especially in the form of simulation. Physics-based simulation has become a significant practical tool in robotics by mitigating the challenges of real-world design and verification of learning algorithms. To accurately simulate tactile sensors, which are inherently soft, it is essential to model the contact geometries, forces, and dynamics of soft-body interaction. Prior work (Si & Yuan, 2022) attempted to simulate contact geometries and forces for tactile sensors under (quasi-)static scenarios and was successfully applied to robotic perception tasks such as object shape estimation (Suresh et al., 2022) and grasp stability prediction (Si et al., 2022), but highly dynamic manipulation tasks have not been thoroughly explored. Other prior works approach contact dynamics either by approximating sensor surface deformation with rigid-body dynamics (Xu et al., 2023) or by using physics-based soft-body simulation methods such as the Finite Element Method (FEM) (Narang et al., 2021); however, these methods are still limited to manipulating rigid objects. Our system, DIFFTACTILE, incorporates several key components: an FEM-based soft-body model for simulating the sensing elastomer, a multi-material simulator for modeling diverse object types (such as elastic, elastoplastic, and cable-like objects) under manipulation, and a penalty-based contact model for handling contact dynamics. Additionally, we introduce a method to infer the optical response of our tactile sensor to contact using an efficient pixel-based neural module.
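For readers unfamiliar with penalty-based contact, the short Python sketch below illustrates the general technique: a spring-damper force on the penetration depth plus a Coulomb friction cap. The stiffness, damping, and friction values are illustrative assumptions and this is not DIFFTACTILE's exact formulation.

```python
import numpy as np

def penalty_contact_force(penetration, penetration_rate, tangential_velocity,
                          k_n=5e3, d_n=10.0, mu=0.8):
    """Penalty-based contact force (illustrative of the general technique).

    The normal force is a spring-damper on the penetration depth, and the
    tangential force is Coulomb friction capped by mu * |normal force|.
    """
    if penetration <= 0.0:
        return np.zeros(3)
    fn = max(k_n * penetration + d_n * penetration_rate, 0.0)  # repulsive only
    v_t = np.asarray(tangential_velocity, dtype=float)
    speed = np.linalg.norm(v_t)
    ft = -mu * fn * (v_t / speed) if speed > 1e-8 else np.zeros(2)
    return np.array([ft[0], ft[1], fn])  # tangential (x, y) and normal (z) components

# Example: 1 mm penetration closing at 1 cm/s while sliding along x.
force = penalty_contact_force(1e-3, 1e-2, [0.05, 0.0])
```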