KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills
Xie, Weiji, Han, Jinrui, Zheng, Jiakun, Li, Huanyu, Liu, Xinzhe, Shi, Jiyuan, Zhang, Weinan, Bai, Chenjia, Li, Xuelong
Humanoid robots hold promise for acquiring diverse skills by imitating human behaviors. However, existing algorithms can only track smooth, low-speed human motions, even with delicate reward and curriculum design. This paper presents a physics-based humanoid control framework that aims to master highly dynamic human behaviors, such as Kungfu and dancing, through multi-step motion processing and adaptive motion tracking. For motion processing, we design a pipeline to extract, filter, correct, and retarget motions while ensuring compliance with physical constraints to the maximum extent. For motion imitation, we formulate a bi-level optimization problem that dynamically adjusts the tracking-accuracy tolerance based on the current tracking error, creating an adaptive curriculum mechanism. We further construct an asymmetric actor-critic framework for policy training. In experiments, we train whole-body control policies to imitate a set of highly dynamic motions. Our method achieves significantly lower tracking errors than existing approaches and is successfully deployed on the Unitree G1 robot, demonstrating stable and expressive behaviors. The project page is https://kungfu-bot.github.io.
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.92)
- Information Technology (0.67)
- Education (0.66)
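The adaptive curriculum described in the abstract adjusts a tracking-accuracy tolerance from the current tracking error. A minimal sketch of one possible update rule follows; the function name, multiplicative rule, and all hyperparameters are illustrative assumptions, not the paper's bi-level formulation.

```python
def update_tolerance(tol, tracking_error, target_ratio=0.8, rate=0.05,
                     tol_min=0.05, tol_max=1.0):
    """Tighten the tolerance when the policy tracks well, relax it otherwise.

    tol            -- current tracking-accuracy tolerance
    tracking_error -- mean tracking error over recent rollouts
    target_ratio, rate, tol_min, tol_max -- hypothetical hyperparameters
    """
    if tracking_error < target_ratio * tol:
        tol *= (1.0 - rate)   # policy succeeds: demand higher accuracy
    else:
        tol *= (1.0 + rate)   # policy struggles: relax the requirement
    return min(max(tol, tol_min), tol_max)
```

Tightening the tolerance only once the error falls below a fraction of it gives the policy slack early in training and progressively harder targets as tracking improves, which is the gist of an adaptive curriculum.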
Visual-auditory Extrinsic Contact Estimation
Yi, Xili, Lee, Jayjun, Fazeli, Nima
Estimating contact locations between a grasped object and the environment is important for robust manipulation. In this paper, we present a visual-auditory method for extrinsic contact estimation, featuring a real-to-sim approach for auditory signals. Our method equips a robotic manipulator with contact microphones and speakers on its fingers, along with an externally mounted static camera providing a visual feed of the scene. As the robot manipulates objects, it detects contact events with surrounding surfaces using auditory feedback from the fingertips and visual feedback from the camera. A key feature of our approach is the transfer of auditory feedback into a simulated environment, where we learn a multimodal representation that is then applied to real-world scenes without additional training. This zero-shot transfer is accurate and robust in estimating contact location and size, as demonstrated in our simulated and real-world experiments in various cluttered environments.
- North America > United States > Michigan (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
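The abstract above relies on detecting contact events from fingertip audio. As a hedged illustration only, a classical short-time-energy detector (a stand-in for the paper's learned multimodal representation, with hypothetical parameters) could look like this:

```python
import numpy as np

def detect_contact_events(audio, sr, frame_ms=10, threshold=3.0):
    """Return the start times (seconds) of frames whose short-time energy
    exceeds `threshold` times the median frame energy.

    audio -- 1-D waveform from a contact microphone
    sr    -- sample rate in Hz
    frame_ms, threshold -- hypothetical detector parameters
    """
    frame = max(1, int(sr * frame_ms / 1000))
    n = len(audio) // frame
    # Mean squared amplitude per non-overlapping frame.
    energies = np.square(audio[: n * frame].reshape(n, frame)).mean(axis=1)
    baseline = np.median(energies) + 1e-12  # avoid division-style issues
    return np.flatnonzero(energies > threshold * baseline) * frame / sr
```

A median baseline makes the detector robust to a constant noise floor; a learned representation, as in the paper, would additionally discriminate contact types rather than just onsets.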
H-FCBFormer Hierarchical Fully Convolutional Branch Transformer for Occlusal Contact Segmentation with Articulating Paper
Banks, Ryan, Rovira-Lastra, Bernat, Martinez-Gomis, Jordi, Chaurasia, Akhilanand, Li, Yunpeng
Occlusal contacts are the locations at which the occluding surfaces of the maxilla and the mandible posterior teeth meet. Occlusal contact detection is a vital tool for restoring lost masticatory function and is a mandatory assessment in dentistry, with particular importance in prosthodontics and restorative dentistry. The most common method for occlusal contact detection is articulating paper. However, this method can indicate significant medically false positive and medically false negative contact areas, leaving the identification of true occlusal indications to clinicians. To address this, we propose a multiclass Vision Transformer and Fully Convolutional Network ensemble semantic segmentation model with a combination hierarchical loss function, which we name the Hierarchical Fully Convolutional Branch Transformer (H-FCBFormer). We also propose a method of generating medically true positive semantic segmentation masks derived from expert-annotated articulating paper masks and gold standard masks. The proposed model outperforms other machine learning methods evaluated at detecting medically true positive contacts and performs better than dentists in accurately identifying object-wise occlusal contact areas, while taking significantly less time to identify them.
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > United Kingdom > England > Surrey > Guildford (0.04)
- Asia > India > Uttar Pradesh > Lucknow (0.04)
- Health & Medicine > Therapeutic Area > Dental and Oral Health (0.79)
- Health & Medicine > Diagnostic Medicine (0.69)
UV-Based 3D Hand-Object Reconstruction with Grasp Optimization
Yu, Ziwei, Yang, Linlin, Xie, You, Chen, Ping, Yao, Angela
This set-up lends itself well to AR/VR settings where the hand interacts with a predefined object, perhaps with markers to facilitate the object pose estimation. Such a setting is common, although the majority of previous works [31, 33, 34] consider 3D point clouds as input, while we handle the more difficult case of monocular RGB inputs. Additionally, the previous works [12, 13, 31, 33, 34, 60] are singularly focused on reconstructing feasible hand-object interactions. They aim to produce hand meshes with minimal penetration into the 3D object without regard for the accuracy of the 3D hand pose. We take on the additional challenge of balancing realistic hand-object interactions with accurate 3D hand poses. Representation-wise, previous hand-object 3D reconstruction methods [2, 35, 61, 72] work predominantly with the MANO model [55].