MIRAGE: Multimodal Intention Recognition and Admittance-Guided Enhancement in VR-based Multi-object Teleoperation
Sun, Chi, Wang, Xian, Kumar, Abhishek, Cui, Chengbin, Lee, Lik-Hang
arXiv.org Artificial Intelligence
This is the author's version of the article, to appear in an IEEE ISMAR conference. The Hong Kong Polytechnic University.

Figure 1: A pictorial description of the MIRAGE framework, which enhances HRI tele-grasping capability for multiple objects in VR. MIRAGE divides the multi-object grasping task into two phases: movement (manual) and grasping (semi-automatic). Each phase has a specific assistance method designed in MIRAGE. In the movement (manual) phase, Virtual Admittance (VA) modifies the robot trajectory (b); compared to the non-VA condition (a), it makes it easier to drive the robot toward the target with the same hand movement. In the grasping (semi-automatic) phase, a Multimodal-CNN-based Human Intention Perception Network (MMIPN) is proposed to estimate the human's desired grasp position for the robot's grasp motion plan (d), whereas the non-MMIPN condition plans the grasping motion as a vertical downward path (c).

Effective human-robot interaction (HRI) in multi-object teleoperation tasks faces significant challenges due to perceptual ambiguities in virtual reality (VR) environments and the limitations of single-modality intention recognition. This paper proposes a shared control framework that combines a virtual admittance (VA) model with a Multimodal-CNN-based Human Intention Perception Network (MMIPN) to enhance teleoperation performance and user experience. The VA model employs artificial potential fields to guide operators toward target objects by adjusting the admittance force and optimizing motion trajectories. MMIPN processes multimodal inputs--gaze movement, robot motions, and environmental context--to estimate human grasping intentions, helping overcome depth perception challenges in VR. Gaze data emerged as the most crucial input modality.
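The abstract's pairing of an artificial potential field with a virtual admittance model can be illustrated with a minimal sketch. The paper's actual gains and dynamics are not given here; the mass, damping, attraction gain, and time step below are illustrative assumptions, and the guidance force is a simple linear attractor toward the target added to the human's hand force.

```python
import numpy as np

def attractive_force(pos, target, gain=2.0):
    """Attractive potential-field force pulling toward the target (illustrative gain)."""
    return gain * (np.asarray(target) - np.asarray(pos))

def admittance_step(pos, vel, f_human, target, mass=1.0, damping=4.0, dt=0.01):
    """One Euler step of a virtual admittance model:
    mass * acc + damping * vel = f_human + f_guidance."""
    f_total = np.asarray(f_human, dtype=float) + attractive_force(pos, target)
    acc = (f_total - damping * np.asarray(vel)) / mass
    vel = np.asarray(vel) + acc * dt
    pos = np.asarray(pos) + vel * dt
    return pos, vel

# Simulate a constant, slightly misaligned hand force; the potential-field
# guidance bends the trajectory toward the target as in Fig. 1(b).
pos, vel = np.zeros(3), np.zeros(3)
target = np.array([0.3, 0.2, 0.1])
for _ in range(2000):
    pos, vel = admittance_step(pos, vel, f_human=[0.1, 0.0, 0.0], target=target)
print(np.round(pos, 2))
```

At steady state the end effector settles near the target, offset only by the residual hand force divided by the attraction gain, which is the sense in which the guidance is implicit rather than overriding.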
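The abstract also describes MMIPN as a multimodal CNN fusing gaze movement, robot motion, and environmental context into a grasp-position estimate. The paper's architecture and dimensions are not specified here; the following is a purely illustrative, untrained fusion sketch (NumPy only, random weights, arbitrary feature sizes) showing the shape of such a pipeline: a temporal convolution over the gaze stream, linear embeddings of the other modalities, and a fused regression head.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def conv1d_valid(x, w):
    """Valid 1-D convolution. x: (T, C_in); w: (K, C_in, C_out) -> (T-K+1, C_out)."""
    K = w.shape[0]
    return np.stack([
        np.tensordot(x[t:t + K], w, axes=([0, 1], [0, 1]))
        for t in range(x.shape[0] - K + 1)
    ])

# Random (untrained) weights; all dimensions are illustrative assumptions.
W_gaze = rng.normal(size=(5, 2, 32))   # conv kernel over 2-D gaze samples
W_motion = rng.normal(size=(7, 32))    # robot pose/joint feature embedding
W_env = rng.normal(size=(16, 32))      # environmental-context embedding
W_head = rng.normal(size=(96, 3))      # fused features -> 3-D grasp position

def mmipn_forward(gaze, motion, env):
    g = relu(conv1d_valid(gaze, W_gaze)).mean(axis=0)  # temporal pooling
    m = relu(motion @ W_motion)
    e = relu(env @ W_env)
    return np.concatenate([g, m, e]) @ W_head          # predicted grasp position

grasp = mmipn_forward(rng.normal(size=(50, 2)),  # 50 gaze samples
                      rng.normal(size=7),
                      rng.normal(size=16))
print(grasp.shape)
```

The design point the abstract emphasizes carries over even to this toy version: gaze enters as a time series rather than a single fixation, which is what lets the network disambiguate depth among multiple candidate objects.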
These findings demonstrate the effectiveness of combining multimodal cues with implicit guidance in VR-based teleoperation, providing a robust solution for multi-object grasping tasks and enabling more natural interactions across various applications in the future.

With the rapid development of robotics and metaverse technology, teleoperation in particular has brought diverse modes and expanded opportunities for remote operations. In aerospace manipulator operation [28, 45], extraterrestrial ground exploration [8], nuclear environment maintenance [46, 15], remote medical surgery [62, 12], and life-care assistance [44], teleoperation already addresses a wide range of technical needs and has an established record of successful applications. The rise of metaverse technology has promoted applications of virtual reality (VR) in industrial teleoperation [67, 9, 48], and the immersion of VR can provide a more realistic experience for teleoperation.
Sep-3-2025