Learning Predictive Visuomotor Coordination
Wenqi Jia, Bolin Lai, Miao Liu, Danfei Xu, James M. Rehg
arXiv.org Artificial Intelligence
Understanding and predicting human visuomotor coordination is crucial for applications in robotics, human-computer interaction, and assistive technologies. This work introduces a forecasting-based task for visuomotor modeling, where the goal is to predict head pose, gaze, and upper-body motion from egocentric visual and kinematic observations. We propose a Visuomotor Coordination Representation (VCR) that learns structured temporal dependencies across these multimodal signals. We extend a diffusion-based motion modeling framework that integrates egocentric vision and kinematic sequences, enabling temporally coherent and accurate visuomotor predictions. Our approach is evaluated on the large-scale EgoExo4D dataset, demonstrating strong generalization across diverse real-world activities. Our results highlight the importance of multimodal integration in understanding visuomotor coordination, contributing to research in visuomotor learning and human behavior modeling.
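To make the diffusion-based forecasting idea concrete, here is a minimal, illustrative NumPy sketch of a DDPM-style forward noising process and a single reverse denoising step applied to a toy motion sequence. This is not the authors' model: the linear beta schedule, the step count, the sequence dimensions, and the oracle noise prediction are all assumptions for illustration; a real system would replace the oracle with a learned network conditioned on egocentric vision and kinematics.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): DDPM-style forward
# noising and one reverse step for a motion sequence. All hyperparameters
# below are assumed for demonstration purposes only.

rng = np.random.default_rng(0)

T_STEPS = 50                               # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T_STEPS)   # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, noise):
    """Forward process: corrupt a clean motion sequence x0 at step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def p_step(xt, t, eps_hat):
    """One reverse (denoising) step given a noise prediction eps_hat.

    In a trained model, eps_hat would come from a network conditioned on
    egocentric visual features and past kinematic observations.
    """
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return mean

# Toy "motion" sequence: 30 frames x 9 dims (e.g. head pose + gaze + body;
# dimensionality is a placeholder, not the paper's representation).
x0 = rng.standard_normal((30, 9))
noise = rng.standard_normal(x0.shape)
xt = q_sample(x0, T_STEPS - 1, noise)

# With an oracle noise prediction, one step moves xt back toward x0.
x_prev = p_step(xt, T_STEPS - 1, noise)
print(xt.shape, x_prev.shape)
```

At inference, such a model would iterate `p_step` from pure noise down to `t = 0`, producing a temporally coherent forecast rather than the single step shown here.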
Mar-29-2025