Learning Predictive Visuomotor Coordination
Wenqi Jia, Bolin Lai, Miao Liu, Danfei Xu, James M. Rehg
arXiv.org Artificial Intelligence
Understanding and predicting human visuomotor coordination is crucial for applications in robotics, human-computer interaction, and assistive technologies. This work introduces a forecasting-based task for visuomotor modeling, where the goal is to predict head pose, gaze, and upper-body motion from egocentric visual and kinematic observations. We propose a Visuomotor Coordination Representation (VCR) that learns structured temporal dependencies across these multimodal signals. We extend a diffusion-based motion modeling framework that integrates egocentric vision and kinematic sequences, enabling temporally coherent and accurate visuomotor predictions. Our approach is evaluated on the large-scale EgoExo4D dataset, demonstrating strong generalization across diverse real-world activities. Our results highlight the importance of multimodal integration in understanding visuomotor coordination, contributing to research in visuomotor learning and human behavior modeling.
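To make the diffusion-based forecasting idea concrete, here is a minimal, illustrative NumPy sketch of a DDPM-style forward noising process and a single reverse denoising step applied to a toy motion sequence. This is not the authors' model: the linear beta schedule, the step count, the sequence dimensions, and the oracle noise prediction are all assumptions for illustration; a real system would replace the oracle with a learned network conditioned on egocentric vision and kinematics.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): DDPM-style forward
# noising and one reverse step for a motion sequence. All hyperparameters
# below are assumed for demonstration purposes only.

rng = np.random.default_rng(0)

T_STEPS = 50                               # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T_STEPS)   # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def q_sample(x0, t, noise):
    """Forward process: corrupt a clean motion sequence x0 at step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise

def p_step(xt, t, eps_hat):
    """One reverse (denoising) step given a noise prediction eps_hat.

    In a trained model, eps_hat would come from a network conditioned on
    egocentric visual features and past kinematic observations.
    """
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_hat) / np.sqrt(alphas[t])
    if t > 0:
        mean = mean + np.sqrt(betas[t]) * rng.standard_normal(xt.shape)
    return mean

# Toy "motion" sequence: 30 frames x 9 dims (e.g. head pose + gaze + body;
# dimensionality is a placeholder, not the paper's representation).
x0 = rng.standard_normal((30, 9))
noise = rng.standard_normal(x0.shape)
xt = q_sample(x0, T_STEPS - 1, noise)

# With an oracle noise prediction, one step moves xt back toward x0.
x_prev = p_step(xt, T_STEPS - 1, noise)
print(xt.shape, x_prev.shape)
```

At inference, such a model would iterate `p_step` from pure noise down to `t = 0`, producing a temporally coherent forecast rather than the single step shown here.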
Mar-29-2025