Hand-Object Interaction Pretraining from Videos
Singh, Himanshu Gaurav; Loquercio, Antonio; Sferrazza, Carmelo; Wu, Jane; Qi, Haozhi; Abbeel, Pieter; Malik, Jitendra
arXiv.org Artificial Intelligence
Reusable sensorimotor representations have the potential to give robots access to the versatility of their sensorimotor apparatus, thereby enabling them to achieve a wide variety of goals. Similar to advancements in other AI domains [1, 2], such representations are likely to be trained with unsupervised objectives on large datasets. In this work, we study the feasibility of training such representations using human videos in the context of dexterous manipulation. Using videos as a data engine comes with several advantages: (1) they are abundant; (2) they cover a wide range of skills that we want robots to master; and (3) they reflect natural or socially acceptable behaviors that we want robots to emulate. However, training sensorimotor representations on videos is a challenging endeavor.
Sep-12-2024
- Country:
- Europe > United Kingdom
- England > Oxfordshire > Oxford (0.04)
- Asia > Japan
- Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Genre:
- Research Report (0.64)
- Technology: