See, Hear, and Feel: Smart Sensory Fusion for Robotic Manipulation

Hao Li, Yizhi Zhang, Junzhe Zhu, Shaoxiong Wang, Michelle A. Lee, Huazhe Xu, Edward Adelson, Li Fei-Fei, Ruohan Gao, Jiajun Wu

arXiv.org Artificial Intelligence 

Imagine you are savoring tea in a peaceful Zen garden: a robot sees your empty cup and starts pouring, hears the pitch of the sound rise as the water level in the cup increases, and feels with its fingers around the handle of the teapot to tell how much tea is left and to control the pouring speed. For both humans and robots, multisensory perception with vision, audio, and touch plays a crucial role in everyday tasks: vision reliably captures the global setup, audio sends immediate alerts even for occluded events, and touch provides the precise local geometry of objects, revealing their status. Though exciting progress has been made on teaching robots to tackle various tasks [1, 2, 3, 4, 5], limited prior work has combined multiple sensory modalities for robot learning. There have been some recent attempts to use audio [6, 7, 8, 9] or touch [10, 11, 12, 13, 14] in conjunction with vision for robot perception, but no prior work has simultaneously incorporated visual, acoustic, and tactile signals, the three principal sensory modalities, or studied their respective roles in challenging multisensory robotic manipulation tasks. We aim to demonstrate the benefit of fusing multiple sensory modalities for solving complex robotic manipulation tasks, and to provide an in-depth study of the characteristics of each modality and how they complement one another.
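
To make the fusion idea concrete, below is a minimal, illustrative sketch of late fusion over the three modalities: each sensory stream (an RGB image, an audio spectrogram, and a tactile image) is encoded separately, and the embeddings are concatenated before an action head. All class names, tensor shapes, and encoder choices here are assumptions for illustration only, not the architecture used in the paper.

import torch
import torch.nn as nn

class SimpleMultisensoryFusion(nn.Module):
    """Toy late-fusion policy: encode each modality, concatenate, predict an action.
    Illustrative sketch only; it does not reproduce the paper's actual model."""
    def __init__(self, embed_dim=128, num_actions=9):
        super().__init__()
        # Per-modality encoders (placeholders standing in for real vision,
        # audio-spectrogram, and tactile encoders).
        self.vision_enc = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim), nn.ReLU())
        self.audio_enc  = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim), nn.ReLU())
        self.touch_enc  = nn.Sequential(nn.Flatten(), nn.LazyLinear(embed_dim), nn.ReLU())
        # Fusion head maps the concatenated embeddings to discrete action logits.
        self.head = nn.Sequential(nn.Linear(3 * embed_dim, embed_dim), nn.ReLU(),
                                  nn.Linear(embed_dim, num_actions))

    def forward(self, rgb, audio_spec, tactile):
        z = torch.cat([self.vision_enc(rgb),
                       self.audio_enc(audio_spec),
                       self.touch_enc(tactile)], dim=-1)
        return self.head(z)

# Example with dummy tensors: an RGB image, an audio spectrogram, and a tactile image.
model = SimpleMultisensoryFusion()
logits = model(torch.rand(2, 3, 128, 128),   # vision
               torch.rand(2, 1, 64, 100),    # audio
               torch.rand(2, 3, 64, 64))     # touch
print(logits.shape)  # torch.Size([2, 9])

In practice, the point of such a fusion model is that the action head can draw on whichever modality is informative at a given moment, e.g., audio when the scene is occluded or touch when fine local geometry matters.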
