Toward Human Deictic Gesture Target Estimation

Neural Information Processing Systems 

Humans have a remarkable ability to use co-speech deictic gestures, such as pointing and showing, to enrich verbal communication and support social interaction. These gestures are so fundamental that infants begin to use them even before they acquire spoken language, which highlights their central role in human communication. Understanding the intended targets of another individual's deictic gestures enables inference of their intentions, comprehension of their current actions, and prediction of upcoming behaviors. Despite its significance, gesture target estimation remains an underexplored task within the computer vision community. In this paper, we introduce GestureTarget, a novel task designed specifically for comprehensive evaluation of social deictic gesture semantic target estimation.