Real-time Recognition of Human Interactions from a Single RGB-D Camera for Socially-Aware Robot Navigation

Thanh Long Nguyen, Duc Phu Nguyen, Thanh Thao Ton Nu, Quan Le, Thuan Hoang Tran, Manh Duong Phung

arXiv.org Artificial Intelligence 

Social robots play a key role in many applications, such as elderly care, home assistance, customer service, and education, where they assist, interact, and communicate with humans in a socially intelligent manner. These robots must ensure not only physical safety but also psychological comfort for humans by following social norms. For instance, a robot should avoid disrupting a group conversation when navigating a crowded space, as doing so could be perceived as impolite or intrusive. To accomplish this, the robot must not only detect humans but also recognize and interpret their interactions, such as conversations, discussions, gatherings, and collaborative activities, to adapt its movements accordingly. According to [1, 2], human group interactions are structured into three distinct spaces: (i) the o-space, the central region where active participants focus their attention; (ii) the p-space, the surrounding area occupied by engaged individuals; and (iii) the r-space, the outer region where bystanders or non-participants are positioned. To enable socially aware navigation, recognition algorithms must estimate these spatial regions.
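To make the o-/p-/r-space model concrete, the following is a minimal sketch (not the paper's method) of a common geometric heuristic for estimating these regions from detected people. It assumes each participant's attention point lies a fixed `stride` ahead of their body along their facing direction; the o-space center is the mean of these points, the o-space radius reaches the nearest participant, and the p-space is an annulus of hypothetical width `p_width` around it, with the r-space beyond. All parameter names and values here are illustrative assumptions.

```python
import numpy as np

def estimate_f_formation_spaces(positions, orientations, stride=0.8, p_width=0.5):
    """Estimate o-, p-, and r-space boundaries of a conversational group.

    positions:    (N, 2) ground-plane positions of participants (meters).
    orientations: (N,) facing angles (radians), 0 along +x.
    stride, p_width: illustrative parameters, not taken from the paper.

    Returns (o_center, o_radius, p_outer_radius); points beyond
    p_outer_radius fall in the r-space.
    """
    positions = np.asarray(positions, dtype=float)
    orientations = np.asarray(orientations, dtype=float)
    # Project each person's attention point `stride` meters ahead.
    ahead = positions + stride * np.stack(
        [np.cos(orientations), np.sin(orientations)], axis=1)
    # o-space center: mean of the attention points.
    o_center = ahead.mean(axis=0)
    # o-space radius: distance from the center to the nearest participant.
    o_radius = np.linalg.norm(positions - o_center, axis=1).min()
    return o_center, o_radius, o_radius + p_width
```

For two people standing 1.6 m apart and facing each other, their attention points coincide midway between them, so the o-space is centered at that midpoint, which is the behavior a socially aware planner would use to route around the group rather than through it.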